SG16 overview and general information

WG21 SG16 Unicode study group

SG16 is an ISO/IEC JTC1/SC22/WG21 C++ study group tasked with improving Unicode and text processing support within the C++ standard.

If you would like to contribute to the discussion, please subscribe to our mailing list at https://lists.isocpp.org/mailman/listinfo.cgi/sg16.

Meetings are generally held twice a month; invitations are sent to the mailing list. Summaries of past meetings are available at https://github.com/sg16-unicode/sg16-meetings/blob/master/README.md.

A standing paper that describes our intended scope, directives, guidelines and constraints is available at P1238 - SG16: Unicode Direction. Anyone wanting to follow or contribute to SG16 should become familiar with it.

We also provide input on other proposals within WG21 and WG14 when those proposals touch on topics listed in P1253 - Guidelines for when a WG21 proposal should be reviewed by SG16.

The following sections list projects, Unicode papers, and ISO papers that fall under the purview of SG16.

Active Projects

Project Description/Links
Boost.Text What a C++ standard Unicode library might look like
Code repository
Documentation
ztd.text The premier library for handling text in different encoding forms and reducing transcoding bugs in your C++ software
Code repository
Documentation
text_view A C++ Concepts based character encoding and code point enumeration library
Code repository

Unicode papers

Document Number Title/Notes/Links
L2/23-153 Opposition to and Comment on L2/23-107
L2/23-107 Proper Complex Script Support in Text Terminals
L2/21-038 Clarify guidance for use of a BOM as a UTF-8 encoding signature

ISO/IEC JTC1/SC2/WG2 (Unicode) Papers

Active Papers

WG2 Number Title/Notes/Links
WG2-N5168 Name aliases and UTF-16 encoding scheme are inconsistent with the Unicode Standard
Per WG2-N5175, WG2-N5174 contains the proposed resolution.
WG2-N5174 Proposed changes concerning Character Name Aliases in ISO/IEC 10646
This is the proposed resolution for WG2-N5168.

ISO/IEC JTC1/SC22/WG21 (C++) Papers

Active Papers

WG21 Number Title/Notes/Links
P3374 Adding formatter for fpos
P3364 Remove Deprecated u8path overloads From C++26
P3263 Encoding annotated char
P3258 Formatting of charN_t
P3154 Deprecating signed character types in iostreams
P3070 Formatting enums
P2873 Remove Deprecated Locale Category Facets For Unicode from C++26
P2758 Emitting messages at compile time
P2749 Down with “character”
P2729 Unicode in the Library, Part 2: Normalization
P2728 Unicode in the Library, Part 1: UTF Transcoding
P2626 charN_t incremental adoption: Casting pointers of UTF character types
P2528 C++ Identifier Security using Unicode Standard Annex 39
P2348 Whitespaces Wording Revamp
P2319 Prevent path presentation problems
P1953 Unicode Identifiers And Reflection
P1729 Text Parsing
P1629 Standard Text Encoding
P1628 Unicode character properties
P1030 std::filesystem::path_view
P0244 Text_view: A C++ concepts and range based character encoding and code point enumeration library

Accepted C++26 Papers

WG21 Number Title/Notes/Links
P2909 Fix formatting of code units as integers
(Dude, where’s my char?)
P2872 Remove wstring_convert From C++26
P2871 Remove Deprecated Unicode Conversion Facets From C++26
P2845 Formatting of std::filesystem::path
P2741 user-generated static_assert messages
P2558 Add @, $, and ` to the basic character set
P2361 Unevaluated strings literals
P1885 Naming Text Encodings to Demystify Them
P1854 Conversion to execution encoding should not lead to loss of meaning

Accepted C++23 Papers

WG21 Number Title/Notes/Links
P2736 Referencing the Unicode Standard
P2713 Escaping improvements in std::format
P2693 Formatting thread::id and stacktrace
P2675 LWG3780: The Paper (format's width estimation is too approximate and not forward compatible)
P2653 Update Annex E based on Unicode 15.0 UAX 31
P2572 std::format() fill character allowances
P2513 char8_t Compatibility and Portability Fixes
P2460 Relax requirements on wchar_t to match existing practices
P2419 Clarify handling of encodings in localized formatting of chrono types
P2372 Fixing locale handling in chrono formatters
P2362 Remove non-encodable wide character literals and multicharacter wide character literals
P2316 Consistent character literal encoding
P2314 Character sets and encodings
P2295 Support for UTF-8 as a portable source file encoding
P2290 Delimited escape sequences
P2246 Character encoding of diagnostic text
P2223 Trimming whitespaces before line splicing
P2201 Mixed string literal concatenation
P2093 Formatted output
P2071 Named universal character escapes
P2029 Proposed resolution for core issues 411, 1656, and 2333; numeric and universal character escapes in character and string literals
P1949 C++ Identifier Syntax using Unicode Standard Annex 31
P1072 basic_string::resize_and_overwrite

Accepted C++20 Papers

WG21 Number Title/Notes/Links
P1892 Extended locale-specific presentation specifiers for std::format
P1868 🦄 width: clarifying units of width and precision in std::format
P1423 char8_t backward compatibility remediation
P1139 Address wording issues related to ISO 10646
P1041 Make char16_t/char32_t string literals be UTF-16/32
P1025 Update The Reference To The Unicode Standard
P0645 Text Formatting
P0482 char8_t: A type for UTF-8 characters and strings

Inactive Papers

The following papers are no longer being pursued.

WG21 Number Title/Notes/Links
P2773 Considerations for Unicode algorithms
(This is an informational paper and was reviewed by SG16 in February and March of 2023)
P2498 Forward compatibility of text_encoding with additional encoding registries
(Dropped by the author following lack of consensus for a change in LEWG)
P2491 Text encodings follow-up
(The concerns raised in this paper were avoided by changes made in R10 of P1885)
P2297 Wording improvements for encodings and character sets
(The goals of this paper were mostly addressed via P2314)
P2194 The character set of C++ source code is Unicode
(The goals of this paper are now being pursued via P2314 and P2297)
P2178 Misc lexing and string handling improvements
(The goals of this paper are now being pursued via P1854, P2223, P2295, P2297, P2348, P2316, P2361, P2362, and P2460)
P2020 Locales, Encodings and Unicode
(This paper did not contain a concrete proposal and no revisions are expected; it will be used as reference material)
P1880 uNstring Arguments Shall Be UTF-N Encoded
(This proposal was withdrawn by the author upon determining that the complexity of the required wording updates would outweigh their benefits)
P1879 Please Don't Rewrite My String Literals
(This proposal was withdrawn by the author)
P1859 Standard terminology for execution character set encodings
(The goals of this proposal were accomplished via P2314)
P1844 Enhancement of regex
(Severe ABI concerns prevent updating std::regex. We will explore deprecating and replacing it)
P1097 Named character escapes
(Superseded by P2071)
P0353 Unicode Friendly Encoding Conversions for the Standard Library
(This proposal is not being advocated at this time; more foundational concerns need to be addressed first)
P0169 regex with Unicode character types
(This proposal is not being advocated at this time; more foundational concerns need to be addressed first)

ISO/IEC JTC1/SC22/WG14 (C) Papers

Active Papers

WG14 Number Title/Notes/Links
N3366 Restartable Functions for Efficient Character Conversions, r13
(Previously N2431 (R0), N2440 (R1), N2500 (R2), N2595 (R3), N2620 (R4), N2730 (R5), N2902 (R6), N2966 (R7), N2999 (R8), N3031 (R9), N3075 (R10), N3095 (R11), N3265 (R12))
N3145 $ in Identifiers v2
(Previously N3046 (R0))
N3124 Aligning Universal Character Names Constraints with C++
N3095
N3016 Unicode Length Modifiers v3
N2948 Accessing the command line arguments outside of main()
N2932 C Identifier Security using Unicode Standard Annex 39 v2
(Previously N2916 (R0))
N2785 Delimited escape sequences

Accepted C23 Papers

WG14 Number Title/Notes/Links
N2940 Removing trigraphs??!
N2939 Identifier Syntax Fixes
N2836 C Identifier Syntax using Unicode Standard Annex 31
(Previously N2777 (R0))
N2828 Unicode Sequences More Than 21 Bits are a Constraint Violation
N2728 char16_t & char32_t string literals shall be UTF-16 & UTF-32 | r0
N2701 @ and $ in source and execution character set
N2653 char8_t: A type for UTF-8 characters and strings (Revision 1)
(Previously N2231 (R0))
N2594 Mixed Wide String Literal Concatenation
N2563 Character encoding of diagnostic text
N2418 Adding the u8 character prefix
(Previously N2198 (R0))

Inactive Papers

WG14 Number Title/Notes/Links
N3265 Restartable Functions for Efficient Character Conversions | r12
(Superseded by N3366)
N3095 Restartable Functions for Efficient Character Conversions | r11
(Superseded by N3265)
N3075 Restartable Functions for Efficient Character Conversions | r10
(Superseded by N3095)
N3046 $ in Identifiers
(Superseded by N3145)
N3031 Restartable Functions for Efficient Character Conversions | r9
(Superseded by N3075)
N2999 Restartable Functions for Efficient Character Conversions | r8
(Superseded by N3031)
N2983 Unicode Length Modifiers v2
(Superseded by N3016)
N2966 Restartable Functions for Efficient Character Conversions | r7
(Superseded by N2999)
N2916 C Identifier Security using Unicode Standard Annex 39
(Superseded by N2932)
N2902 Restartable and Non-Restartable Functions for Efficient Character Conversions | r6
(Superseded by N2966)
N2875 Unicode Length Modifiers
(Superseded by N2983)
N2777 C Identifier Syntax using Unicode Standard Annex 31
(Superseded by N2836)
N2730 Restartable and Non-Restartable Functions for Efficient Character Conversions | r5
(Superseded by N2902)
N2620 Restartable and Non-Restartable Functions for Efficient Character Conversions | r4
(Superseded by N2730)
N2595 Restartable and Non-Restartable Functions for Efficient Character Conversions | r3
(Superseded by N2620)
N2500 Restartable and Non-Restartable Functions for Efficient Character Conversions | r2
(Superseded by N2595)
N2440 Restartable and Non-Restartable Functions for Efficient Character Conversions | r1
(Superseded by N2500)
N2431 Restartable and Non-Restartable Functions for Efficient Character Conversions
(Superseded by N2440)
N2231 char8_t: A type for UTF-8 characters and strings
(Superseded by N2653)
N2198 Adding the u8 character prefix
(Superseded by N2418)