SG16 is an ISO/IEC JTC1/SC22/WG21 C++ study group tasked with improving Unicode and text processing support within the C++ standard.
If you would like to contribute to the discussion, please subscribe to our mailing list at https://lists.isocpp.org/mailman/listinfo.cgi/sg16.
Meetings are generally held twice a month; invitations are sent to the mailing list. Summaries of past meetings are available at https://github.com/sg16-unicode/sg16-meetings/blob/master/README.md.
A standing paper that describes our intended scope, directives, guidelines and constraints is available at P1238 - SG16: Unicode Direction. Anyone wanting to follow or contribute to SG16 should become familiar with it.
We also provide input on other proposals within WG21 and WG14 when those proposals touch on topics listed in P1253 - Guidelines for when a WG21 proposal should be reviewed by SG16.
The following sections list projects, Unicode papers, and ISO papers that fall under the purview of SG16.
Project | Description/Links |
---|---|
Boost.Text | What a C++ standard Unicode library might look like (Code repository, Documentation) |
ztd.text | The premiere library for handling text in different encoding forms and reducing transcoding bugs in your C++ software (Code repository, Documentation) |
text_view | A C++ Concepts based character encoding and code point enumeration library (Code repository) |
Document Number | Title/Notes/Links |
---|---|
L2/23-153 | Opposition to and Comment on L2/23-107 |
L2/23-107 | Proper Complex Script Support in Text Terminals |
L2/21-038 | Clarify guidance for use of a BOM as a UTF-8 encoding signature |
WG2 Number | Title/Notes/Links |
---|---|
WG2-N5168 | Name aliases and UTF-16 encoding scheme are inconsistent with the Unicode Standard (Per WG2-N5175, WG2-N5174 contains the proposed resolution.) |
WG2-N5174 | Proposed changes concerning Character Name Aliases in ISO/IEC 10646 (This is the proposed resolution for WG2-N5168.) |
WG21 Number | Title/Notes/Links |
---|---|
P3374 | Adding formatter for fpos |
P3364 | Remove Deprecated u8path overloads From C++26 |
P3263 | Encoding annotated char |
P3258 | Formatting of charN_t |
P3154 | Deprecating signed character types in iostreams |
P3070 | Formatting enums |
P2873 | Remove Deprecated Locale Category Facets For Unicode from C++26 |
P2758 | Emitting messages at compile time |
P2749 | Down with “character” |
P2729 | Unicode in the Library, Part 2: Normalization |
P2728 | Unicode in the Library, Part 1: UTF Transcoding |
P2626 | charN_t incremental adoption: Casting pointers of UTF character types |
P2528 | C++ Identifier Security using Unicode Standard Annex 39 |
P2348 | Whitespaces Wording Revamp |
P2319 | Prevent path presentation problems |
P1953 | Unicode Identifiers And Reflection |
P1729 | Text Parsing |
P1629 | Standard Text Encoding |
P1628 | Unicode character properties |
P1030 | std::filesystem::path_view |
P0244 | Text_view: A C++ concepts and range based character encoding and code point enumeration library |
WG21 Number | Title/Notes/Links |
---|---|
P2909 | Fix formatting of code units as integers (Dude, where’s my char?) |
P2872 | Remove wstring_convert From C++26 |
P2871 | Remove Deprecated Unicode Conversion Facets From C++26 |
P2845 | Formatting of std::filesystem::path |
P2741 | user-generated static_assert messages |
P2558 | Add @, $, and ` to the basic character set |
P2361 | Unevaluated strings literals |
P1885 | Naming Text Encodings to Demystify Them |
P1854 | Conversion to execution encoding should not lead to loss of meaning |
WG21 Number | Title/Notes/Links |
---|---|
P2736 | Referencing the Unicode Standard |
P2713 | Escaping improvements in std::format |
P2693 | Formatting thread::id and stacktrace |
P2675 | LWG3780: The Paper (format's width estimation is too approximate and not forward compatible) |
P2653 | Update Annex E based on Unicode 15.0 UAX 31 |
P2572 | std::format() fill character allowances |
P2513 | char8_t Compatibility and Portability Fixes |
P2460 | Relax requirements on wchar_t to match existing practices |
P2419 | Clarify handling of encodings in localized formatting of chrono types |
P2372 | Fixing locale handling in chrono formatters |
P2362 | Remove non-encodable wide character literals and multicharacter wide character literals |
P2316 | Consistent character literal encoding |
P2314 | Character sets and encodings |
P2295 | Support for UTF-8 as a portable source file encoding |
P2290 | Delimited escape sequences |
P2246 | Character encoding of diagnostic text |
P2223 | Trimming whitespaces before line splicing |
P2201 | Mixed string literal concatenation |
P2093 | Formatted output |
P2071 | Named universal character escapes |
P2029 | Proposed resolution for core issues 411, 1656, and 2333; numeric and universal character escapes in character and string literals |
P1949 | C++ Identifier Syntax using Unicode Standard Annex 31 |
P1072 | basic_string::resize_and_overwrite |
WG21 Number | Title/Notes/Links |
---|---|
P1892 | Extended locale-specific presentation specifiers for std::format |
P1868 | 🦄 width: clarifying units of width and precision in std::format |
P1423 | char8_t backward compatibility remediation |
P1139 | Address wording issues related to ISO 10646 |
P1041 | Make char16_t/char32_t string literals be UTF-16/32 |
P1025 | Update The Reference To The Unicode Standard |
P0645 | Text Formatting |
P0482 | char8_t: A type for UTF-8 characters and strings |
WG14 Number | Title/Notes/Links |
---|---|
N3366 | Restartable Functions for Efficient Character Conversions, r13 (Previously N2431 (R0), N2440 (R1), N2500 (R2), N2595 (R3), N2620 (R4), N2730 (R5), N2902 (R6), N2966 (R7), N2999 (R8), N3031 (R9), N3075 (R10), N3095 (R11), N3265 (R12)) |
N3145 | $ in Identifiers v2 (Previously N3046 (R0)) |
N3124 | Aligning Universal Character Names Constraints with C++ |
N3095 | Restartable Functions for Efficient Character Conversions, r11 (Superseded by N3366) |
N3016 | Unicode Length Modifiers v3 |
N2948 | Accessing the command line arguments outside of main() |
N2932 | C Identifier Security using Unicode Standard Annex 39 v2 (Previously N2916 (R0)) |
N2785 | Delimited escape sequences |
WG14 Number | Title/Notes/Links |
---|---|
N2940 | Removing trigraphs??! |
N2939 | Identifier Syntax Fixes |
N2836 | C Identifier Syntax using Unicode Standard Annex 31 (Previously N2777 (R0)) |
N2828 | Unicode Sequences More Than 21 Bits are a Constraint Violation |
N2728 | char16_t & char32_t string literals shall be UTF-16 & UTF-32, r0 |
N2701 | @ and $ in source and execution character set |
N2653 | char8_t: A type for UTF-8 characters and strings (Revision 1) (Previously N2231 (R0)) |
N2594 | Mixed Wide String Literal Concatenation |
N2563 | Character encoding of diagnostic text |
N2418 | Adding the u8 character prefix (Previously N2198 (R0)) |