WG21 SG16 Unicode study group

SG16 is an ISO/IEC JTC1/SC22/WG21 C++ study group tasked with improving Unicode and text processing support within the C++ standard.

If you would like to contribute to the discussion, please subscribe to our mailing list at https://lists.isocpp.org/mailman/listinfo.cgi/sg16.

Meetings are generally held twice a month; invitations are sent to the mailing list. Summaries of past meetings are available at https://github.com/sg16-unicode/sg16-meetings/blob/master/README.md.

A standing paper that describes our intended scope, directives, guidelines and constraints is available at P1238 - SG16: Unicode Direction. Anyone wanting to follow or contribute to SG16 should become familiar with it.

We also provide input on other proposals within WG21 and WG14 when those proposals touch on topics listed in P1253 - Guidelines for when a WG21 proposal should be reviewed by SG16.

The following sections list projects, Unicode papers, and ISO papers that fall under the purview of SG16.

Active Projects

Project	Description/Links
Boost.Text	What a C++ standard Unicode library might look like Code repository Documentation
ztd.text	The premiere library for handling text in different encoding forms and reducing transcoding bugs in your C++ software Code repository Documentation
text_view	A C++ Concepts based character encoding and code point enumeration library Code repository

Unicode papers

Document Number	Title/Notes/Links
L2/23-153	Opposition to and Comment on L2/23–107
L2/23-107	Proper Complex Script Support in Text Terminals
L2/21-038	Clarify guidance for use of a BOM as a UTF-8 encoding signature

ISO/IEC JTC1/SC2/WG2 (Unicode) Papers

Active Papers

WG2 Number	Title/Notes/Links
WG2-N5168	Name aliases and UTF-16 encoding scheme are inconsistent with the Unicode Standard (Per WG2-N5175, WG2-N5174 contains the proposed resolution.)
WG2-N5174	Proposed changes concerning Character Name Aliases in ISO/IEC 10646 (This is the proposed resolution for WG2-N5168.)

ISO/IEC JTC1/SC22/WG21 (C++) Papers

Active Papers

WG21 Number	Title/Notes/Links
P3374	Adding formatter for fpos
P3364	Remove Deprecated u8path overloads From C++26
P3263	Encoding annotated char
P3258	Formatting of charN_t
P3154	Deprecating signed character types in iostreams
P3070	Formatting enums
P2873	Remove Deprecated Locale Category Facets For Unicode from C++26
P2758	Emitting messages at compile time
P2749	Down with “character”
P2729	Unicode in the Library, Part 2: Normalization
P2728	Unicode in the Library, Part 1: UTF Transcoding
P2626	charN_t incremental adoption: Casting pointers of UTF character types
P2528	C++ Identifier Security using Unicode Standard Annex 39
P2348	Whitespaces Wording Revamp
P2319	Prevent path presentation problems
P1953	Unicode Identifiers And Reflection
P1729	Text Parsing
P1629	Standard Text Encoding
P1628	Unicode character properties
P1030	std::filesystem::path_view
P0244	Text_view: A C++ concepts and range based character encoding and code point enumeration library

Accepted C++26 Papers

WG21 Number	Title/Notes/Links
P2909	Fix formatting of code units as integers (Dude, where’s my char?)
P2872	Remove wstring_convert From C++26
P2871	Remove Deprecated Unicode Conversion Facets From C++26
P2845	Formatting of std::filesystem::path
P2741	user-generated static_assert messages
P2558	Add @, $, and ` to the basic character set
P2361	Unevaluated strings literals
P1885	Naming Text Encodings to Demystify Them
P1854	Conversion to execution encoding should not lead to loss of meaning

Accepted C++23 Papers

WG21 Number	Title/Notes/Links
P2736	Referencing the Unicode Standard
P2713	Escaping improvements in std::format
P2693	Formatting thread::id and stacktrace
P2675	LWG3780: The Paper (format's width estimation is too approximate and not forward compatible)
P2653	Update Annex E based on Unicode 15.0 UAX 31
P2572	std::format() fill character allowances
P2513	char8_t Compatibility and Portability Fixes
P2460	Relax requirements on wchar_t to match existing practices
P2419	Clarify handling of encodings in localized formatting of chrono types
P2372	Fixing locale handling in chrono formatters
P2362	Remove non-encodable wide character literals and multicharacter wide character literals
P2316	Consistent character literal encoding
P2314	Character sets and encodings
P2295	Support for UTF-8 as a portable source file encoding
P2290	Delimited escape sequences
P2246	Character encoding of diagnostic text
P2223	Trimming whitespaces before line splicing
P2201	Mixed string literal concatenation
P2093	Formatted output
P2071	Named universal character escapes
P2029	Proposed resolution for core issues 411, 1656, and 2333; numeric and universal character escapes in character and string literals
P1949	C++ Identifier Syntax using Unicode Standard Annex 31
P1072	basic_string::resize_and_overwrite

Accepted C++20 Papers

WG21 Number	Title/Notes/Links
P1892	Extended locale-specific presentation specifiers for std::format
P1868	🦄 width: clarifying units of width and precision in std::format
P1423	char8_t backward compatibility remediation
P1139	Address wording issues related to ISO 10646
P1041	Make char16_t/char32_t string literals be UTF-16/32
P1025	Update The Reference To The Unicode Standard
P0645	Text Formatting
P0482	char8_t: A type for UTF-8 characters and strings

Inactive Papers

Inactive papers list

The following papers are no longer being pursued.

WG21 Number	Title/Notes/Links
P2773	Considerations for Unicode algorithms (This is an informational paper and was reviewed by SG16 in February and March of 2023)
P2498	Forward compatibility of text_encoding with additional encoding registries (Dropped by the author following lack of consensus for a change in LEWG)
P2491	Text encodings follow-up (The concerns raised in this paper were avoided by changes made in R10 of P1885)
P2297	Wording improvements for encodings and character sets (The goals of this paper were mostly addressed via P2314)
P2194	The character set of C++ source code is Unicode (The goals of this paper are now being pursued via P2314 and P2297)
P2178	Misc lexing and string handling improvements (The goals of this paper are now being pursued via P1854, P2223, P2295, P2297, P2348, P2316, P2361, P2362, and P2460)
P2020	Locales, Encodings and Unicode (This paper did not contain a concrete proposal and no revisions are expected; it will be used as reference material)
P1880	uNstring Arguments Shall Be UTF-N Encoded (This proposal was withdrawn by the author upon determining that the complexity of the required wording updates would outweigh their benefits)
P1879	Please Don't Rewrite My String Literals (This proposal was withdrawn by the author)
P1859	Standard terminology for execution character set encodings (The goals of this proposal were accomplished via P2314)
P1844	Enhancement of regex (Severe ABI concerns prevent updating std::regex. We will explore deprecating and replacing it)
P1097	Named character escapes (Superseded by P2071)
P0353	Unicode Friendly Encoding Conversions for the Standard Library (This proposal is not being advocated at this time; more foundational concerns need to be addressed first)
P0169	regex with Unicode character types (This proposal is not being advocated at this time; more foundational concerns need to be addressed first)

ISO/IEC JTC1/SC22/WG14 (C) Papers

Active Papers

WG14 Number	Title/Notes/Links
N3366	Restartable Functions for Efficient Character Conversions, r13 (Previously N2431 (R0), N2440 (R1), N2500 (R2), N2595 (R3), N2620 (R4), N2730 (R5), N2902 (R6), N2966 (R7), N2999 (R8), N3031 (R9), N3075 (R10), N3095 (R11), N3265 (R12))
N3145	$ in Identifiers v2 (Previously N3046 (R0))
N3124	Aligning Universal Character Names Constraints with C++
N3016	Unicode Length Modifiers v3
N2948	Accessing the command line arguments outside of main()
N2932	C Identifier Security using Unicode Standard Annex 39 v2 (Previously N2916 (R0))
N2785	Delimited escape sequences

Accepted C23 Papers

WG14 Number	Title/Notes/Links
N2940	Removing trigraphs??!
N2939	Identifier Syntax Fixes
N2836	C Identifier Syntax using Unicode Standard Annex 31 (Previously N2777 (R0))
N2828	Unicode Sequences More Than 21 Bits are a Constraint Violation
N2728	char16_t & char32_t string literals shall be UTF-16 & UTF-32 | r0
N2701	@ and $ in source and execution character set
N2653	char8_t: A type for UTF-8 characters and strings (Revision 1) (Previously N2231 (R0))
N2594	Mixed Wide String Literal Concatenation
N2563	Character encoding of diagnostic text
N2418	Adding the u8 character prefix (Previously N2198 (R0))

Inactive Papers

Inactive papers list

WG14 Number	Title/Notes/Links
N3265	Restartable Functions for Efficient Character Conversions | r12 (Superseded by N3366)
N3095	Restartable Functions for Efficient Character Conversions | r11 (Superseded by N3265)
N3075	Restartable Functions for Efficient Character Conversions | r10 (Superseded by N3095)
N3046	$ in Identifiers (Superseded by N3145)
N3031	Restartable Functions for Efficient Character Conversions | r9 (Superseded by N3075)
N2999	Restartable Functions for Efficient Character Conversions | r8 (Superseded by N3031)
N2983	Unicode Length Modifiers v2 (Superseded by N3016)
N2966	Restartable Functions for Efficient Character Conversions | r7 (Superseded by N2999)
N2916	C Identifier Security using Unicode Standard Annex 39 (Superseded by N2932)
N2902	Restartable and Non-Restartable Functions for Efficient Character Conversions | r6 (Superseded by N2966)
N2875	Unicode Length Modifiers (Superseded by N2983)
N2777	C Identifier Syntax using Unicode Standard Annex 31 (Superseded by N2836)
N2730	Restartable and Non-Restartable Functions for Efficient Character Conversions | r5 (Superseded by N2902)
N2620	Restartable and Non-Restartable Functions for Efficient Character Conversions | r4 (Superseded by N2730)
N2595	Restartable and Non-Restartable Functions for Efficient Character Conversions | r3 (Superseded by N2620)
N2500	Restartable and Non-Restartable Functions for Efficient Character Conversions | r2 (Superseded by N2595)
N2440	Restartable and Non-Restartable Functions for Efficient Character Conversions | r1 (Superseded by N2500)
N2431	Restartable and Non-Restartable Functions for Efficient Character Conversions (Superseded by N2440)
N2231	char8_t: A type for UTF-8 characters and strings (Superseded by N2653)
N2198	Adding the u8 character prefix (Superseded by N2418)