sg16-unicode / sg16

Unicode 16: Updates needed for [format.string.std]p13 field widths #81

Open tahonermann opened 3 months ago

Assuming adoption of CWG 2843 (Undated reference to Unicode makes C++ a moving target), C++23 (and C++26) will have a normative reference to Unicode 15.1. Changes made in Unicode 16.0 will require a corresponding update to [format.string.std]p13 once the C++ standard's normative reference is changed to Unicode 16.0 or later. That paragraph currently states:

For a sequence of characters in UTF-8, UTF-16, or UTF-32, an implementation should use as its
field width the sum of the field widths of the first code point of each extended grapheme cluster.
Extended grapheme clusters are defined by UAX #29 of the Unicode Standard. The following
code points have a field width of 2:
- (13.1) any code point with the East_Asian_Width="W" or East_Asian_Width="F" Derived Extracted Property
  as described by UAX #44 of the Unicode Standard
- (13.2) U+4DC0 – U+4DFF (Yijing Hexagram Symbols)
- (13.3) U+1F300 – U+1F5FF (Miscellaneous Symbols and Pictographs)
- (13.4) U+1F900 – U+1F9FF (Supplemental Symbols and Pictographs)
The field width of all other code points is 1.

Consensus item 179-C18 in L2/24-061 (Minutes of UTC Meeting 179) records consensus to accept recommendation 37 from the CJK & Unihan Working Group as recorded in L2/24-067 (CJK & Unihan Working Group Recommendations for UTC #179 Meeting) to adopt the following proposals:

The effect of these changes is that the set of code points included in bullet 13.2 will be subsumed by those in bullet 13.1. This issue can therefore be resolved by striking bullet 13.2 when the C++ standard is rebased on Unicode 16.0 or later.
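As a rough illustration of why bullet 13.2 becomes redundant, the sketch below checks whether an explicitly listed range is already covered by a table of wide code points. The names `Range`, `kYijing`, `kWideBefore16`, `kWideFrom16`, `contains`, and `subsumed` are hypothetical, and both tables are small hand-written stand-ins for the real East_Asian_Width data; the only difference between them is the Yijing range that this issue describes as becoming wide in Unicode 16.0.

```cpp
#include <cstddef>

// Inclusive range of code points.
struct Range { char32_t first; char32_t last; };

// Bullet 13.2: Yijing Hexagram Symbols.
constexpr Range kYijing = {0x4DC0, 0x4DFF};

// ILLUSTRATIVE SUBSETS ONLY of code points with East_Asian_Width W.
// The second table assumes the Unicode 16.0 change described above.
constexpr Range kWideBefore16[] = {
    {0x1100, 0x115F},   // Hangul Jamo leading consonants
    {0x4E00, 0x9FFF},   // CJK Unified Ideographs
};
constexpr Range kWideFrom16[] = {
    {0x1100, 0x115F},   // Hangul Jamo leading consonants
    {0x4DC0, 0x4DFF},   // Yijing Hexagram Symbols
    {0x4E00, 0x9FFF},   // CJK Unified Ideographs
};

// True if cp lies in any range of the table.
template <std::size_t N>
constexpr bool contains(const Range (&table)[N], char32_t cp) {
    for (std::size_t i = 0; i < N; ++i)
        if (cp >= table[i].first && cp <= table[i].last) return true;
    return false;
}

// True if every code point of r is already in the table, in which case
// listing r separately adds nothing.
template <std::size_t N>
constexpr bool subsumed(Range r, const Range (&table)[N]) {
    for (char32_t cp = r.first; cp <= r.last; ++cp)
        if (!contains(table, cp)) return false;
    return true;
}

static_assert(!subsumed(kYijing, kWideBefore16), "13.2 still needed");
static_assert(subsumed(kYijing, kWideFrom16), "13.2 can be struck");
```

A check of this shape could be run against the real property tables whenever the referenced Unicode version changes, to catch explicit ranges that no longer need to be listed.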