sg16-unicode / sg16

SG16 overview and general information

Specify that char16_t and char32_t literals are UTF-16 and UTF-32 respectively #6

Closed: tahonermann closed 4 years ago

tahonermann commented 6 years ago

The C and C++ standards do not currently specify that char16_t and char32_t literals are encoded as UTF-16 and UTF-32, respectively. C guarantees these encodings only if the corresponding __STDC_UTF_16__ or __STDC_UTF_32__ macro is defined to 1 (6.10.8.2, "Environment macros"). Various parts of the C++ standard (codecvt and char_traits) refer to UTF-16/UTF-32, thereby admitting a bias towards these encodings despite the lack of strict specification.

It may be that, in practice, all C and C++ compilers that are being updated to conform to new standards use only UTF-16 and UTF-32 for these literals. If so, the standards can be updated to mandate the use of these encodings.
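To make the proposed guarantee concrete, here is a minimal sketch of what portable code could assert at compile time if the encodings were mandated. The helper name `code_units` is hypothetical and not taken from any paper. The checks assume an implementation that already uses UTF-16 and UTF-32 for these literals.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// excluding the terminating null character.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 must
// encode it as a surrogate pair (0xD83D, 0xDE00).
static_assert(code_units(u"\U0001F600") == 2, "char16_t literal is not UTF-16");
static_assert(u"\U0001F600"[0] == 0xD83D, "unexpected high surrogate");
static_assert(u"\U0001F600"[1] == 0xDE00, "unexpected low surrogate");

// UTF-32 encodes every code point as exactly one code unit.
static_assert(code_units(U"\U0001F600") == 1, "char32_t literal is not UTF-32");
static_assert(U"\U0001F600"[0] == 0x1F600, "unexpected UTF-32 code unit");
```

Without a mandate in the standard, these assertions are only an observation about a particular compiler, not something the language promises.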

tahonermann commented 6 years ago

Status of this issue: P1041R1 appeared in the post-Rapperswil mailing. With a little luck, it will be presented to EWG in San Diego.

ThePhD commented 5 years ago

Luck achieved: this was presented in San Diego by @tahonermann.

It is on track for C++20. We will need to bring it through EWG, but no resistance to this is anticipated.

ThePhD commented 5 years ago

Side note: it was decided that the __STD_C... macros related to UTF-16/32 could be handled as editorial changes or Defect Reports in Core, and did not necessarily have to be a part of this proposal.

tahonermann commented 5 years ago

@martinho, we should get an updated revision of this paper in the pre-Kona mailing (after November 26th, but before January 21st). Some things to update:

With regard to those macros, we actually had references to them in C++14 (in the <cuchar> synopsis), but lost them along the way to C++17 (we now just defer to C for the contents of the <cuchar> header). I don't think we need to add them back unless we're going to state that implementations must define them. That concern can be handled as a core issue after getting approval, as JeanHeyd already mentioned, but we might be able to shortcut that core issue processing by adding the macro requirements to the paper. I have a slight preference for the latter.

For reference: gcc and clang both define __STDC_UTF_16__ and __STDC_UTF_32__ in both C and C++ compilation modes. MSVC never defines them. I asked Jonathan Caves about it and he stated it was probably just an oversight.
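A minimal sketch of how code could observe this difference between implementations. The variable names are hypothetical; only the two macro names come from the discussion above.

```cpp
// Record whether the implementation defines the C environment macros.
// An undefined macro does not prove a different encoding is in use;
// it only means the implementation makes no promise through the macro.
#if defined(__STDC_UTF_16__)
constexpr bool utf16_macro_defined = true;
#else
constexpr bool utf16_macro_defined = false;
#endif

#if defined(__STDC_UTF_32__)
constexpr bool utf32_macro_defined = true;
#else
constexpr bool utf32_macro_defined = false;
#endif
```

On the compilers described above, both constants would be expected to be true for gcc and clang and false for MSVC.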

tahonermann commented 5 years ago

Oh, hey, there are a few relevant core issues:

tahonermann commented 4 years ago

This issue was resolved by the adoption of P1041R4 in Kona. Closing.