Closed tahonermann closed 4 years ago
Status of this issue: P1041R1 appeared in the post-Rapperswil mailing. With a little luck, it will be presented to EWG in San Diego.
Luck achieved: this was presented in San Diego with a presentation from @tahonermann.
It is on track for C++20. We will need to bring it through EWG, but no resistance to this is anticipated.
Side note: it was decided that the __STD_C... macros related to UTF-16/32 could be handled as editorial / Defect Reports in Core, and did not necessarily have to be a part of this proposal.
@martinho, we should get an updated revision of this paper in the pre-Kona mailing (after November 26th, but before January 21st). Some things to update:
The __STDC_UTF_16__ and __STDC_UTF_32__ C macros, as an indication of why this paper is evolutionary and not just a core issue. The macros indicate that the encoding was originally intended to be implementation defined.

With regard to those macros, we actually had references to them in C++14 (in the <cuchar> synopsis), but lost them along the way to C++17 (we now just defer to C for the contents of the <cuchar> header). I don't think we need to add them back unless we're going to state that implementations must define them. That concern can be handled as a core issue after getting approval, as JeanHeyd already mentioned, but we might be able to shortcut that core issue processing by adding the macro requirements to the paper. I have a slight preference toward doing the latter.
For reference: gcc and clang both define __STDC_UTF_16__ and __STDC_UTF_32__ in both C and C++ compilation modes. MSVC never defines them. I asked Jonathan Caves about it and he stated it was probably just an oversight.
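For anyone who wants to check a particular compiler, here is a minimal sketch of a compile-time probe. The helper names stdc_utf16_macro_value and stdc_utf32_macro_value are hypothetical and exist only for illustration; only the two macro names come from the C standard.

```cpp
// Hypothetical helpers: report the value of each C environment macro,
// or 0 when the implementation does not define it.
constexpr int stdc_utf16_macro_value() {
#if defined(__STDC_UTF_16__)
    return __STDC_UTF_16__;  // C specifies 1 when char16_t values are UTF-16 encoded
#else
    return 0;                // macro not defined by this implementation
#endif
}

constexpr int stdc_utf32_macro_value() {
#if defined(__STDC_UTF_32__)
    return __STDC_UTF_32__;  // C specifies 1 when char32_t values are UTF-32 encoded
#else
    return 0;                // macro not defined by this implementation
#endif
}
```

Both functions are usable in constant expressions, so the results can be printed from a small test program or fed to a static_assert. A result of 0 only says the macro is missing; it says nothing about which encoding the compiler actually uses.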
This issue was resolved by the adoption of P1041R4 in Kona. Closing.
The C and C++ standards do not currently specify that the encodings of char16_t and char32_t literals are respectively UTF-16 and UTF-32. C states that they are only if the corresponding __STDC_UTF_16__ or __STDC_UTF_32__ macro is defined to 1 (6.10.8.2, "Environment macros"). Various parts of the C++ standard (codecvt and char_traits) refer to UTF-16/UTF-32, thereby admitting a bias towards these encodings despite the lack of strict specification.

It may be that, in practice, all C and C++ compilers that are being updated to conform to new standards are only using UTF-16 and UTF-32 for these literals. If so, the standards can be updated to mandate the use of these encodings.
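As a rough way to gather evidence about a given compiler, the sketch below inspects the code units of a char16_t and a char32_t literal at compile time. The probe character U+1F600 and all names (probe16, probe32, and the two functions) are assumptions chosen for illustration. Passing for one character is consistent with UTF-16/UTF-32 but does not prove it.

```cpp
// Hypothetical probe: U+1F600 lies outside the Basic Multilingual Plane,
// so UTF-16 must encode it as the surrogate pair D83D DE00.
constexpr char16_t probe16[] = u"\U0001F600";
constexpr char32_t probe32[] = U"\U0001F600";

constexpr bool char16_literal_looks_utf16() {
    return sizeof(probe16) / sizeof(probe16[0]) == 3  // two code units + terminator
        && probe16[0] == 0xD83D                       // high (lead) surrogate
        && probe16[1] == 0xDE00                       // low (trail) surrogate
        && probe16[2] == 0;
}

constexpr bool char32_literal_looks_utf32() {
    return sizeof(probe32) / sizeof(probe32[0]) == 2  // one code unit + terminator
        && probe32[0] == 0x1F600                      // the code point itself
        && probe32[1] == 0;
}

// Compilation fails here on an implementation that uses other encodings.
static_assert(char16_literal_looks_utf16(), "char16_t literal is not UTF-16");
static_assert(char32_literal_looks_utf32(), "char32_t literal is not UTF-32");
```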