llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
26.78k stars 10.97k forks source link

[libc++] No library support for `char8_t` if compiled without C++20 but with `-fchar8_t` #97601

Closed MitalAshok closed 1 day ago

MitalAshok commented 2 days ago

Currently, _LIBCPP_HAS_NO_CHAR8_T is always defined if not compiled with -std=c++20 or above:

https://github.com/llvm/llvm-project/blob/3ab2247d10673419609333a33bca0eca8a56bf3d/libcxx/include/__config#L702-L704

But both Clang and GCC allow char8_t in older standards with the -fchar8_t flag: https://godbolt.org/z/ePcrqvWsa

Someone specifying -fchar8_t presumably is going to use char8_t consistently even before C++20, so it makes sense to allow using library facilities for char8_t. One problem I can think of is that some names (like u8string) aren't reserved in C++17, but someone using -fchar8_t should know not to use them. Plus libstdc++ supports it.

ldionne commented 2 days ago

Libc++ generally strives to stick to the letter of the Standard as much as possible, since doing otherwise creates loads of problems. For example, proper interoperation with char8_t might require pulling in other features into previous standard modes, and that's not a path we want to go down.

So my first reaction to this would be that we don't want to support this, basically.

MitalAshok commented 2 days ago

Yeah that makes sense.

However, I don't think it would be too harmful to scale back to just add std::char_traits<char8_t>. It would be a standards-conforming extension as char8_t is an extension type, and users can use std::basic_string<char8_t>/std::basic_ostream<char8_t>/std::basic_string_view<char8_t> and be ABI compatible with C++20. std::char_traits<char8_t> is the only pain point.

(This might be a big implementation burden, and you don't want to end up in a situation like libstdc++ that supports std::basic_string<unsigned char> and std::basic_string<std::byte>)

There doesn't appear to be a big userbase for -std=c++17 -fchar8_t, so not really a big issue either way.

philnik777 commented 2 days ago

IIUC this is a flag to ease upgrading to C++20, so I don't really see why we'd want to support -fchar8_t in previous language modes. To upgrade you can use -std=c++20 -fno-char8_t, but why would you want e.g. -std=c++17 -fchar8_t? @AaronBallman do you have any thoughts on this?

philnik777 commented 2 days ago

(BTW this is also a case where https://discourse.llvm.org/t/rfc-require-discussion-of-impact-to-monorepo-stakeholders-when-adding-new-clang-extensions/79613 would have been a good idea - thanks for adding that requirement Aaron!)

MitalAshok commented 2 days ago

It appears something similar was being done here: https://reviews.llvm.org/D92212, which was abandoned.

Now I doubt -std=c++17 -fchar_t is intentional. It was added in 3a8244df6fb88a6670470e603445c72f224db9e3 and only says -fchar8_t is basically the way to get -std=c++20. #55373 talks about -std=c18 -fchar8_t and -std=c23 -fchar8_t, which seem likewise unintentional and might be disabled in the future. I guess it wouldn't make sense to support an unintentional extension.

ldionne commented 1 day ago

Thanks for the discussion folks! Tentatively closing since the consensus appears to be that this isn't worth doing, but please reopen if I misread this or if more discussion is needed.