Open lemire opened 1 year ago
cc @NicolasJiaxin
I'm guessing we also want to provide a std::ranges-based API with lazy evaluation. For instance, assuming a compiler that encodes string literals as UTF-8, we want the following to work:
static_assert(std::ranges::equal("$£Иह€한𐍈" | utf8::views::decode, std::array{
0x00000024, 0x000000a3, 0x00000418, 0x00000939, 0x000020ac, 0x0000d55c, 0x00010348, 0x00000000}));
The previous static_assert also assumes that the whole implementation is constexpr, which would be nice too, I guess.
Starting with C++11, we have a full range of specialized string classes... E.g., std::u8string, std::u16string... std::u8string_view, and so forth. Strictly speaking, std::u16string and std::u32string were introduced with C++11 and the std::string_view family with C++17 (std::string itself predates both), but std::u8string and std::u8string_view only became available with C++20, together with char8_t.
We could use std::string, assuming that it is UTF-8, but a std::string carries no encoding information and might hold other encodings. If we are explicit that we are assuming UTF-8, then it is OK.
What we could do is to provide conversion functions. That might be helpful to some...?
The objective would be to improve quality of life for users who prefer not to mess with pointers.
References:
https://en.cppreference.com/w/cpp/string/basic_string_view
https://en.cppreference.com/w/cpp/string/basic_string