stephenberry / glaze

Extremely fast, in memory, JSON and interface library for modern C++
MIT License

[Question] Support for non-UTF-8 values? #1162

Closed DUOLabs333 closed 4 months ago

DUOLabs333 commented 4 months ago

If I have a string made up of raw chars, with no effort made to escape them, will glaze be able to serialize/parse them?

stephenberry commented 4 months ago

If you're writing your strings in C/C++ or most any text editor then you probably have valid UTF-8, unless you are adding invisible control characters.

JSON requires valid UTF-8 and does not allow unescaped null or control characters inside strings, so Glaze will reject strings that contain them. These characters can be written as Unicode escapes (\u), but this is typically dangerous and error-prone in C++ and other languages, because it can result in hidden null characters in types like std::string and will break a lot of C string algorithms like strnlen.
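To illustrate the hidden-null problem mentioned above, here is a minimal sketch using only the standard library. The helper names make_embedded_null and c_length are made up for this example, and std::strlen stands in for strnlen, which is a POSIX function rather than standard C++.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: build a std::string with a null character in the middle.
// The length is passed explicitly so the constructor does not stop at '\0'.
inline std::string make_embedded_null() {
    return std::string("abc\0def", 7);
}

// Hypothetical helper: the length as C string functions see it.
// std::strlen stops at the first '\0', so everything after it is ignored.
inline std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

With these helpers, make_embedded_null().size() is 7 while c_length(make_embedded_null()) is 3, so any code path that treats the data as a C string silently loses "def".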

We are planning to add a compile-time option to automatically Unicode-escape invalid UTF-8; the open issue is #812. However, this is not recommended for general use.

What is your use case for non-UTF-8 strings? Are you expecting invisible control characters in your strings?

In summary, to preserve performance, Glaze does not Unicode-escape invalid UTF-8 when writing. It does, however, ensure that any invalid string it writes will trigger a read error in any conforming JSON parser. So if any JSON library is able to parse what you are writing, you know you're good to go.

DUOLabs333 commented 4 months ago

I'm writing a Vulkan driver in C++, and some commands/structs allow using a void pointer to hold arbitrary data. Since I'm sending the data over a network, I need to be able to serialize it.

However, now that I think about it, I should probably use std::vector rather than std::string for those fields, right?

stephenberry commented 4 months ago

Absolutely, arbitrary data like this is best in a std::vector<uint8_t> or std::vector<std::byte>.

I'll note that the same applies if you use BEVE, the binary format supported by Glaze.
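As a rough sketch of the byte-buffer approach, the snippet below copies data from behind a void pointer into a std::vector<std::uint8_t> and back. The helpers to_bytes and from_bytes are hypothetical names invented for this example, not part of Glaze or Vulkan. It assumes the pointed-to type is trivially copyable and that both ends of the connection share the same layout and endianness. Handing the resulting vector to a serializer is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper: copy `size` bytes from an opaque pointer into an
// owning byte buffer. Zero bytes are ordinary data here, not terminators.
inline std::vector<std::uint8_t> to_bytes(const void* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::vector<std::uint8_t> out(size);
    if (size != 0) {
        std::memcpy(out.data(), data, size);
    }
    return out;
}

// Hypothetical helper: copy a byte buffer back into a trivially copyable
// object. Returns false if the buffer size does not match sizeof(T).
template <class T>
bool from_bytes(const std::vector<std::uint8_t>& bytes, T& out) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "raw byte copies are only valid for trivially copyable types");
    if (bytes.size() != sizeof(T)) {
        return false;
    }
    std::memcpy(&out, bytes.data(), sizeof(T));
    return true;
}
```

Unlike std::string, the vector carries no expectation that its contents are text, so a value such as 0x00FF0000, which contains zero bytes, round-trips through to_bytes and from_bytes unchanged.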