nlohmann / json

JSON for Modern C++
https://json.nlohmann.me
MIT License

Parser does not read non ascii characters : ŞÜİĞ #4007

Closed: UtkuBulkan closed this issue 1 year ago

UtkuBulkan commented 1 year ago

Description

The parser does not read the following characters:

{
    "4bcfadb7-c345-474c-aaaf-65734b14f1e6": {
        "fps": 24,
        "text": {
            "4e50c16f-a944-4adc-9d78-74cca4ba1cb9": {
                "alpha": 1,
                "groups": "4e50c16f-a944-4adc-9d78-74cca4ba1cb9",
                "context": "A ŞĞÜİ B",

When I try to read the context field, the output is simply "A B", as if the other characters had never existed in the first place.

I am using the following statement to read the JSON field:

text = new std::string(txt["context"].get<std::string>());

What should I do in order to read the non-ASCII characters?

Reproduction steps

  1. Create a JSON field containing non-ASCII characters in your JSON structure.
  2. Try to read the JSON field.
  3. You can see that the non-ASCII characters are not read.

Expected vs. actual results

Expected: The non-ASCII characters should be read.

Actual results:

The non-ASCII characters seem to be omitted under all circumstances.

Minimal code example

text = new std::string(txt["context"].get<std::string>());

Error messages

There are no error messages.

The JSON parser just skips the non-ASCII characters as if they were empty spaces.

Compiler and operating system

Linux (Debian), GCC 4.8 - 12.0

Library version

JSON for Modern C++ version 3.11.1

nlohmann commented 1 year ago

See https://json.nlohmann.me/home/faq/#parse-errors-reading-non-ascii-characters

UtkuBulkan commented 1 year ago

Ok.

How about reading from a JSON value?

text = new std::wstring(j["context"].get<std::wstring>());

Would this ever work?

It says expected string but received an array of numbers.

What should I do here?

UtkuBulkan commented 1 year ago

What exactly does this mean?

std::u16string and std::u32string can be parsed, assuming UTF-16 and UTF-32 encoding, respectively. These encodings are not supported when reading from files or other input containers.

Does it mean that we first need to read the contents of a JSON file into a json object, and then read from that source with u16string?

UtkuBulkan commented 1 year ago

@nlohmann,

Any suggestions here?

This is the code that I have been using:

std::u16string* text;

try {
    text = new std::u16string(txt["context"].get<std::u16string>());
} catch (json::type_error& te) {
    VI_FATAL("{}, {}, {}", te.what(), __FILE__, __LINE__);
    exit(0);
}

And this is the error that I got:

[14:51:47] APP: [json.exception.type_error.302] type must be array, but is string, /app/src/coms/json_parser.cpp, 160

And here is the JSON:

{
    "4bcfadb7-c345-474c-aaaf-65734b14f1e6": {
        "fps": 24,
        "text": {
            "4e50c16f-a944-4adc-9d78-74cca4ba1cb9": {
                "alpha": 1,
                "groups": "4e50c16f-a944-4adc-9d78-74cca4ba1cb9",
                "context": "A ŞĞÜİ B",

nlohmann commented 1 year ago

The library only supports UTF-8. Other encodings will likely be treated as errors by the parser. Can you provide a minimal working example?
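Since the parser expects UTF-8, one way to narrow the problem down is to check whether the bytes of the input file are well-formed UTF-8 before handing them to the library. The following is a minimal sketch using only the standard library; `is_valid_utf8` is a hypothetical helper name, not something nlohmann/json provides.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of nlohmann/json): returns true if `bytes`
// is well-formed UTF-8. Rejects truncated sequences, stray continuation
// bytes, overlong forms, surrogates, and code points above U+10FFFF.
bool is_valid_utf8(const std::string& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;   // continuation bytes expected after the lead
        char32_t cp = 0;         // code point being decoded
        char32_t min_cp = 0;     // smallest code point valid for this length
        if (lead < 0x80)                { extra = 0; cp = lead;        min_cp = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min_cp = 0x10000; }
        else return false;       // stray continuation byte or invalid lead

        if (n - i - 1 < extra) return false;          // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;     // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp) return false;                    // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                  // outside Unicode
        i += extra + 1;
    }
    return true;
}
```

For example, "A ŞĞÜİ B" saved as UTF-8 passes this check, whereas the same text saved in a single-byte encoding such as ISO-8859-9 would not. ISO-8859-9 is only an assumed example here; the thread does not say which encoding the file actually used.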

UtkuBulkan commented 1 year ago

This is the JSON:

{
    "4bcfadb7-c345-474c-aaaf-65734b14f1e6": {
        "fps": 24,
        "text": {
            "4e50c16f-a944-4adc-9d78-74cca4ba1cb9": {
                "alpha": 1,
                "groups": "4e50c16f-a944-4adc-9d78-74cca4ba1cb9",
                "context": "A ŞĞÜİ B"
            }
        }
    }
}

This is how I read the JSON file:

std::ifstream project_json_input(manifest_file);
Json = json::parse(project_json_input);

And this is how I read the field:

try { 
    text = new std::wstring(txt["context"].get<std::wstring>()); 
}  catch(json::type_error& te) { 
    VI_FATAL("{}, {}, {}", te.what(), __FILE__, __LINE__); exit(0); 
}

I have accepted that the library only supports UTF-8.

However, I have now found a solution:

The solution is to convert the wstring to a UTF-8 number encoding, and then convert it back to a wstring in the backend.

nlohmann commented 1 year ago

Can you please add the JSON file as is - the encoding is important here.

Strings are stored as std::string internally. Assuming txt["context"] is valid and returns a basic_json&, then get<std::wstring>() is equivalent to a static_cast<std::wstring>(...) - that behavior (and whatever exceptions come out of this) is out of scope of this library.
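Because strings are stored as std::string (UTF-8 bytes) internally, one option is to read the field with get<std::string>() and do the conversion to a wide type outside the library. Below is a minimal sketch of such a conversion using only the standard library. The helper name `utf8_to_utf32` is hypothetical and not part of nlohmann/json, and the decoder is deliberately simple: it rejects malformed or truncated sequences but does not check for overlong encodings or surrogates.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of nlohmann/json): decodes UTF-8 bytes into
// UTF-32 code points. Throws std::invalid_argument on invalid lead bytes,
// bad continuation bytes, or truncated input.
std::u32string utf8_to_utf32(const std::string& utf8) {
    std::u32string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::size_t extra = 0;   // continuation bytes expected after the lead
        char32_t cp = 0;         // code point being decoded
        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (utf8.size() - i - 1 < extra)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);   // append the 6 payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

With the library, the call would presumably look like utf8_to_utf32(txt["context"].get<std::string>()). On platforms where wchar_t is 32 bits wide (as with GCC on Linux), the resulting code points could then be copied into a std::wstring; on platforms with a 16-bit wchar_t a UTF-16 conversion would be needed instead.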