open-source-parsers / jsoncpp

A C++ library for interacting with JSON.

doesAnyCharRequireEscaping() conflicts with UTF-8 #1551

Open janhec opened 2 months ago

janhec commented 2 months ago

toStyledString() produces \u escapes for UTF-8 characters such as ï. This happens because doesAnyCharRequireEscaping(), which is called from valueToQuotedStringN(), which in turn is called from BuiltStyledStreamWriter::writeValue(), treats any byte with c > 0x7F as requiring escaping. Imo the last part of that condition (c > 0x7F) should be removed. After removing it, I get plain UTF-8 output, which is what I need, and imo it is in line with the spec (RFC 8259 only requires escaping the quotation mark, the backslash and the control characters U+0000 to U+001F).
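To make the difference concrete: ï (U+00EF) is encoded in UTF-8 as the two bytes 0xC3 0xAF, both above 0x7F, so the original condition fires for any string that contains it. The minimal sketch below compares the two conditions side by side. The function names are hypothetical; only the lambda bodies mirror the two versions of the condition discussed in this issue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical names; the lambda bodies mirror the two versions of the
// condition in doesAnyCharRequireEscaping().

// Original: any byte above 0x7F (i.e. every byte of a multi-byte UTF-8
// sequence) counts as "requires escaping".
static bool requiresEscapingOriginal(char const* s, std::size_t n) {
  assert(s || !n);
  return std::any_of(s, s + n, [](unsigned char c) {
    return c == '\\' || c == '"' || c < 0x20 || c > 0x7F;
  });
}

// Proposed: only quote, backslash and control characters count.
static bool requiresEscapingProposed(char const* s, std::size_t n) {
  assert(s || !n);
  return std::any_of(s, s + n, [](unsigned char c) {
    return c == '\\' || c == '"' || c < 0x20;
  });
}
```

For "na\xC3\xAFve" ("naïve", 6 bytes) the original check returns true and the proposed one returns false, while a string such as "a\"b" still requires escaping under both.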

BillyDonahue commented 2 months ago

Can you clarify what code you're referring to with some links (or pasted code, or both)? It's difficult to follow the description as-is. Thx.

janhec commented 2 months ago

The code is at json_writer.cpp:180. I commented out the original body of the lambda and replaced it with a shorter one, because UTF-8 was getting escaped. This change is only useful when the JSON text contains UTF-8 that should stay unescaped, so a proper fix probably needs more nuance than this. Anyway, it solved my specific case.

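```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Patched version: bytes above 0x7F no longer force the string to be escaped.
static bool doesAnyCharRequireEscaping(char const* s, size_t n) {
  assert(s || !n);

  return std::any_of(s, s + n, [](unsigned char c) {
    //return c == '\\' || c == '"' || c < 0x20 || c > 0x7F; // c > 0x7F conflicts with UTF-8
    return c == '\\' || c == '"' || c < 0x20;
  });
}
```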
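One way to add the "more nuance" is to make non-ASCII handling a writer option instead of dropping the check unconditionally. If I read current jsoncpp correctly, `Json::StreamWriterBuilder` already has an `emitUTF8` setting (off by default) along these lines, so setting `builder["emitUTF8"] = true;` may be worth trying before patching. Also, assuming doesAnyCharRequireEscaping() only gates a fast path in valueToQuotedStringN(), relaxing it alone would likely still escape non-ASCII whenever a string also contains a quote or control character. The stdlib-only sketch below is hypothetical (`quoteJsonString`, `decodeUtf8` and `appendHex4` are not jsoncpp functions) and shows both behaviors behind one flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: decode one UTF-8 sequence starting at s[i] and advance i.
// Malformed input yields U+FFFD; overlong forms are not rejected (sketch only).
static unsigned decodeUtf8(const std::string& s, std::size_t& i) {
  unsigned char lead = static_cast<unsigned char>(s[i++]);
  if (lead < 0x80) return lead;
  if (lead < 0xC0 || lead >= 0xF8) return 0xFFFD;  // stray continuation or invalid lead byte
  int extra = lead >= 0xF0 ? 3 : lead >= 0xE0 ? 2 : 1;
  unsigned cp = lead & (0x3F >> extra);  // keep the payload bits of the lead byte
  for (int k = 0; k < extra; ++k) {
    if (i >= s.size()) return 0xFFFD;  // truncated sequence
    unsigned char cont = static_cast<unsigned char>(s[i]);
    if ((cont & 0xC0) != 0x80) return 0xFFFD;  // not a continuation byte
    cp = (cp << 6) | (cont & 0x3F);
    ++i;
  }
  return cp;
}

// Append a \uXXXX escape for a value in 0..0xFFFF.
static void appendHex4(std::string& out, unsigned v) {
  char buf[7];
  std::snprintf(buf, sizeof buf, "\\u%04x", v);
  out += buf;
}

// Hypothetical writer: quote/backslash/control characters are always escaped;
// non-ASCII is passed through raw when emitUTF8 is true, else \u-escaped.
std::string quoteJsonString(const std::string& s, bool emitUTF8) {
  std::string out = "\"";
  for (std::size_t i = 0; i < s.size();) {
    unsigned char c = static_cast<unsigned char>(s[i]);
    if (c == '"' || c == '\\') {
      out += '\\';
      out += static_cast<char>(c);
      ++i;
    } else if (c < 0x20) {
      appendHex4(out, c);  // control characters, always as \u00XX here
      ++i;
    } else if (c < 0x80 || emitUTF8) {
      out += static_cast<char>(c);  // ASCII, or raw UTF-8 byte copied unchanged
      ++i;
    } else {
      unsigned cp = decodeUtf8(s, i);
      if (cp >= 0x10000) {  // outside the BMP: emit a UTF-16 surrogate pair
        cp -= 0x10000;
        appendHex4(out, 0xD800 + (cp >> 10));
        appendHex4(out, 0xDC00 + (cp & 0x3FF));
      } else {
        appendHex4(out, cp);
      }
    }
  }
  out += '"';
  return out;
}
```

With the flag off, `quoteJsonString("na\xC3\xAFve", false)` produces `"na\u00efve"` (quotes included); with it on, the UTF-8 bytes pass through unchanged, while quotes and backslashes are escaped either way.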