sheredom / json.h

🗄️ single header json parser for C and C++
The Unlicense
698 stars, 77 forks

Test Case: json_write_pretty() with json string \u012b writes garbage #65

Closed: guidotex closed this issue 3 years ago

guidotex commented 4 years ago

I've been using your library for a while - very nice. Thank you for the effort to build it! I recently ran into an odd JSON string with a Unicode value that looks like valid JSON to me (based on the description of strings at www.json.org). 'json.c' parses it without error, but prints garbage because of the escape character (see the program below for a test case).

I ran this string through a few parsers:

- my vim JSON parser: PASS, and pretty-prints correctly
- the parser at jsonlint.org: PASS, and pretty-prints correctly
- json.c: PASS... but it won't print correctly

Compiling on Ubuntu 12.04, g++ version 4.6.4

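#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "json.h"

int main(void)
{
   const char payload[] = "{\"key1\":\"value1\", \"key2\":\"\\u012b\"}";

   printf("\nPrinting Payload:\n  %s\n\n", payload);
   printf("\nProcessing Payload in JSON parser...\n");
   struct json_value_s *value = json_parse(payload, strlen(payload));
   if (!value) {
      printf("Failed to parse payload\n");
      return 1;
   }

   /* 0 (NULL) for out_size: the written length is not needed here */
   void *tmp = json_write_pretty(value, "  ", "\n", 0);
   if (tmp) {
      printf("\nPretty Payload:\n%s\n\n", (char *) tmp);
      free(tmp);   /* the writer's output buffer is malloc'd */
   }
   free(value);    /* json_parse returns a single malloc'd allocation */
   return 0;
}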


I think the solution lies in how the escape character is handled in 'json_parse_string()'...

I'm stumped!

sheredom commented 4 years ago

So I finally got time to look at this, and it's sort of by design. Basically, when I parse the \u012b escape, json.h turns it into the actual Unicode character it represents (ī). This is a difficult problem, unfortunately: because people wanted me to turn \n into the actual newline character, it made sense to do the same with \u. But it would be perfectly valid to have ī\u012b in a string, and then on output I'd have to turn it into either \u012b\u012b or īī!

guidotex commented 4 years ago

That makes complete sense. :)

In the example program, the "key2" value has an escape in front of the \u, so in the C source it is coded specifically as "\\u012b".

Would there be any undesirable results from looking for the double escape before '\u' to recognize this as a literal string rather than a character?

Your thoughts?

sheredom commented 3 years ago

I don't think there is anything I can do for this one (sorry for the delay, COVID got in the way).

Thanks for the interest in my library!