sheredom / json.h

🗄️ single header json parser for C and C++
The Unlicense
712 stars 79 forks source link

JSON standard wants to escape / #3

Closed Smilex closed 8 years ago

Smilex commented 9 years ago

The JSON standard allows for escaping the / character. The software "Tiled" does this when you output JSON data, so using json.h to read those causes errors, because strings (like paths) look like "..\/..\/some\/file". Ignoring the '\' in this case works fine

sheredom commented 9 years ago

So I'm having difficulty reproducing your failure specifically - any chance you could fling me an example output that failed?

static int test_tiled_example() {
  const char payload[] = "{ \"path\" : \"..\\/..\\/some\\/file\"}";

  struct json_value_s* value = json_parse(payload, strlen(payload));

  struct json_object_s* object = value->payload;

  printf("%s : %s\n", object->start->name->string,
    ((struct json_string_s* )(object->start->value->payload))->string);

  return 0;
}

The above is the example I tried to find your error. JSON being a string format does allow a user to escape '\' a number of characters (the solidus '/' being one of them), but given that it is a string format, I would expect two characters to be used in the input file for the escaping.

I even tried;

  const char payload[] = "{ \"path\" : \"..\/..\/some\/file\"}";

With the same correct result.

sheredom commented 8 years ago

Without an example I'm really struggling to figure out what the problem was here. I did bring in a fix by @ocornut that might have been related to this issue though?

I'm going to close this issue, please re-open or re-file an issue if the problem persists!

ocornut commented 8 years ago

I think what @Smilex mean is that he expect json to remove the \ on parsing and translate character, which it currently doesn't (which I find odd myself), in json_parse_string:

    if ('\\' == state->src[state->offset]) {
      // copy reverse solidus character
      state->data[size++] = state->src[state->offset++];
    }
    state->data[size++] = state->src[state->offset++];

so currently you'd get "\n" (two chars, '\' and 'n') instead of "\n" (resolved to one char '\n')

sheredom commented 8 years ago

I guess my worry here is what should I do if I had;

"foo\n "

EG. when parsed currently into a json_string_s it would be a string that read "foo\n\n"

If I was to parse the \n -> \n, and also allow the \n to be kept, it would mean when I write the string back out using one of the write functions it would be reduced to;

"foo

"

which wouldn't be the same as the input.

ocornut commented 8 years ago

You can't guarantee the exact same json file after a read and write, but the data held will be the same. In your example the

"foo\n "

When read and written back will be turned into either

"foo\n\n"

or

"foo

"

Not sure to see the problem

sheredom commented 8 years ago

I mean even if we take the example that the bug orginally concerns, escaping the '/' char.

Would it be acceptable that we turn "\/" from the input string -> "/" as stored in the json_string_s -> "/" as stored in the output string?

If this is ok then I'm happy to actually do the work here :)

ocornut commented 8 years ago

I think it is OK. I'm not sure why tiled escaped the forward slashes.

ocornut commented 8 years ago

I suppose it is anologuous to storing extra spaces in the source json file, you can't preserve them on write (or you could but it would be a different json library).

sheredom commented 8 years ago

Ok - sounds good! I'll leave it on the todo then :)

sheredom commented 8 years ago

I've added an option in commit https://github.com/sheredom/json.h/commit/03c9e8d2497479db05eb78fe6f3619b9880223d7 - called json_parse_flags_allow_string_simplification that will allow you to turn sequences like \/ into /.

ocornut commented 8 years ago

Thanks! (Haven't tried yet, as per my other message probably won't until some time)

Not sure if json_parse_flags_allow_string_simplification is a right name for that (the "allow" part could be an "apply"), neither if it makes much sense to ever have it disabled? AFAIK handling the \ special character is typically the job of a parser. I would suggest inverting it to hmm say:json_parse_flags_disable_escaping

Again haven't used it yet so this is all theoretical :)

sheredom commented 8 years ago

Yeah I'm not 100% convinced either way yet of what the correct behaviour is - so I thought I'd add the ability to enable what you want and then do the actual thinking later ;)