mpx / lua-cjson

Lua CJSON is a fast JSON encoding/parsing module for Lua
https://kyne.au/~mark/software/lua-cjson.php
MIT License
924 stars 474 forks source link

Avoid escaping forward slash #57

Open schnell18 opened 6 years ago

schnell18 commented 6 years ago

Forward slash may be escaped according to the JSON specification. In practice, escaped forward slash is not interpreted properly by most applications that consume JSON. For instance, it's very common to embed http(s) link in the JSON file. Using cjson will breaks these applications. So please consider accept this pull request to make cjson a good fit for these applications.

Thanks in advance!

MorseWayne commented 4 years ago

this is really helpful to solve the foward slash escape problem, i used your code and compiled my cjson.so, now everything is ok!

ktalebian commented 4 years ago

@mpx can this pull request be merged? This is regarding this issue https://github.com/mpx/lua-cjson/issues/66.

mpx commented 3 years ago

Can you please provide some example decoders that fail with an escaped slash? Or maybe significant APIs? Just curious - I'd like to understand the impact better.

Conversely, is anyone aware of a need to escape / in some cases? It appears escaping slash is not particularly common in other JSON encoders. Perhaps Lua CJSON is an outlier supporting escaped forward slash?

If so, it might be better to stop escaping forward slash. If there is a clearly better choice we should do that.

tai-x commented 2 years ago

@mpx Here some examples:

    local t = {
      some_secure_token = 'P9JUJs/li1tZAAzjlveIBLROlLMkQrzxMzvZmBP/tDE='
    }
    print(require('inspect').inspect(cjson.encode(t)))

This prints:

'{"token":"P9JUJs\\/li1tZAAzjlveIBLROlLMkQrzxMzvZmBP\\/tDE="}'

In #66 @ktalebian mentioned something similar with an url like:

cjson.encode({url =  "https://google.com"})

Resulting in:

{"url":"https:\/\/google.com"}
mpx commented 2 years ago

@tai-x Yes, forward slashes are currently escaped - this is valid JSON.

I'm particularly interested to understand:

  1. What breaks? Eg, Which decoders do not handle escaped forward slashes? Which APIs break when foward slashes are escaped?
  2. Is anyone aware of situations where something might break if forward slashes were never escaped?

In theory, either escaping or not escaping should be fine in all cases according to the spec. I originally escaped forward slash since it seemed to be the potentially safer choice (but I have no evidence this is true). Apparently escaping is uncommon?

In practice, this bug reports that escaping can break some applications. I'd like to confirm what specifically breaks (details are missing from this bug).

I'd prefer to remove forward slash escapes if escaping causes significant breakage somewhere and leaving it unescaped is generally fine. CJSON is already too configurable in places.

tai-x commented 2 years ago

Since its valid JSON it probably doesn't break anything. Nonetheless I think when you make a library call like this one (cjson.encode), you would expect as less modification as possible (escape only when the language or so needs it). Escaping the forward slash is not necessary and it is modifying the field. If it's not necessary, what do you gain with mangling the value.

So it's not a encode/decode issue (that's probably fine). It's an encode/then use the result issue. For my particular case, for instance, I'm running a unit test where I have a table that will be compared against a JSON api response. Encoding that table to compare both results on json can't be performed "as is" since forward slashes in the values get escaped, so the assertion fails. I'm manually generating the json now with regular string concatenation, from that table.

If #76 was reported, someone put on a PR (#57), and there are stack overflow questions about this, it means there are many persons seeing this issue, because very few people interact on this tools.

kljensen commented 2 years ago

I agree with @tai-x , since escaping forward slashes is permitted but not required, it is preferable to not do it. I believe also that the most common behavior is to not escape forward slashes. (Maybe the escaping of forward slashes is most useful when json is embedded in HTML script tags?) Of course, @mpx has a valid point---maybe users of lua-cjson are already depending upon this behavior and, if so, changing could add a headache.

tai-x commented 2 years ago

@kljensen Backwards compatibility is a very important concern, we should assume that this would be a breaking change for someone. To deal with that, I guess MAJOR in semantic versioning should express that this is a breaking change.

humanoid2050 commented 2 years ago

I concur with this suggestion. Even if the current behavior of escaping forward slash works correctly in most (or even all) instances, it's a bit odd to suddenly see backslashes that were not present in your original data before encoding.