w3c / qtspecs

XSLT and XQuery Specifications - the source used to build the specs, and the errata

JSON output mode's escaping rules require escaping of "/" #32

Open reschke opened 2 years ago

reschke commented 2 years ago

From https://www.w3.org/TR/xslt-xquery-serialization-31/#json-output:

Any other character in the input string (but not a character produced by character mapping) is a candidate for Unicode Normalization if requested by the normalization-form parameter, and JSON escaping. JSON escaping replaces the characters quotation mark, backspace, form-feed, newline, carriage return, tab, reverse solidus, or solidus by the corresponding JSON escape sequences \", \b, \f, \n, \r, \t, \\, or \/ respectively, and any other codepoint in the range 1-31 or 127-159 by an escape in the form \uHHHH where HHHH is the hexadecimal representation of the codepoint value. Escaping is also applied to any characters that cannot be represented in the selected encoding.
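To make the quoted rule concrete, here is a minimal Python sketch of the escaping it describes for a single string value. It assumes no character map, no Unicode normalization, and an output encoding that can represent every character. The function name `json_escape` and the choice of upper-case hex digits are ours, not the spec's.

```python
# Named escapes listed in the JSON output method's escaping rule.
NAMED_ESCAPES = {
    '"': '\\"',
    '\b': '\\b',
    '\f': '\\f',
    '\n': '\\n',
    '\r': '\\r',
    '\t': '\\t',
    '\\': '\\\\',
    '/': '\\/',   # the solidus escape this issue is about
}


def json_escape(text: str) -> str:
    """Escape one string value following the quoted rule.

    Assumptions: no character map, no normalization, and an encoding
    (e.g. UTF-8) that can represent every character.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in NAMED_ESCAPES:
            out.append(NAMED_ESCAPES[ch])
        elif 1 <= cp <= 31 or 127 <= cp <= 159:
            # Remaining C0 controls, DEL and C1 controls become \uHHHH
            # (upper-case hex digits are an arbitrary choice here).
            out.append(f"\\u{cp:04X}")
        else:
            out.append(ch)
    return "".join(out)


print(json_escape('a/b "c"\x7f'))  # prints: a\/b \"c\"\u007F
```

Note that the solidus is the only character in the list that JSON itself (RFC 8259) does not require to be escaped.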

What's the point in escaping the solidus character?

michaelhkay commented 2 years ago

It apparently avoids problems when embedding JSON in an HTML script element, as described here: https://stackoverflow.com/questions/1580647/json-why-are-forward-slashes-escaped
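The concern in that answer can be illustrated with Python's standard `json` module, which does not escape "/" by default. This sketch (the helper names are hypothetical) builds a script payload with and without escaping the solidus, then applies a conservative check for the sequence `</script`, which an HTML parser treats as the start of the script element's end tag even when it appears inside a JavaScript string literal.

```python
import json


def to_script_payload(value, escape_solidus: bool) -> str:
    text = json.dumps(value)
    if escape_solidus:
        # JSON has no "/" outside string contents, and json.dumps never
        # emits "/" inside an escape sequence, so every "/" here is a
        # literal solidus; "\/" decodes back to "/".
        text = text.replace("/", "\\/")
    return text


def closes_script_early(payload: str) -> bool:
    # Conservative check: would "</script" appear in the page source?
    return "</script" in payload.lower()


data = {"note": "</script><script>alert(1)</script>"}
plain = to_script_payload(data, escape_solidus=False)
safe = to_script_payload(data, escape_solidus=True)

print(closes_script_early(plain))  # True: the script element would end early
print(closes_script_early(safe))   # False: "<\/script" is not an end tag
```

Both payloads decode to the same value; escaping "/" only changes how the text looks to the HTML parser, not what a JSON parser reads.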

I don't know if the rationale is sound, but the WG considered the question and made this decision.

reschke commented 2 years ago

Ideas:

michaelhkay commented 2 years ago

In serialization, there's a rather clunky way of preventing the escaping of "/" - define a character map that maps "/" to "/".
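In a stylesheet, that workaround would be an `xsl:output-character` with `character="/"` and `string="/"` inside an `xsl:character-map` referenced from `use-character-maps`. The Python sketch below models only the effect, under the rule quoted in the issue that characters produced by character mapping are not JSON-escaped. The function names are hypothetical.

```python
from typing import Mapping, Optional

# Escapes from the JSON output method's rule (solidus included).
NAMED_ESCAPES = {'"': '\\"', '\b': '\\b', '\f': '\\f', '\n': '\\n',
                 '\r': '\\r', '\t': '\\t', '\\': '\\\\', '/': '\\/'}


def escape_char(ch: str) -> str:
    cp = ord(ch)
    if ch in NAMED_ESCAPES:
        return NAMED_ESCAPES[ch]
    if 1 <= cp <= 31 or 127 <= cp <= 159:
        return f"\\u{cp:04X}"
    return ch


def serialize_string(text: str,
                     char_map: Optional[Mapping[str, str]] = None) -> str:
    """Model of a character map interacting with JSON escaping.

    A character replaced via the map is written as-is; only unmapped
    characters go through JSON escaping.
    """
    char_map = char_map or {}
    parts = []
    for ch in text:
        if ch in char_map:
            parts.append(char_map[ch])  # mapped output bypasses escaping
        else:
            parts.append(escape_char(ch))
    return '"' + "".join(parts) + '"'


print(serialize_string("a/b"))              # "a\/b"
print(serialize_string("a/b", {"/": "/"}))  # "a/b"
```

The identity mapping looks odd, but it works precisely because mapped output is exempt from escaping.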

The question of escaping of "/" also arises with the xml-to-json function. Here there's an oddity in the spec (which is probably an oversight). If escaped[-key]="true", then an unescaped solidus is replaced by \/. But in the absence of escaped[-key]="true", an unescaped solidus is left unchanged. I can't see any reason for the inconsistency, and I assume it wasn't intended.
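A rough Python model of the two `fn:xml-to-json` paths described above, based on that description rather than a full implementation of the function. The `escaped` flag stands in for `escaped="true"` / `escaped-key="true"`; the function name is hypothetical. The two tables differ in exactly the way the comment points out: only the escaped path rewrites "/".

```python
import re

# escaped="true": existing escapes are kept; bare specials are escaped,
# and an unescaped solidus becomes \/.
ESCAPED_MODE = {'"': '\\"', '\b': '\\b', '\f': '\\f', '\n': '\\n',
                '\r': '\\r', '\t': '\\t', '/': '\\/'}
# No escaped attribute: backslash is escaped, but "/" is left alone.
UNESCAPED_MODE = {'\\': '\\\\', '"': '\\"', '\b': '\\b', '\f': '\\f',
                  '\n': '\\n', '\r': '\\r', '\t': '\\t'}
VALID_ESCAPE = re.compile(r'\\(?:["\\/bfnrt]|u[0-9A-Fa-f]{4})')


def escape_value(s: str, escaped: bool) -> str:
    table = ESCAPED_MODE if escaped else UNESCAPED_MODE
    out = []
    i = 0
    while i < len(s):
        ch = s[i]
        if escaped and ch == "\\":
            m = VALID_ESCAPE.match(s, i)
            if m is None:
                # The spec raises a dynamic error for invalid escapes.
                raise ValueError(f"invalid JSON escape at offset {i}")
            out.append(m.group())  # copy a valid escape unchanged
            i = m.end()
            continue
        cp = ord(ch)
        if ch in table:
            out.append(table[ch])
        elif 1 <= cp <= 31 or 127 <= cp <= 159:
            out.append(f"\\u{cp:04X}")
        else:
            out.append(ch)
        i += 1
    return "".join(out)


for flag in (False, True):
    print(flag, escape_value("a/b", escaped=flag))
# False a/b
# True a\/b
```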

So what should we do about it? An option to control this one little detail seems heavy-handed. If we're going to add an option, should we make it more powerful, for example a user-defined callback function that handles all escaping, or a regular expression that matches characters to be escaped?
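One way to picture the "regular expression" alternative is the hypothetical sketch below; nothing like it exists in the spec. The characters JSON needs escaped (plus the spec's 127-159 range) are always escaped, so the output stays valid; the caller's regex only adds optional escapes. The default `"/"` reproduces today's behaviour, and `None` turns it off.

```python
import re
from typing import Optional

SHORT_ESCAPES = {'"': '\\"', '\\': '\\\\', '\b': '\\b', '\f': '\\f',
                 '\n': '\\n', '\r': '\\r', '\t': '\\t', '/': '\\/'}


def _u_escape(ch: str) -> str:
    cp = ord(ch)
    if cp > 0xFFFF:
        # Characters outside the BMP need a UTF-16 surrogate pair in JSON.
        cp -= 0x10000
        return f"\\u{0xD800 + (cp >> 10):04X}\\u{0xDC00 + (cp & 0x3FF):04X}"
    return f"\\u{cp:04X}"


def _always_escaped(ch: str) -> bool:
    # The spec's unconditional escapes, minus the solidus.
    cp = ord(ch)
    return ch in '"\\' or cp <= 31 or 127 <= cp <= 159


def json_escape(text: str, extra: Optional[str] = "/") -> str:
    """Hypothetical option: `extra` is a regex; any single character it
    fully matches is escaped in addition to the mandatory set."""
    out = []
    for ch in text:
        if _always_escaped(ch) or (extra is not None
                                   and re.fullmatch(extra, ch)):
            out.append(SHORT_ESCAPES.get(ch) or _u_escape(ch))
        else:
            out.append(ch)
    return "".join(out)


print(json_escape("</b>", extra="[/<>]"))  # \u003C\/b\u003E
```

A callback variant would replace the `re.fullmatch` test with a user-supplied predicate (or a function returning the replacement text), at the cost of an extra call per character.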