w3c / qt3tests

Tests for XPath and XQuery
27 stars 17 forks source link

parse-json escaping rules (fn/parse-json.xml fn-parse-json-028) #65

Open faassen opened 2 weeks ago

faassen commented 2 weeks ago

The specification says:

https://www.w3.org/TR/xpath-functions-31/#func-parse-json

The effect of the one-argument form of this function is the same as calling the two-argument form with an empty map as the value of the $options argument.

And the escape parameter says Default: true

I read this as saying that if parse-json is called without a map as second parameter JSON escaping is to take place.

But parse-json-028 to parse-json-031 all seem to assume that escape is false:

      <test>parse-json('["\\"]')</test>
      <result>
         <all-of><assert>$result?1 = "\"</assert>
         <assert>$result?1 instance of xs:string</assert></all-of>
      </result>

\\ in JSON is \. If escape is set to true, then JSON escape sequences should appear in the result. There are tests for this when it's set explicitly to true, such as fn-parse-json-035.

The tests assume either a different default for escape than the specification mandates, or the default is different for the single parameter form of parse-json.

P.S. The specification assumes rather particular things about JSON handling. The Rust json crate is a reasonable implementation of the spec, but it makes liberal meaningless, makes duplicates impossible to implement correctly (use-last is always in use, which the JSON RFC says is a common compatible interpretation), and JSON escaping rules are handled so that by the time I get a string, I need to re-escape it again in order to follow this specification. It also mystifies me why escaped JSON characters in the item result should ever be desirable.

michaelhkay commented 2 weeks ago

You're right. I see that my implementation, which passes these tests, is using escape="false" as the default. I'm not sure what to do about it: I'll raise it with the QT4 community group, but we can't do anything about XPath 3.1.

It also mystifies me why escaped JSON characters in the item result should ever be desirable.

Because JSON allows characters that can't validly appear in XML. (However, in 4.0 we've changed the rules for the xs:string data type so that, in effect, invalid XML characters are now allowed. This makes importing data from non-XML sources much tidier, and no doubt parse-json() would have been defined differently if that had been done earlier.)

The Rust json crate is a reasonable implementation of the spec

I think the WG was aware when it wrote the spec for parse-json() that it imposed requirements which some off-the-shelf JSON libraries could not meet. That kind of thing is always a matter of heated debate, for example it has come up with regular expression processing and more recently with HTML parsing. The tendency in the working groups has been to set a high bar for conformance, recognising that this may cause implementors some headaches on occasions, and that some of them might choose to release non-conformant implementations. In my implementation, we decided to write our own JSON parser, which isn't really that difficult.

michaelhkay commented 2 weeks ago

I raised the issue at https://github.com/qt4cg/qtspecs/issues/1555 -- please feel welcome to join in the discussion.

michaelhkay commented 2 weeks ago

For what it's worth, an earlier draft of XPath 3.1 at

https://www.w3.org/TR/2014/WD-xpath-functions-31-20141007/#func-parse-json

used the options "unescape=true|false" with the default being true, and unescape=true corresponded roughly to the behavior of escape=false in the final specification. I suspect that there was an oversight when the option was renamed, with a failure to change the default value.

faassen commented 2 weeks ago

The situation with the tests is even stranger. fn-parse-json-061 and fn-parse-json-064 which both get an explicit map (but without explicit escape) do rely on escape to be true by default, while the tests I reported before that don't get an explicit map rely on escape being false by default.

michaelhkay commented 1 week ago

I think the examples given in the spec, especially:

The expression parse-json('{"x":"\\", "y":"\u0025"}') returns map{"x":"\","y":"%"}.

The expression parse-json('{"x":"\\", "y":"\u0025"}', map{'escape':true()}) returns map{"x":"\\","y":"%"}.

make it pretty clear that the intended default was escape=false, and that giving the default as true is a simple error. The tests are correct, the spec is wrong.

I don't think tests -061 and -064 are a problem. On the contrary, the specification says "It is an error to supply the fallback option if the escape option is present with the value true." so these examples reinforce the belief that the intended default is false.