ietf-wg-jsonpath / draft-ietf-jsonpath-base

Development of a JSONPath internet draft
https://ietf-wg-jsonpath.github.io/draft-ietf-jsonpath-base/

Terminology: Unicode Scalar Values #473

Closed sayrer closed 1 year ago

sayrer commented 1 year ago

The last sentence of this definition says: "Both JSON string values and JSONPath queries are sequences of Unicode scalar values."

Despite what we may wish, this is not true. JSON string values are only guaranteed to be sequences of code points. ECMA-404 says: "JSON syntax describes a sequence of Unicode code points." RFC 8259 also admits this can be the case, but it comes with a warning about interoperability. Section 1.3 of the draft (quoted below) handles it well enough, so I think the last sentence in the Unicode Scalar Value definition can be removed.

"Also, selecting a child by name (Section 2.3.1) and comparing strings (Section 2.3.5.2.2 in Section 2.3.5) assume these strings are sequences of Unicode scalar values, becoming unpredictable if they are not (Section 8.2 of [RFC8259])."
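A minimal sketch of the distinction, assuming Python's standard `json` module, whose decoder accepts an unpaired surrogate escape. The helper name `is_scalar_sequence` is hypothetical and only for illustration:

```python
import json

def is_scalar_sequence(s: str) -> bool:
    """Return True if every code point in s is a Unicode scalar value,
    i.e. none falls in the surrogate range U+D800..U+DFFF."""
    return all(not (0xD800 <= ord(ch) <= 0xDFFF) for ch in s)

# A properly paired surrogate escape decodes to a single scalar value (U+1F600).
paired = json.loads('"\\ud83d\\ude00"')

# An unpaired surrogate escape is valid JSON syntax, and this parser accepts it,
# but the resulting string contains a code point that is not a scalar value.
lone = json.loads('"\\ud800"')

print(is_scalar_sequence(paired))
print(is_scalar_sequence(lone))

# Such a string cannot be encoded as well-formed UTF-8.
try:
    lone.encode("utf-8")
except UnicodeEncodeError:
    print("not encodable as UTF-8")
```

Other JSON parsers may reject or replace the unpaired surrogate instead, which is the interoperability concern described in Section 8.2 of RFC 8259.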

cabo commented 1 year ago

Hi Rob,

I'm wondering why you are saying this. Do you want to turn back the clock? JSONPath is defined for sequences of Unicode scalar values. We might as well say this outright, as opposed to letting readers infer that.

sayrer commented 1 year ago

Oh, sorry, I meant "JSON string values". JSONPath can of course be defined as Unicode scalar values. So maybe "JSONPath queries are sequences of Unicode scalar values." could be the last sentence in this definition. The reference to (Section 8.2 of [RFC8259]) would not be needed if this were true of JSON string values. The reasoning in that RFC treats unpaired surrogates as a bug, but they are usually allowed on purpose to support software that can't be changed (JavaScript strings, Windows path names, etc.). The Motivation section of WTF-8 is a pretty accurate summary imho.

I do support requiring well-formed Unicode for new specs, so making this requirement for JSONPath queries is fine by me.

sayrer commented 1 year ago

Maybe the easiest way to explain this issue is by citing Section 1.2 of RFC8259. ECMA-404 is a normative reference there, so this specification must live with whatever it allows, even if it is distasteful.

timbray commented 1 year ago

> Maybe the easiest way to explain this issue is by citing Section 1.2 of RFC8259. ECMA-404 is a normative reference there, so this specification must live with whatever it allows, even if it is distasteful.

There is carefully crafted language around that citation which states that there are no differences in JSON as specified by ECMA-404 and 8259, noting that 404 allows things that 8259 cautions against. I don't see how such a citation would help.

sayrer commented 1 year ago

Right, I'm not suggesting such a reference (I meant to use "citing" just for my comment, not the draft). The JSONPath draft can't say that "JSON string values" are "sequences of Unicode scalar values" because that isn't true. I'm only suggesting that part should go.

This draft normatively references 8259, which normatively references ECMA-404. So, it can't redefine JSON string values and end up with something coherent.

I might also say that 404 has things that 8259 cautions against but nevertheless allows.