glyn closed this 2 years ago
> Co-chair-hat-on: This does not line up with my perception of where the WG consensus was, which was that all == comparisons between structured values were false.
You're right about WG consensus. The purpose of this PR and issue https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/issues/236 is to change the WG consensus.
> Co-chair-hat-off: If the rest of the WG is OK with this I can live with it, but my preference would be to stay with the simpler rule.
Although the "always false" rule is shorter, it's not necessarily simpler as it is a special case, and a counter-intuitive one at that. I think it makes JSONPath harder to understand, by both implementers and users.
RFC 8259 says the names in an object should be unique (but need not be), and ECMA-404 says they are not required to be unique. Should the definition of object equality be tightened, allowing for the possibility of non-unique names? That is, should `{"a": 1, "a": 1, "b": 2}` and `{"a": 1, "b": 2}` be considered a possibility and regarded as non-equal? Or does the draft already preclude non-unique names? (@glyn's proposed test already works if the values associated with non-unique names are different; perhaps it's irrelevant if there are additional duplicate members in one object that are exactly the same. Or there could be a requirement that the number of members in both objects must be the same.)
This PR already copes with objects with non-unique names and it treats `{"a": 1, "a": 1, "b": 2}` and `{"a": 1, "b": 2}` as equal, which I think is ok since an object represents a function from name to value. Non-unique names with distinct values are more problematic since such an object would not represent a function.
We could either stick with the current "tight" proposal or weaken it to say that '==' and '!=' are undefined (in other words can be true or false, depending on the implementation) if two objects are being compared, one or both of which have non-unique names.
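The "function from name to value" reading above can be sketched in Python. This is an illustration under stated assumptions, not the text of the PR: the names `Obj`, `parse`, `as_function` and `equal` are hypothetical, and raising `ValueError` for a name with two distinct values is just one way to flag the case the thread calls problematic.

```python
import json


class Obj:
    """A JSON object kept as its raw list of (name, value) pairs (hypothetical helper)."""

    def __init__(self, pairs):
        self.pairs = pairs


def parse(text):
    # object_pairs_hook hands every decoded object to Obj as a list of pairs,
    # so duplicate names survive parsing instead of being collapsed.
    return json.loads(text, object_pairs_hook=Obj)


def primitives_equal(a, b):
    # Keep booleans apart from numbers: in Python, True == 1 would otherwise hold.
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    return a == b


def as_function(obj):
    """Return a dict name -> value, or None if some name maps to two unequal values."""
    mapping = {}
    for name, value in obj.pairs:
        if name in mapping and not equal(mapping[name], value):
            return None
        mapping[name] = value
    return mapping


def equal(a, b):
    if isinstance(a, Obj) and isinstance(b, Obj):
        fa, fb = as_function(a), as_function(b)
        if fa is None or fb is None:
            # Assumption of this sketch: flag the case instead of picking an answer.
            raise ValueError("object does not represent a function")
        return fa.keys() == fb.keys() and all(equal(fa[k], fb[k]) for k in fa)
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(equal(x, y) for x, y in zip(a, b))
    if isinstance(a, (Obj, list)) or isinstance(b, (Obj, list)):
        return False
    return primitives_equal(a, b)
```

With this sketch, `equal(parse('{"a": 1, "a": 1, "b": 2}'), parse('{"a": 1, "b": 2}'))` is true, while comparing against `{"a": 1, "a": 2}` raises.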
I'm going to merge and defer this consideration.
For the record, upon further consideration, I am increasingly uncomfortable with this because implementations are required to use recursion bounded only by the size of the document, and I'm dubious about the actual value in production.
Let's get this on the agenda for our interim next week.
I'm happy to discuss this next week, but please note that:
The amount of memory in a data structure could depend on the nesting depth of the structured types in question, but it would be a small fraction of the memory consumed by the parsed JSON document, so I don't think it counts as a separate attack vector.
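To make the recursion concern concrete, here is a minimal sketch (not taken from the PR) of the structured comparison written with an explicit stack instead of recursion. It assumes values as produced by Python's `json.loads`, so objects are `dict`s with unique names and arrays are `list`s; the names `deep_equal`, `_member_pairs` and `_scalars_equal` are made up for illustration. The stack holds one iterator per nesting level, so its size follows the nesting depth of the values being compared.

```python
def _member_pairs(x, y):
    # Yield the values stored under each shared name, lazily.
    for name in x:
        yield x[name], y[name]


def _scalars_equal(x, y):
    # Keep booleans apart from numbers: in Python, True == 1 would otherwise hold.
    if isinstance(x, bool) or isinstance(y, bool):
        return isinstance(x, bool) and isinstance(y, bool) and x == y
    return x == y


def deep_equal(a, b):
    stack = [iter([(a, b)])]          # one iterator of pending pairs per nesting level
    while stack:
        pair = next(stack[-1], None)
        if pair is None:              # this level is exhausted, go back up
            stack.pop()
            continue
        x, y = pair
        if isinstance(x, dict) and isinstance(y, dict):
            if x.keys() != y.keys():
                return False
            stack.append(_member_pairs(x, y))
        elif isinstance(x, list) and isinstance(y, list):
            if len(x) != len(y):
                return False
            stack.append(zip(x, y))
        elif isinstance(x, (dict, list)) or isinstance(y, (dict, list)):
            return False              # structured value compared with a different kind
        elif not _scalars_equal(x, y):
            return False
    return True
```

Because no Python call frames are consumed per level, the sketch also handles values nested deeper than the interpreter's default recursion limit.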
Now that I am back from a longer absence, I am quite impressed by what you (all) have done so far ... thanks.
I cannot find the issue where I already proposed support of (in)equality of structured values (it might have been in the context of the `in` operator, before we skipped it).
So from my point of view introducing this feature brings with it a higher level of consistency. In the same way I welcome the decision to support lexical ordering of strings for comparison.
For testing equality of arrays and objects element- and memberwise comparison is the way to go. There is no reputable ordering of two different arrays based on comparing their elements, at least not in mathematics, where we use the Euclidean Norm of vectors and matrices for that.
By substituting
* `a <= b` as `a == b || a < b`
* `a >= b` as `a == b || a > b`
it seems to be legitimate to consider them true if `a == b`, even when/if `a` and `b` are not ordered.
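The substitution can be sketched as follows. This is only an illustration: `less_than` is a hypothetical helper that assumes an ordering exists solely between two numbers or between two strings, and Python's built-in `==` stands in for structured-value equality (it compares lists and dicts element-wise, but note that it treats `True` and `1` as equal, which a real implementation would have to handle).

```python
def _is_number(v):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def less_than(a, b):
    # Assumption: only number/number and string/string pairs are ordered.
    if _is_number(a) and _is_number(b):
        return a < b
    if isinstance(a, str) and isinstance(b, str):
        return a < b
    return False


def less_or_equal(a, b):
    # a <= b  is read as  a == b || a < b
    return a == b or less_than(a, b)


def greater_or_equal(a, b):
    # a >= b  is read as  a == b || a > b
    return a == b or less_than(b, a)
```

Under this reading, `less_or_equal([1, 2], [1, 2])` is true even though the two arrays are not ordered, while `less_or_equal([1, 2], [1, 3])` is false.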
For comparing objects with non-unique names we might pragmatically reuse the JavaScript way, where the latter member overwrites the former (seen from JSON text).
This PR is merged, so this thread might go unnoticed, but let me respond...
> Now that I am back from a longer absence, I am quite impressed by what you (all) have done so far ... thanks.
> I cannot find the issue where I already proposed support of (in)equality of structured values (it might have been in the context of the `in` operator, before we skipped it).
> So from my point of view introducing this feature brings with it a higher level of consistency. In the same way I welcome the decision to support lexical ordering of strings for comparison.
> For testing equality of arrays and objects element- and memberwise comparison is the way to go. There is no reputable ordering of two different arrays based on comparing their elements, at least not in mathematics, where we use the Euclidean Norm of vectors and matrices for that.
I think a straightforward ordering of arrays would be lexicographic with a similar spec to that for string ordering.
As for ordering objects, we could say that A <= B if all the names of A are names of B and for each name, the value in A is less than or equal to the value in B. Not sure about duplicate names - I'd probably make that case undefined to give maximum freedom to implementations and keep the interop rules consistent with I-JSON.
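The two orderings suggested above could look roughly like this. It is a sketch of the idea only, not proposed spec text: the function names are hypothetical, array elements and member values are assumed to be scalars of the same kind (Python raises `TypeError` when asked to order, say, a number and a string), and objects are assumed to have unique names.

```python
def array_less_or_equal(a, b):
    # Lexicographic: the first differing position decides.
    for x, y in zip(a, b):
        if x != y:
            return x < y
    # One array is a prefix of the other: the shorter (or equal) one comes first.
    return len(a) <= len(b)


def object_less_or_equal(a, b):
    # A <= B if every name of A is a name of B and A's value is <= B's value.
    return all(name in b and a[name] <= b[name] for name in a)
```

For example, `array_less_or_equal([1, 2], [1, 3])` and `object_less_or_equal({"a": 1}, {"a": 2, "b": 0})` both hold, whereas `object_less_or_equal({"a": 1, "c": 0}, {"a": 2})` does not because `c` is missing from the second object.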
> By substituting
> * `a <= b` as `a == b || a < b`
> * `a >= b` as `a == b || a > b`
>
> it seems to be legitimate to consider them true if `a == b`, even when/if `a` and `b` are not ordered.
I tend to agree.
> For comparing objects with non-unique names we might pragmatically reuse the JavaScript way, where the latter member overwrites the former (seen from JSON text).
I'd prefer to avoid semantics based on the order of the JSON text. I'd prefer to go the I-JSON route and make these comparisons unspecified/undefined (i.e. it's ok for implementations to yield true or false).
I also tend to agree with most of your points. I need to read I-JSON again more carefully.
On 9. Aug 2022, at 22:48, Daniel Parker @.***> wrote:
> RFC 8259 says the names in an object should be unique (but need not be), and ECMA-404 says they are not required to be unique. Should the definition of object equality be tightened allowing for the possibility of non-unique names?
No.
ECMA-404, and, to a lesser extent, RFC 8259 describe the format on the wire (JSON texts), which is where they couldn’t (or didn’t want to) summon the energy to exclude members with identical keys. JSONPath operates on JSON values. The text in Section 4 of RFC 8259 is a reflection of the unsavory political issues that prevented this from being clearly defined; it does mention a “name-value mapping”, which is an indication that a map structure was intended; it also reflects the origins of JSON objects as JavaScript objects. JSON texts that violate the interoperability constraints generally turn into true maps during decoding if not causing an error, just not in a way that is predictable for the originator. There is no way JSONPath could react to the deviations in the JSON text, as it only sees the JSON value after decoding.
It is very sad that this aspect needs to be rediscussed every time anything uses JSON.
Grüße, Carsten
I agree it's sad. Would it be valid for a JSON implementation to implement an object such that a non-unique name is mapped randomly to one of its values? Unless we can rule out such an implementation, I think we have to accommodate it in JSONPath. I wonder if it's best to have a general statement that the behaviour of JSONPath for objects with non-unique names is undefined. We'd then have to avoid contradicting that general statement by overspecifying the behaviour of object comparisons etc.
interesting ... and yes, it's sad.
A JavaScript implementation of JSONPath won't see non-unique member names after parsing JSON text. I don't know how implementations in other languages behave. But nevertheless we need to treat the JSONPath spec as language/implementation agnostic and cannot reliably predict/assume anything ... see RFC 8259 section 4.
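As one data point for another language: Python's standard `json` module also hides duplicate names by default, keeping the last value, but it can be asked to expose the raw name/value pairs through the `object_pairs_hook` parameter. The variable names below are only for illustration.

```python
import json

text = '{"a": 1, "a": 2}'

# Default decoding builds a dict, so the later member overwrites the earlier one.
value = json.loads(text)

# With object_pairs_hook, the decoder passes each object as a list of
# (name, value) pairs, so duplicates remain visible to the caller.
pairs = json.loads(text, object_pairs_hook=list)

print(value)   # {'a': 2}
print(pairs)   # [('a', 1), ('a', 2)]
```

So whether a JSONPath implementation ever sees non-unique names depends on the decoder and on how it is configured.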
Maybe it's best to use a similar wording as in RFC 8259, such as:
"When the names within an object are not unique, the behavior of an implementation comparing these name/value pairs is unpredictable."
Well, the language in 8259 has not attracted any negative comments and people seem to understand what it means, so I suggest re-using it.
I guess I'm concerned that a reasonable implementation may be completely predictable. For example, comparing such objects could always yield false.
"the behavior of an implementation comparing these name/value pairs is unpredictable" seems to require all implementations to be unpredictable!
> Well, the language in 8259 has not attracted any negative comments and people seem to understand what it means, so I suggest re-using it.
Well, the language in 8259 is about JSON texts. We are talking about JSON values.
> JSON.parse("{\"a\": 1, \"a\": 2}")
< {a: 2}
> I guess I'm concerned that a reasonable implementation may be completely predictable. For example, comparing such objects could always yield false.
> "the behavior of an implementation comparing these name/value pairs is unpredictable" seems to require all implementations to be unpredictable!
I think the intention is to say that the originator cannot predict what the consumer will do, hence "unpredictable". (The C language defines "unpredictable behavior" as "might delete all your files", which is a bit outside what we mean here.)
hmm ... implementations are (should be) predictable, but different implementations treating ill-formed JSON cannot yield predictably identical results. Might referencing I-JSON here be of any help?
Ok, so the spec writer cannot predict what implementations will do. I think that wording is sloppy. I'd much rather say something like "the behaviour is implementation dependent".
Fixes https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/issues/236
Reviewers may find this rendered version useful.