Equality and inequality of structured values

glyn commented 2 years ago

Fixes https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/issues/236

Reviewers may find this rendered version useful.

glyn commented 2 years ago

> Co-chair-hat-on: This does not line up with my perception of where the WG consensus was, which was that all == comparisons between structured values were false.

You're right about WG consensus. The purpose of this PR and issue https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/issues/236 is to change the WG consensus.

> Co-chair-hat-off: If the rest of the WG is OK with this I can live with it, but my preference would be to stay with the simpler rule.

Although the "always false" rule is shorter, it's not necessarily simpler as it is a special case, and a counter-intuitive one at that. I think it makes JSONPath harder to understand, for both implementers and users.

danielaparker commented 2 years ago

RFC 8259 says the names in an object should be unique (but need not be), and ECMA-404 says they are not required to be unique. Should the definition of object equality be tightened allowing for the possibility of non-unique names? That is, should {"a": 1, "a": 1, "b": 2} and {"a": 1, "b": 2} be considered a possibility and regarded as non-equal? Or does the draft already preclude non-unique names? (@glyn's proposed test already works if the values associated with non-unique names are different, perhaps it's irrelevant if there are additional duplicate members in one object that are exactly the same. Or there could be a requirement that the number of members in both objects must be the same.)

glyn commented 2 years ago

> RFC 8259 says the names in an object should be unique (but need not be), and ECMA-404 says they are not required to be unique. Should the definition of object equality be tightened allowing for the possibility of non-unique names? That is, should {"a": 1, "a": 1, "b": 2} and {"a": 1, "b": 2} be considered a possibility and regarded as non-equal? Or does the draft already preclude non-unique names? (@glyn's proposed test already works if the values associated with non-unique names are different, perhaps it's irrelevant if there are additional duplicate members in one object that are exactly the same. Or there could be a requirement that the number of members in both objects must be the same.)

This PR already copes with objects with non-unique names and it treats {"a": 1, "a": 1, "b": 2} and {"a": 1, "b": 2} as equal, which I think is ok since an object represents a function from name to value. Non-unique names with distinct values are more problematic since such an object would not represent a function.

We could either stick with the current "tight" proposal or weaken it to say that '==' and '!=' are undefined (in other words can be true or false, depending on the implementation) if two objects are being compared, one or both of which have non-unique names.
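The member-wise rule described above can be sketched as follows. This is a minimal illustration, not the text of the PR: `json_equal` is a hypothetical helper, and it assumes the JSON text has already been decoded with Python's `json` module, which keeps only the last value for a repeated name, so duplicates never reach the comparison.

```python
import json


def json_equal(a, b):
    """Deep equality for decoded JSON values (hypothetical helper)."""
    # bool is a subclass of int in Python, so keep true/false apart from numbers.
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a == b
    if isinstance(a, str) and isinstance(b, str):
        return a == b
    if isinstance(a, list) and isinstance(b, list):
        # Arrays: same length and pairwise-equal elements, in order.
        return len(a) == len(b) and all(json_equal(x, y) for x, y in zip(a, b))
    if isinstance(a, dict) and isinstance(b, dict):
        # Objects: same set of names and equal values for each name.
        return a.keys() == b.keys() and all(json_equal(a[k], b[k]) for k in a)
    # Only null == null remains; any other type mix is unequal.
    return a is None and b is None


left = json.loads('{"a": 1, "a": 1, "b": 2}')
right = json.loads('{"a": 1, "b": 2}')
print(json_equal(left, right))
```

Under that decoder assumption, the two objects from the example compare equal because the decoded value is a plain name-to-value map. A decoder that preserved duplicate members would need a rule of its own.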

glyn commented 2 years ago

> > RFC 8259 says the names in an object should be unique (but need not be), and ECMA-404 says they are not required to be unique. Should the definition of object equality be tightened allowing for the possibility of non-unique names? That is, should {"a": 1, "a": 1, "b": 2} and {"a": 1, "b": 2} be considered a possibility and regarded as non-equal? Or does the draft already preclude non-unique names? (@glyn's proposed test already works if the values associated with non-unique names are different, perhaps it's irrelevant if there are additional duplicate members in one object that are exactly the same. Or there could be a requirement that the number of members in both objects must be the same.)
>
> This PR already copes with objects with non-unique names and it treats {"a": 1, "a": 1, "b": 2} and {"a": 1, "b": 2} as equal, which I think is ok since an object represents a function from name to value. Non-unique names with distinct values are more problematic since such an object would not represent a function.
>
> We could either stick with the current "tight" proposal or weaken it to say that '==' and '!=' are undefined (in other words can be true or false, depending on the implementation) if two objects are being compared, one or both of which have non-unique names.

I'm going to merge and defer this consideration.

timbray commented 2 years ago

For the record, upon further consideration, I am increasingly uncomfortable with this because implementations are required to use recursion bounded only by the size of the document, and I'm dubious about the actual value in production.

Let's get this on the agenda for our interim next week.

glyn commented 2 years ago

> For the record, upon further consideration, I am increasingly uncomfortable with this because implementations are required to use recursion bounded only by the size of the document, and I'm dubious about the actual value in production.
>
> Let's get this on the agenda for our interim next week.

I'm happy to discuss this next week, but please note that:

* implementations are not required to use recursion (e.g. they could use a loop and allocate memory in a data structure),
* the descendant operators might also use recursion bounded only by the size of the document (or use a loop and allocate memory in a data structure).

The amount of memory in a data structure could depend on the nesting depth of the structured types in question, but it would be a small fraction of the memory consumed by the parsed JSON document, so I don't think it counts as a separate attack vector.
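To illustrate the "loop plus data structure" option, here is a hedged sketch of the same comparison without recursion. The name `json_equal_iterative` is hypothetical, and the input is assumed to be JSON already decoded into Python lists, dicts, strings, numbers, booleans and `None`.

```python
def json_equal_iterative(a, b):
    """Deep equality using an explicit stack instead of recursion (sketch)."""
    stack = [(a, b)]  # pairs of values still waiting to be compared
    while stack:
        x, y = stack.pop()
        if isinstance(x, list) and isinstance(y, list):
            if len(x) != len(y):
                return False
            stack.extend(zip(x, y))
        elif isinstance(x, dict) and isinstance(y, dict):
            if x.keys() != y.keys():
                return False
            stack.extend((x[k], y[k]) for k in x)
        elif isinstance(x, bool) or isinstance(y, bool):
            # Keep true/false distinct from the numbers 1 and 0.
            if not (isinstance(x, bool) and isinstance(y, bool) and x == y):
                return False
        elif isinstance(x, (list, dict)) or isinstance(y, (list, dict)):
            return False  # structured value compared with a different type
        elif x != y:
            return False  # strings, numbers and null
    return True
```

The stack holds only references to values that already exist in the parsed document, so its extra memory stays below the size of the document being compared.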

goessner commented 2 years ago

Now that I am back after a longer absence, I am quite impressed by what you (all) have done so far ... thanks.

I cannot find the issue where I already proposed support of (in)equality of structured values (it might have been in the context of the `in` operator, before we skipped it).

So from my point of view introducing this feature brings with it a higher level of consistency. In the same way I welcome the decision to support lexical ordering of strings for comparison.

For testing equality of arrays and objects, element-wise and member-wise comparison is the way to go. There is no well-established ordering of two different arrays based on comparing their elements, at least not in mathematics, where we use the Euclidean norm of vectors and matrices for that.

By substituting

* `a <= b` as `a == b || a < b`
* `a >= b` as `a == b || a > b`

it seems to be legitimate to consider them true if a == b, even when/if a and b are not ordered.

For comparing objects with non-unique names we might pragmatically reuse the JavaScript way, where the latter member overwrites the former (seen from JSON text).

glyn commented 2 years ago

This PR is merged, so this thread might go unnoticed, but let me respond...

> Now that I am back after a longer absence, I am quite impressed by what you (all) have done so far ... thanks.
>
> I cannot find the issue where I already proposed support of (in)equality of structured values (it might have been in the context of the `in` operator, before we skipped it).
>
> So from my point of view introducing this feature brings with it a higher level of consistency. In the same way I welcome the decision to support lexical ordering of strings for comparison.
>
> For testing equality of arrays and objects, element-wise and member-wise comparison is the way to go. There is no well-established ordering of two different arrays based on comparing their elements, at least not in mathematics, where we use the Euclidean norm of vectors and matrices for that.

I think a straightforward ordering of arrays would be lexicographic with a similar spec to that for string ordering.

As for ordering objects, we could say that A <= B if all the names of A are names of B and for each name, the value in A is less than or equal to the value in B. Not sure about duplicate names - I'd probably make that case undefined to give maximum freedom to implementations and keep the interop rules consistent with I-JSON.
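The two orderings floated here can be sketched as below. This only illustrates the suggestion made in this comment; nothing here claims it was adopted. `array_lt` and `object_le` are hypothetical names, `array_lt` assumes every pair of compared elements is two numbers or two strings, and `value_le` is a caller-supplied comparison for member values.

```python
def array_lt(a, b):
    """Lexicographic '<' on arrays, analogous to string ordering (sketch)."""
    for x, y in zip(a, b):
        if x != y:
            return x < y  # the first differing position decides
    return len(a) < len(b)  # a proper prefix sorts first


def object_le(a, b, value_le):
    """A <= B if every name of A is a name of B and A's value is <= B's."""
    return all(k in b and value_le(a[k], b[k]) for k in a)


print(array_lt([1, 2], [1, 3]))
print(object_le({"a": 1}, {"a": 2, "b": 0}, lambda x, y: x <= y))
```

The object rule is only a partial order: two objects with disjoint names are not ordered either way.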

> By substituting
>
> * `a <= b` as `a == b || a < b`
> * `a >= b` as `a == b || a > b`
>
> it seems to be legitimate to consider them true if a == b, even when/if a and b are not ordered.

I tend to agree.
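A small sketch of that substitution, under stated assumptions: `lt` and `le` are hypothetical helpers, `lt` is simply false for values with no defined order, and Python's `==` stands in for the equality rule under discussion (note that Python treats `1 == True` as true, which a real implementation would have to exclude).

```python
def lt(a, b):
    """'<' that is false unless both values are numbers or both are strings."""
    both_numbers = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in (a, b)
    )
    both_strings = isinstance(a, str) and isinstance(b, str)
    return (both_numbers or both_strings) and a < b


def le(a, b):
    """a <= b expressed as a == b || a < b."""
    return a == b or lt(a, b)


print(le({"a": 1}, {"a": 1}))  # equal objects: true although unordered
print(le({"a": 1}, {"a": 2}))  # unequal and unordered: false
```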

> For comparing objects with non-unique names we might pragmatically reuse the JavaScript way, where the latter member overwrites the former (seen from JSON text).

I'd prefer to avoid semantics based on the order of the JSON text. I'd prefer to go the I-JSON route and make these comparisons unspecified/undefined (i.e. it's ok for implementations to yield true or false).

goessner commented 2 years ago

I also tend to agree with most of your points. I need to read I-JSON again more carefully.

cabo commented 2 years ago

On 9. Aug 2022, at 22:48, Daniel Parker @.***> wrote:

> RFC 8259 says the names in an object should be unique (but need not be), and ECMA-404 says they are not required to be unique. Should the definition of object equality be tightened allowing for the possibility of non-unique names?

No.

ECMA-404 and, to a lesser extent, RFC 8259 describe the format on the wire (JSON texts), which is where they couldn’t (or didn’t want to) summon the energy to exclude members with identical keys. JSONPath operates on JSON values. The text in Section 4 of RFC 8259 is a reflection of the unsavory political issues that prevented this from being clearly defined; it does mention a “name-value mapping”, which indicates that a map structure was intended, and it also reflects the origins of JSON objects as JavaScript objects. JSON texts that violate the interoperability constraints generally turn into true maps during decoding if they do not cause an error, just not in a way that is predictable for the originator. There is no way JSONPath could react to the deviations in the JSON text, as it only sees the JSON value after decoding.

It is very sad that this aspect needs to be rediscussed every time anything uses JSON.

Grüße, Carsten

glyn commented 2 years ago

I agree it's sad. Would it be valid for a JSON implementation to implement an object such that a non-unique name is mapped randomly to one of its values? Unless we can rule out such an implementation, I think we have to accommodate it in JSONPath. I wonder if it's best to have a general statement that the behaviour of JSONPath for objects with non-unique names is undefined. We'd then have to avoid contradicting that general statement by overspecifying the behaviour of object comparisons etc.

goessner commented 2 years ago

interesting ... and yes, it's sad.

A JavaScript implementation of JSONPath won't see non-unique member names after parsing JSON text. I don't know how implementations in other languages behave. But nevertheless we need to treat the JSONPath spec as language/implementation agnostic and cannot reliably predict/assume anything ... see RFC 8259 section 4.

goessner commented 2 years ago

Maybe it's best to use a similar wording as in RFC 8259, such as:

"When the names within an object are not unique, the behavior of an implementation comparing these name/value pairs is unpredictable."

timbray commented 2 years ago

Well, the language in 8259 has not attracted any negative comments and people seem to understand what it means, so I suggest re-using it.

glyn commented 2 years ago

I guess I'm concerned that a reasonable implementation may be completely predictable. For example, comparing such objects could always yield false.

"the behavior of an implementation comparing these name/value pairs is unpredictable" seems to require all implementations to be unpredictable!

cabo commented 2 years ago

> Well, the language in 8259 has not attracted any negative comments and people seem to understand what it means, so I suggest re-using it.

Well, the language in 8259 is about JSON texts. We are talking about JSON values.

> JSON.parse("{\"a\": 1, \"a\": 2}")
< {a: 2}
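The same text-versus-value distinction can be seen in Python's standard `json` module, shown here as a hedged illustration. `reject_duplicates` and `decode_strict` are hypothetical names; `object_pairs_hook` is the module's real hook that receives each object's members as a list of pairs before they are collapsed into a map.

```python
import json

TEXT = '{"a": 1, "a": 2}'

# By default the decoder keeps only the last value for a repeated name,
# so a consumer of the decoded value never sees the duplicate.
print(json.loads(TEXT))


def reject_duplicates(pairs):
    """Raise if an object in the JSON text repeats a member name."""
    names = [name for name, _ in pairs]
    if len(names) != len(set(names)):
        raise ValueError("duplicate member name in JSON object")
    return dict(pairs)


def decode_strict(text):
    """Decode JSON text, refusing objects with non-unique names."""
    return json.loads(text, object_pairs_hook=reject_duplicates)
```

Only the decoder can make this choice. Code that runs on the decoded value, as JSONPath does, has nothing left to detect.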

cabo commented 2 years ago

> I guess I'm concerned that a reasonable implementation may be completely predictable. For example, comparing such objects could always yield false.
>
> "the behavior of an implementation comparing these name/value pairs is unpredictable" seems to require all implementations to be unpredictable!

I think the intention is to say that the originator cannot predict what the consumer will do, hence "unpredictable". (The C language's "undefined behavior" is often glossed as "might delete all your files", which is a bit outside what we mean here.)

goessner commented 2 years ago

hmm ... implementations are (should be) predictable, but different implementations treating ill-formed JSON cannot yield predictably identical results. Might referencing I-JSON here be of any help?

glyn commented 2 years ago

Ok, so the spec writer cannot predict what implementations will do. I think that wording is sloppy. I'd much rather say something like "the behaviour is implementation dependent".

ietf-wg-jsonpath/draft-ietf-jsonpath-base, pull request #237