Ordered comparisons (<, >) of arrays

glyn commented 2 years ago

Ordered comparisons of arrays are closely modelled on ordered comparisons of strings, but if either array has a descendent of an unordered type, the comparison yields false.

Added a note about slimming down the comparison examples table.

Fixes https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/issues/244

Reviewers may find this rendered version useful.

glyn commented 2 years ago

Reviewers please note: I intend to make an improvement if/when this PR lands: see https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/issues/249. Since the improvement may make the text longer, I'd like to work with the current text while we decide whether we want to support ordered comparisons of strings at all. If this PR is merged, I'd then like to fix https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/issues/249 in a similar way for strings and arrays.

glyn commented 2 years ago

The reason for the restriction that array comparisons yield false if either array has a descendant which is an object, a boolean, or null is that without this restriction, transitivity is broken. For example, without this restriction [0, true] < 1 would yield true as would 1 < [2, 1]. If < was to remain transitive, then [0, true] < [2, 1] would yield false, but it would actually yield false because the comparison fails on the second element.

(I am thinking of adding a note to the text about this to clarify the rationale.)

goessner commented 2 years ago

Ok ... let's assume, we want to adopt lexicographic order [a,b] < [c,d] if a<c or a=c and b<d for arrays, analogous to strings, then few questions arise:

'ab' < 'abc' yields true, but ['a','b'] < ['a','b','c'] will yield false ...
how to deal with nested arrays and possible edge cases ?
why not use product order [a,b] < [c,d] if a<c and b<d instead ?

The benefit for users is questionable at best (I would like to get some use cases documented) ... while the effort for implementers (and spec writers :-) is significant.

So I propose ...

to allow general relations ==,!=,<,>,<=,>= for numbers and strings.
comparison of all other types support ==, != and yield false otherwise (mixed types!).

... as long as no strong demand from users and/or implementers is heard.

glyn commented 2 years ago

Ok ... let's assume, we want to adopt lexicographic order [a,b] < [c,d] if a<c or a=c and b<d for arrays, analogous to strings,

Note that the above is not the same spec that this PR provides. This PR is more like this:

for all values a, c and arrays b, d where a, b, c, and d and their descendants
consist only of numbers, strings, and arrays:
    [] < [a] ^ b
    [a] ^ b < [c] ^ d if a < c or (a == c and b < d)

where ^ denotes array concatenation.

(Perhaps this strengthen's @gregsdennis's point that the recursive definition is hard to read.)

then few questions arise:

* `'ab' < 'abc'` yields `true`, but `['a','b'] < ['a','b','c']` will yield `false` ...

Not so. ['a','b'] < ['a','b','c'] yields true in this PR.

* how to deal with nested arrays and possible edge cases ?

Nested arrays are straightforward provided they and their descendants consist only of numbers, strings, and arrays.

I think the main edge cases are where one or other value being compared is not, or has a decendent which is not, a number, string, or array. In all such cases < yields false.

* why not use [product order](https://en.wikipedia.org/wiki/Product_order)  `[a,b] < [c,d]  if  a<c and b<d` instead ?

Because that seems of (even) less practical use than lexicographic order.

The benefit for users is questionable at best (I would like to get some use cases documented) ... while the effort for implementers (and spec writers :-) is significant.

I agree about the lack of benefit for users. The main use case we've had so far is multi-string date ordering, although it has been pointed out that RFC 3339 strings provide a better approach for dates. I'm going to resist the temptation to try to dream up more use cases.

So I propose ...
1. to allow general relations `==,!=,<,>,<=,>=` for numbers and strings.

2. comparison of all other types support `==, !=` and yield `false` otherwise (mixed types!).
... as long as no strong demand from users and/or implementers is heard.

IIRC you are proposing the status quo. To implement your proposal we would simply need to close this PR unmerged. I'm happy to do that if there is a rough consensus in favour.

glyn commented 2 years ago

@cabo, @gregsdennis, @timbray, and others: would any of you care to defend ordered comparisons (<, >) of arrays? If not, we can safely close it unmerged.

timbray commented 2 years ago

Glyn, you have made a valiant effort here, and thanks for that, but yes, let's please go back to just != and == for structured values. In the future if someone asks why JSONpath does not include ordering, we now have a good answer as to why.

cabo commented 2 years ago

@cabo, @gregsdennis, @timbray, and others: would any of you care to defend ordered comparisons (<, >) of arrays? If not, we can safely close it unmerged.

I'm sorry, I haven't found anything that is problematic about ordering arrays yet. Obviously, it is easy to come up with unworkable ways of doing that. The trick is to start from a very small set of axioms.

As in (add axioms relating < and ==, same as in strings)

[] == []
[] < [x, ...y]
[x, ...y] <=> [x, ...z] ≝ [...y] <=> [...z]
[x, ...y] <=> [z, ...w] ≝ x <=> z if x ≠ z

That should do it.

cabo commented 2 years ago

What is the ^ that occurs in some of the comments above?

timbray commented 2 years ago

As in (add axioms relating < and ==, same as in strings)

[] == []

[] < [x]

[x, ...y] <=> [x, ...z] ≝ [...y] <=> [...z]

[x, ...y] <=> [z, ...w] ≝ x <=> z if x ≠ z

I don't understand that. Could it be expressed in human-readable language?

cabo commented 2 years ago

I don't understand that. Could it be expressed in human-readable language?

Yes. But pure English is a bad tool for discussing this. We can translate to English once we understand the structure that we are trying to achieve.

timbray commented 2 years ago

I don't understand that. Could it be expressed in human-readable language?

Yes. But pure English is a bad tool for discussing this. We can translate to English once we understand the structure that we are trying to achieve.

OK, but I still don't understand. I'm not being rhetorical here, I have a math degree (granted, a miserable B.Sc., decades old) and I don't understand your notation.

cabo commented 2 years ago

OK. The notation essentially tries to answer whether a <=> b, where a and b are JSON values. <=> is one of the comparison operators we want to define -- usually, they all work the same, so saying something about <=> just says its true for all of them. ...x and ...y pick up the rest of an array; I could have used [x | y] or some similar notation that splits an array into first and rest.

The point is to define comparison by looking at the first elements; if there is no first element on at least one side, the first two rules operate, if there is, the second two rules do. (Oh, and axioms such as mutual exclusion of a < b, a == b, a > b, and a < b ≝ b > a.)

glyn commented 2 years ago

What is the ^ that occurs in some of the comments above?

Oops, sorry, I meant to define that. It's array concatenation.

glyn commented 2 years ago

@cabo <=> in maths usually means "if and only if", which makes the axioms above quite hard to read. That's why I wrote down axioms just for < and left the rest not very far from the imagination.

But, anyway, the point here is not that ordering of arrays is particularly hard to specify, but that lexicographic ordering is somewhat arbitrary and not very useful in practice.

cabo commented 2 years ago

@cabo <=> in maths usually means "if and only if", which makes the axioms above quite hard to read. That's why I wrote down axioms just for < and left the rest not very far from the imagination.

OK, so maybe use some generic operator, such as ⊜. (<=> is used on some dynamic languages as the generic comparison operator, that's why I used it.)

But, anyway, the point here is not that ordering of arrays is particularly hard to specify, but that lexicographic ordering is somewhat arbitrary and not very useful in practice.

Well, it is analogous to strings, so I don't know that is particularly arbitrary. Whether it is useful depends on what you use arrays for; arrays of course can be used for a lot of applications only some of which will benefit from lexicographic ordering; but then that is still more than not having ordering at all.

glyn commented 2 years ago

But, anyway, the point here is not that ordering of arrays is particularly hard to specify, but that lexicographic ordering is somewhat arbitrary and not very useful in practice.

Well, it is analogous to strings, so I don't know that is particularly arbitrary. Whether it is useful depends on what you use arrays for; arrays of course can be used for a lot of applications only some of which will benefit from lexicographic ordering; but then that is still more than not having ordering at all.

I guess allowing strings and arrays to be members of the arrays we are trying to order actually limits the options quite a bit. For example, comparing numeric arrays by their length when considered to be a vector wouldn't really make sense for arrays of arrays. So perhaps lexicographic ordering isn't that arbitrary.

In terms of usefulness, I found @timbray's comment at the interim meeting persuasive: "if I saw lexicographic array comparison in a code review, I'd be concerned about it". I think the number of reasonable applications of lexicographic array ordering is likely very small.

Another concern I haven't mentioned yet is the "non-local effect" of objects, booleans, and null. This would require complete scanning of array arguments of ordered comparisons to make sure none of their descendants was of the wrong type. (Without this "non-local effect", we lose transitivity as I mentioned above.) Clearly this could be cached, but it does marginally complicate implementations.

Checking existing implementations, I've only found one (Goessner, and he's against this feature!) which behaves similarly to this PR. Similarly, JMESPath does not support ordered comparisons of arrays. So this PR would seem to be diverging from our charter given that we don't have a rough consensus that the approach is technically best.

Closing.

goessner commented 2 years ago

As in (add axioms relating < and ==, same as in strings)

* [] == []

* [] < [x, ...y]

* [x, ...y] <=> [x, ...z]  ≝  [...y] <=> [...z]

* [x, ...y] <=> [z, ...w]  ≝  x <=> z if x ≠ z

This is a very clean and complete definition, thanks.

With arrays we always have an implicit meaning regarding their elements. So for example

[1,2,3] ... spatial coordinates
[0,128,255] ... RGB values
[2022,08,22] ... date

only the third example qualifies to be ordered lexicographic. In the same way, as it might be coverted to a string and compared to another date. Noone would apply linear ordering to the first two "objects". So the JSON author defines by implicit meaning of her array elements, if ordering of arrays of that same type makes sense.

We as spec authors don't know about meaning of elements of an arbitrary array. We only offer lexicographic ordering in general for those, who might use it (hopefully those that don't, won't do that accidentially).

~~But what, if we now make the meaning explicit?~~

~~{"x":1,"y":2,"z":3} ... spatial coordinates~~
~~{"r":0,"g":128,"b":255} ... RGB values~~
~~{"y":2022,"m":08,"d":22} ... date~~

~~How could we argue, not offering lexicographic ordering for objects also? The same axioms apply ...~~

~~{} == {}~~
~~{} < {x, ...y}~~
~~{x, ...y} <=> {x, ...z} ≝ {...y} <=> {...z}~~
~~{x, ...y] <=> {z, ...w} ≝ x <=> z if x ≠ z~~

~~(Excuse for not using "" here)~~

~~"Objects themself are not ordered" is no valid argument, as we are able to judge (member-wise) equality of objects.~~

I cannot see a strong demand from implementers and/or users yet. If that would come surprisingly later we could add that ordering then. Maybe we could alternatively offer special extension points for that.

glyn commented 2 years ago

I cannot see a strong demand from implementers and/or users yet. If that would come surprisingly later we could add that ordering then.

I think that would be a backwards incompatible change, so it would be difficult to introduce. It would also impact interoperation.

Maybe we could alternatively offer special extension points for that.

If these ordering features could be handled by extension point, that would be great!

goessner commented 2 years ago

hmm ... my concept of ordering of different objects won't work ... exactly because of the argument: "Objects themself are not ordered".

glyn commented 2 years ago

A partial ordering of objects could be defined in terms of the subset relationship when objects are considered to be sets of members. But, again, this is fairly arbitrary from an application POV.

goessner commented 2 years ago

Checking existing implementations, I've only found one (Goessner, and he's against this feature!)

:-)

In fact testing in JavaScript (node.js) surprisingly shows somehow lexicographic ordering of arrays. But this is not (sufficiently) documented at least at MSN.

goessner commented 2 years ago

I think that would be a backwards incompatible change, so it would be difficult to introduce. It would also impact interoperation.

I only was thinking: "later" this year ...

cabo commented 2 years ago

So let me rewrite this:

As in (add axioms relating < and ==, same as in strings)

[] == []

[] < [x | y]

[x | y] ⊜ [x | z] ≝ y ⊜ z

[x | y] ⊜ [z | w] ≝ x ⊜ z if x ≠ z

[ A | B ] splits an array into first (A) and rest (B), so [1, 2, 3] becomes A = 1 and B = [2, 3]

This really is best thought of using the spaceship operator (<=>) that returns -1, 0, or 1 for a comparison (less, equal, greater); we'd need to augment this with fail (absent).

a == b ≝ (a <=> b) == 0 a != b ≝ (a <=> b) != 0 a <= b ≝ (a <=> b) <= 0 a >= b ≝ (a <=> b) >= 0 a < b ≝ (a <=> b) < 0 a > b ≝ (a <=> b) > 0

(Sorry about not reading <=> as ⇔ -- just as we don't read <= as ⇐ in the usual programming languages, which misreading actually caused some languages to write <= as =<.)

You can derive tautologies from the axioms like

[a] ⊜ [b] ≣ a ⊜ b

without which the principle of least surprise is heavily violated.

(This also speaks against actively searching out for non-comparables before applying these axioms, but this is a separate consideration.)

timbray commented 2 years ago

OK, I get the formalism. I just think it's wrong.

I'm not comfortable asserting that

[2, "fish", false] < [3, {"val": [11,22]}, null, null, null, ["x"]]

They are just not comparable in any meaningful sense.

Hmm... JSON arrays are really tuples I think.

glyn commented 2 years ago

It was that kind of example which motivated the approach in this PR: to force < and > to yield false if there is an object, boolean, or null among the descendants of either operand. This also helped to preserve transitivity (e.g. a < b and b < c implies a < c).

ietf-wg-jsonpath / draft-ietf-jsonpath-base

Ordered comparisons (<, >) of arrays #248