Closed: glyn closed this pull request 2 years ago.
Reviewers please note: I intend to make an improvement if/when this PR lands: see https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/issues/249. Since the improvement may make the text longer, I'd like to work with the current text while we decide whether we want to support ordered comparisons of strings at all. If this PR is merged, I'd then like to fix https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/issues/249 in a similar way for strings and arrays.
The reason for the restriction that array comparisons yield false if either array has a descendant which is an object, a boolean, or `null` is that, without this restriction, transitivity is broken. For example, without this restriction `[0, true] < 1` would yield true, as would `1 < [2, 1]`. If `<` were to remain transitive, then `[0, true] < [2, 1]` would yield true, but it would actually yield false because the comparison fails on the second element.
(I am thinking of adding a note to the text about this to clarify the rationale.)
Ok ... let's assume we want to adopt lexicographic order `[a,b] < [c,d] if a<c or a=c and b<d` for arrays, analogous to strings, then a few questions arise:

* `'ab' < 'abc'` yields `true`, but `['a','b'] < ['a','b','c']` will yield `false` ...
* how to deal with nested arrays and possible edge cases?
* why not use [product order](https://en.wikipedia.org/wiki/Product_order) `[a,b] < [c,d] if a<c and b<d` instead?

The benefit for users is questionable at best (I would like to get some use cases documented) ... while the effort for implementers (and spec writers :-) is significant.

So I propose ...

1. to allow general relations `==,!=,<,>,<=,>=` for numbers and strings.
2. comparisons of all other types support `==, !=` and yield `false` otherwise (mixed types!).

... as long as no strong demand from users and/or implementers is heard.
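A minimal Python sketch of the comparison semantics in this proposal, assuming JSON values are decoded into Python's `int`/`float`, `str`, `list`, `dict`, `bool` and `None`. The helper names `is_number`, `json_equal` and `compare` are hypothetical. Treating `!=` as the plain negation of `==` (so values of mixed types are unequal) is an assumption rather than something the proposal spells out.

```python
def is_number(v):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def json_equal(a, b):
    """Structural equality that never treats true/false as 1/0."""
    if isinstance(a, bool) != isinstance(b, bool):
        return False
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(json_equal(a[k], b[k]) for k in a)
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(map(json_equal, a, b))
    if isinstance(a, (dict, list)) or isinstance(b, (dict, list)):
        return False  # a structured value never equals a value of another type
    return a == b  # numbers, strings, booleans, null


def compare(op, a, b):
    """<, >, <=, >= only for two numbers or two strings; == and != for all types."""
    if op == "==":
        return json_equal(a, b)
    if op == "!=":
        return not json_equal(a, b)
    same_ordered_type = (is_number(a) and is_number(b)) or (
        isinstance(a, str) and isinstance(b, str)
    )
    if not same_ordered_type:
        return False  # arrays, objects, booleans, null, and mixed types
    return {"<": a < b, ">": a > b, "<=": a <= b, ">=": a >= b}[op]
```

For example, `compare("<", "ab", "abc")` is true, while `compare("<", [1], [2])` is false because arrays are not ordered under this proposal.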
> Ok ... let's assume we want to adopt lexicographic order `[a,b] < [c,d] if a<c or a=c and b<d` for arrays, analogous to strings,
Note that the above is not the same spec that this PR provides. This PR is more like this: for all values `a`, `c` and arrays `b`, `d` where `a`, `b`, `c`, and `d` and their descendants consist only of numbers, strings, and arrays:

- `[] < [a] ^ b`
- `[a] ^ b < [c] ^ d` if `a < c` or (`a == c` and `b < d`)

where `^` denotes array concatenation.
(Perhaps this strengthens @gregsdennis's point that the recursive definition is hard to read.)
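The two rules above translate fairly directly into a recursive function. This is a minimal Python sketch that assumes, as the rules do, that both operands consist only of numbers, strings, and arrays. The function name `lt` is hypothetical, and Python's native string comparison (by code point) stands in for whatever string ordering the spec uses.

```python
def is_number(v):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def lt(x, y):
    """x < y per the PR's rules; operands are assumed to contain only
    numbers, strings and arrays (this is not checked here)."""
    if is_number(x) and is_number(y):
        return x < y
    if isinstance(x, str) and isinstance(y, str):
        return x < y
    if isinstance(x, list) and isinstance(y, list):
        if not x:
            return len(y) > 0  # [] < [a] ^ b
        if not y:
            return False  # no rule makes anything less than []
        a, b = x[0], x[1:]  # x = [a] ^ b
        c, d = y[0], y[1:]  # y = [c] ^ d
        # Python == suffices for equality here because booleans are excluded
        return lt(a, c) or (a == c and lt(b, d))
    return False  # mixed types are never ordered
```

With this, `lt(['a', 'b'], ['a', 'b', 'c'])` is true, matching `'ab' < 'abc'`, and `lt([1, 2], [1, 'x'])` is false because `2` and `'x'` are neither ordered nor equal.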
> then a few questions arise:
> * `'ab' < 'abc'` yields `true`, but `['a','b'] < ['a','b','c']` will yield `false` ...
Not so. `['a','b'] < ['a','b','c']` yields true in this PR.
> * how to deal with nested arrays and possible edge cases?
Nested arrays are straightforward provided they and their descendants consist only of numbers, strings, and arrays.
I think the main edge cases are where one or the other value being compared is not, or has a descendant which is not, a number, string, or array. In all such cases `<` yields false.
> * why not use [product order](https://en.wikipedia.org/wiki/Product_order) `[a,b] < [c,d] if a<c and b<d` instead?
Because that seems of (even) less practical use than lexicographic order.
> The benefit for users is questionable at best (I would like to get some use cases documented) ... while the effort for implementers (and spec writers :-) is significant.
I agree about the lack of benefit for users. The main use case we've had so far is multi-string date ordering, although it has been pointed out that RFC 3339 strings provide a better approach for dates. I'm going to resist the temptation to try to dream up more use cases.
> So I propose ...
>
> 1. to allow general relations `==,!=,<,>,<=,>=` for numbers and strings.
> 2. comparisons of all other types support `==, !=` and yield `false` otherwise (mixed types!).
>
> ... as long as no strong demand from users and/or implementers is heard.
IIRC you are proposing the status quo. To implement your proposal we would simply need to close this PR unmerged. I'm happy to do that if there is a rough consensus in favour.
@cabo, @gregsdennis, @timbray, and others: would any of you care to defend ordered comparisons (<, >) of arrays? If not, we can safely close it unmerged.
Glyn, you have made a valiant effort here, and thanks for that, but yes, let's please go back to just `!=` and `==` for structured values. In the future, if someone asks why JSONPath does not include ordering, we now have a good answer as to why.
> @cabo, @gregsdennis, @timbray, and others: would any of you care to defend ordered comparisons (<, >) of arrays? If not, we can safely close it unmerged.
I'm sorry, I haven't found anything that is problematic about ordering arrays yet. Obviously, it is easy to come up with unworkable ways of doing that. The trick is to start from a very small set of axioms.
As in (add axioms relating < and ==, same as in strings)
That should do it.
What is the `^` that occurs in some of the comments above?
> As in (add axioms relating < and ==, same as in strings)
> - [] == []
> - [] < [x]
> - [x, ...y] <=> [x, ...z] ≝ [...y] <=> [...z]
> - [x, ...y] <=> [z, ...w] ≝ x <=> z if x ≠ z
I don't understand that. Could it be expressed in human-readable language?
> I don't understand that. Could it be expressed in human-readable language?
Yes. But pure English is a bad tool for discussing this. We can translate to English once we understand the structure that we are trying to achieve.
> > I don't understand that. Could it be expressed in human-readable language?
>
> Yes. But pure English is a bad tool for discussing this. We can translate to English once we understand the structure that we are trying to achieve.
OK, but I still don't understand. I'm not being rhetorical here, I have a math degree (granted, a miserable B.Sc., decades old) and I don't understand your notation.
OK. The notation essentially tries to answer whether `a <=> b`, where `a` and `b` are JSON values. `<=>` is one of the comparison operators we want to define; usually, they all work the same, so saying something about `<=>` just says it's true for all of them. `...x` and `...y` pick up the rest of an array; I could have used `[x | y]` or some similar notation that splits an array into first and rest.
The point is to define comparison by looking at the first elements: if there is no first element on at least one side, the first two rules operate; if there is, the second two rules do. (Oh, and axioms such as mutual exclusion of `a < b`, `a == b`, `a > b`, and `a < b ≝ b > a`.)
> What is the `^` that occurs in some of the comments above?
Oops, sorry, I meant to define that. It's array concatenation.
@cabo `<=>` in maths usually means "if and only if", which makes the axioms above quite hard to read. That's why I wrote down axioms just for `<` and left the rest not very far from the imagination.
But, anyway, the point here is not that ordering of arrays is particularly hard to specify, but that lexicographic ordering is somewhat arbitrary and not very useful in practice.
> @cabo `<=>` in maths usually means "if and only if", which makes the axioms above quite hard to read. That's why I wrote down axioms just for `<` and left the rest not very far from the imagination.
OK, so maybe use some generic operator, such as ⊜. (`<=>` is used in some dynamic languages as the generic comparison operator; that's why I used it.)
> But, anyway, the point here is not that ordering of arrays is particularly hard to specify, but that lexicographic ordering is somewhat arbitrary and not very useful in practice.
Well, it is analogous to strings, so I don't know that it is particularly arbitrary. Whether it is useful depends on what you use arrays for; arrays of course can be used for a lot of applications, only some of which will benefit from lexicographic ordering; but then that is still more than not having ordering at all.
> > But, anyway, the point here is not that ordering of arrays is particularly hard to specify, but that lexicographic ordering is somewhat arbitrary and not very useful in practice.
>
> Well, it is analogous to strings, so I don't know that it is particularly arbitrary. Whether it is useful depends on what you use arrays for; arrays of course can be used for a lot of applications, only some of which will benefit from lexicographic ordering; but then that is still more than not having ordering at all.
I guess allowing strings and arrays to be members of the arrays we are trying to order actually limits the options quite a bit. For example, comparing numeric arrays by their length when considered to be a vector wouldn't really make sense for arrays of arrays. So perhaps lexicographic ordering isn't that arbitrary.
In terms of usefulness, I found @timbray's comment at the interim meeting persuasive: "if I saw lexicographic array comparison in a code review, I'd be concerned about it". I think the number of reasonable applications of lexicographic array ordering is likely very small.
Another concern I haven't mentioned yet is the "non-local effect" of objects, booleans, and `null`. This would require complete scanning of array arguments of ordered comparisons to make sure none of their descendants was of the wrong type. (Without this "non-local effect", we lose transitivity as I mentioned above.) Clearly this could be cached, but it does marginally complicate implementations.
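The "non-local effect" shows up in the scan an implementation would need to run over both operands before an ordered comparison. This is a minimal Python illustration, assuming Python's JSON decoding conventions (`dict` for objects, `bool` for booleans, `None` for null); the name `orderable` is hypothetical.

```python
def orderable(v):
    """True if v and all of its descendants are numbers, strings or arrays."""
    if isinstance(v, bool):  # test first: bool is a subclass of int
        return False
    if isinstance(v, (int, float, str)):
        return True
    if isinstance(v, list):
        return all(orderable(e) for e in v)
    return False  # objects (dict), null (None), or anything else


# The first elements alone would decide a lexicographic comparison of these
# two arrays, yet the null nested deep inside the second one disqualifies it.
left = [1, "a"]
right = [2, [3, [4, None]]]
print(orderable(left), orderable(right))  # True False
```

The scan visits every descendant, so its cost grows with the size of the whole array even when the comparison itself would stop at the first element. Caching the result per value would avoid repeating it.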
Checking existing implementations, I've only found one (Goessner, and he's against this feature!) which behaves similarly to this PR. Moreover, JMESPath does not support ordered comparisons of arrays. So this PR would seem to be diverging from our charter, given that we don't have a rough consensus that the approach is technically best.
Closing.
> As in (add axioms relating < and ==, same as in strings)
> * [] == []
> * [] < [x, ...y]
> * [x, ...y] <=> [x, ...z] ≝ [...y] <=> [...z]
> * [x, ...y] <=> [z, ...w] ≝ x <=> z if x ≠ z
This is a very clean and complete definition, thanks.
With arrays we always have an implicit meaning regarding their elements. So for example

- `[1,2,3]` ... spatial coordinates
- `[0,128,255]` ... RGB values
- `[2022,08,22]` ... date

Only the third example qualifies to be ordered lexicographically, in the same way as it might be converted to a string and compared to another date. No one would apply linear ordering to the first two "objects". So the JSON author defines, by the implicit meaning of her array elements, whether ordering of arrays of that same type makes sense.
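To illustrate the date case: a `[year, month, day]` array compared lexicographically orders dates the same way as comparing zero-padded date strings in the style of RFC 3339 `full-date`. This is a small Python sketch; Python lists already compare lexicographically. The helpers `to_full_date` and `to_unpadded` are hypothetical, and the month is written as `8` because a leading zero such as `08` is not a valid JSON number.

```python
from itertools import combinations


def to_full_date(ymd):
    """[2022, 8, 22] -> "2022-08-22" (zero-padded, RFC 3339 full-date style)."""
    year, month, day = ymd
    return f"{year:04d}-{month:02d}-{day:02d}"


def to_unpadded(ymd):
    """[2022, 8, 22] -> "2022-8-22" (no zero padding)."""
    return "-".join(str(n) for n in ymd)


dates = [[2021, 12, 31], [2022, 8, 22], [2022, 10, 1], [2022, 10, 15]]

for a, b in combinations(dates, 2):
    # lexicographic array order agrees with zero-padded string order ...
    assert (a < b) == (to_full_date(a) < to_full_date(b))

# ... but not with unpadded strings: "2022-8-22" sorts after "2022-10-1"
print(to_unpadded([2022, 8, 22]) < to_unpadded([2022, 10, 1]))  # False
print([2022, 8, 22] < [2022, 10, 1])  # True
```

The same comparison would run just as happily on `[1,2,3]` or `[0,128,255]`; only the meaning the author attaches to the elements decides whether its result is useful.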
We as spec authors don't know the meaning of the elements of an arbitrary array. We only offer lexicographic ordering in general for those who might use it (hopefully those that don't won't do that accidentally).
But what if we now make the meaning explicit?

- `{"x":1,"y":2,"z":3}` ... spatial coordinates
- `{"r":0,"g":128,"b":255}` ... RGB values
- `{"y":2022,"m":08,"d":22}` ... date

How could we then argue against offering lexicographic ordering for objects too? The same axioms apply ...
- {} == {}
- {} < {x, ...y}
- {x, ...y} <=> {x, ...z} ≝ {...y} <=> {...z}
- {x, ...y} <=> {z, ...w} ≝ x <=> z if x ≠ z
(Excuse me for not using `""` here.)
"Objects themselves are not ordered" is not a valid argument, as we are able to judge (member-wise) equality of objects.
I cannot see a strong demand from implementers and/or users yet. If that demand surprisingly came later, we could add that ordering then. Maybe we could alternatively offer special extension points for that.
> I cannot see a strong demand from implementers and/or users yet. If that demand surprisingly came later, we could add that ordering then.
I think that would be a backwards incompatible change, so it would be difficult to introduce. It would also impact interoperation.
> Maybe we could alternatively offer special extension points for that.
If these ordering features could be handled by extension point, that would be great!
hmm ... my concept of ordering of different objects won't work ... exactly because of the argument: "Objects themselves are not ordered".
A partial ordering of objects could be defined in terms of the subset relationship when objects are considered to be sets of members. But, again, this is fairly arbitrary from an application POV.
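A sketch of that subset-based partial order, treating each object as a set of (name, value) members: one object is less than another when its members form a proper subset of the other's. This is a minimal Python illustration under that assumption, and the names `json_equal` and `obj_lt` are hypothetical. It also shows why the order is only partial, since many pairs of objects are incomparable.

```python
def json_equal(a, b):
    """Structural equality that never treats true/false as 1/0."""
    if isinstance(a, bool) != isinstance(b, bool):
        return False
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(json_equal(a[k], b[k]) for k in a)
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(map(json_equal, a, b))
    if isinstance(a, (dict, list)) or isinstance(b, (dict, list)):
        return False
    return a == b


def obj_lt(a, b):
    """a < b iff the members of object a form a proper subset of those of b."""
    is_subset = all(k in b and json_equal(a[k], b[k]) for k in a)
    return is_subset and len(a) < len(b)


x = {"r": 0}
y = {"r": 0, "g": 128}
z = {"g": 128}
print(obj_lt(x, y), obj_lt(z, y))  # True True
print(obj_lt(x, z), obj_lt(z, x))  # False False: x and z are incomparable
```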
> I think that would be a backwards incompatible change, so it would be difficult to introduce. It would also impact interoperation.
I was only thinking: "later" this year ...
So let me rewrite this:
As in (add axioms relating < and ==, same as in strings)
- [] == []
- [] < [x | y]
- [x | y] ⊜ [x | z] ≝ y ⊜ z
- [x | y] ⊜ [z | w] ≝ x ⊜ z if x ≠ z
[ A | B ] splits an array into first (A) and rest (B), so [1, 2, 3] becomes A = 1 and B = [2, 3]
This really is best thought of using the spaceship operator (<=>) that returns -1, 0, or 1 for a comparison (less, equal, greater); we'd need to augment this with fail (absent).
- a == b ≝ (a <=> b) == 0
- a != b ≝ (a <=> b) != 0
- a <= b ≝ (a <=> b) <= 0
- a >= b ≝ (a <=> b) >= 0
- a < b ≝ (a <=> b) < 0
- a > b ≝ (a <=> b) > 0
(Sorry about not reading `<=>` as ⇔, just as we don't read `<=` as ⇐ in the usual programming languages; that misreading actually caused some languages to write `<=` as `=<`.)
You can derive tautologies from the axioms like
[a] ⊜ [b] ≣ a ⊜ b
without which the principle of least surprise is heavily violated.
(This also speaks against actively searching out for non-comparables before applying these axioms, but this is a separate consideration.)
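A minimal Python sketch of these axioms expressed through a three-way spaceship function that returns -1, 0 or 1, with `None` standing in for "fail"; the six operators are then derived from it as in the definitions above. Several details are assumptions rather than part of the proposal: only numbers, strings and arrays are compared (every other pairing fails, so even `true <=> true` fails here), a failure inside an array propagates outward, and every derived operator yields false on failure (so `!=` is also false for incomparable values). The names `spaceship` and `derive` are hypothetical.

```python
import operator
from typing import Optional


def is_number(v):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def spaceship(a, b) -> Optional[int]:
    """Return -1, 0 or 1 (less, equal, greater), or None for "fail"."""
    if (is_number(a) and is_number(b)) or (isinstance(a, str) and isinstance(b, str)):
        return (a > b) - (a < b)
    if isinstance(a, list) and isinstance(b, list):
        if not a or not b:
            # [] == []  and  [] < [x | y]  (and its mirror [x | y] > [])
            return (len(a) > 0) - (len(b) > 0)
        head = spaceship(a[0], b[0])
        if head != 0:
            return head  # x != z decides the result, or the comparison failed
        return spaceship(a[1:], b[1:])  # x == z, so compare the rests
    return None  # assumption: any other pair of types is not comparable


def derive(op):
    """Build a boolean relation from the spaceship result; fail yields False."""
    def relation(a, b):
        result = spaceship(a, b)
        return result is not None and op(result, 0)
    return relation


eq, ne, le, ge, lt, gt = (
    derive(op)
    for op in (operator.eq, operator.ne, operator.le, operator.ge, operator.lt, operator.gt)
)

# The tautology [a] ⊜ [b] ≣ a ⊜ b holds for every pair of sample values.
samples = [1, 2.5, "a", "b", [], [1], [1, "a"], True, None]
assert all(spaceship([a], [b]) == spaceship(a, b) for a in samples for b in samples)
```

Under these assumptions `lt([0, True], [1])` is true, because the first elements decide before `true` is ever reached. The PR instead forces such comparisons to false by checking all descendants first.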
OK, I get the formalism. I just think it's wrong.
I'm not comfortable asserting that
[2, "fish", false] < [3, {"val": [11,22]}, null, null, null, ["x"]]
They are just not comparable in any meaningful sense.
Hmm... JSON arrays are really tuples I think.
It was that kind of example which motivated the approach in this PR: to force `<` and `>` to yield false if there is an object, boolean, or `null` among the descendants of either operand. This also helped to preserve transitivity (e.g. `a < b` and `b < c` implies `a < c`).
Ordered comparisons of arrays are closely modelled on ordered comparisons of strings, but if either array has a descendant of an unordered type, the comparison yields false.
Added a note about slimming down the comparison examples table.
Fixes https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/issues/244
Reviewers may find this rendered version useful.