ietf-wg-jsonpath / draft-ietf-jsonpath-base

Development of a JSONPath internet draft
https://ietf-wg-jsonpath.github.io/draft-ietf-jsonpath-base/
Other
59 stars 20 forks source link

Semantics of the list form of descendant selector #232

Closed glyn closed 2 years ago

glyn commented 2 years ago

The semantics (as of 2022-08-06) for the descendant selector involving a list:

Which descendants are selected

It is currently implicit that the list selector should select descendants based on the list semantics. This should be made explicit.

Ordering vs implementation consensus

An example from the JSONPath comparison project applies the path $..['c', 'd'] to the value:

[
 {"c":"cc1","d":"dd1","e":"ee1"},
 {"c": "cc2", "child": {"d": "dd2"}},
 {"c": "cc3"},
 {"d": "dd4"},
 {"child": {"c": "cc5"}}
]

The spec states:

The resultant nodelist of a descendant-selector applied to a node must be a sub-sequence of an array-sequenced preorder of the descendants of the node.

Because of the ordering of the array, in all array-sequenced preorders of the above value, "cc1", "dd1", and "ee1" appear before "cc2" and "dd2" which appear before "cc3" which appears before "dd4" which appears before "cc5". So the order of the result according to the spec is:

["cc1", "dd1", "cc2", "dd2", "cc3", "dd4", "cc5"]

However, the current consensus according to the comparison project is:

["cc1", "cc2", "cc3", "cc5", "dd1", "dd2", "dd4"]

where it seems the value is traversed once per item of the list (['c', 'd']).

Ordering of duplicate nodes

Consider the path $..[0, 1, 0] applied to the value:

[[1, 2]]

There is just one array-sequenced preorder of this value: [[1, 2], 1, 2] (noting the clarification below). But the current list semantics produces duplicates and can reorder the elements of an array, so the resultant nodelist would be:

[1, 2, 1]

which is not a sub-sequence of [[1, 2], 1, 2].

Clarification of array-sequenced preorder

We should make it clear that an array-sequenced preorder is of nodes rather than values.

For example, consider the path $..[0] applied to the value:

[1, [2], 1]

The array-sequenced preorder of this value is [1, [2], 2, 1] and the values selected by $..[0] are 1 and 2, but the nodelist [2, 1] is not a valid result even though it is a subsequence of the values in the array-sequenced preorder. The result is [1, 2] which is a subsequence of the nodes in the array-sequenced preorder. (In fact, using an array of values to describe an array-sequenced preorder is misleading because it ignores the nodes' locations within the argument value.)

glyn commented 2 years ago

I propose the following actions:

  1. Clarify which nodes are selected.
  2. Preserve the current ordering, regardless of the implementation consensus, since it easily implemented via a single traversal of the descendants.
  3. Allow the descendant list selector to reorder a subsequence of the nodes in an array-sequenced preorder, although it must honour the order of the elements in the selector's list.
  4. Clarify the definition of array-sequenced preorder.

Any objections?

danielaparker commented 2 years ago

However, the current consensus according to the comparison project is:

["cc1", "cc2", "cc3", "cc5", "dd1", "dd2", "dd4"]

I don't think you can tell about the ordering from this result, the query is flagged with footnote 4, see (https://cburgmer.github.io/json-path-comparison/#union_with_keys_after_recursive_descent), which indicates that these results are being sorted before comparison.

A quick check shows that a number of implementations that fall in the consensus (including Goessner JavaScript) actually return

["cc1","dd1","cc2","dd2","cc3","dd4","cc5"]
glyn commented 2 years ago

@danielaparker well spotted (and thanks for raising https://github.com/cburgmer/json-path-comparison/issues/118). So that leaves the following proposed actions:

  1. Clarify which nodes are selected.
  2. Allow the descendant list selector to reorder a subsequence of the nodes in an array-sequenced preorder, although it must honour the order of the elements in the selector's list.
  3. Clarify the definition of array-sequenced preorder.
danielaparker commented 2 years ago
  1. Allow the descendant list selector to reorder a subsequence of the nodes in an array-sequenced preorder, although it must honour the order of the elements in the selector's list.
  2. Clarify the definition of array-sequenced preorder.

That sounds complicated :-) One thing to keep in mind is that implementations that return

["cc1","dd1","cc2","dd2","cc3","dd4","cc5"]

do not do anything complicated, nor do they "reorder a subsequence of the nodes in an array-sequenced preorder". Rather, they simply apply the selector ['c', 'd'] to each node in the provided node list, and then (for the recursion operator) apply the same logic recursively on the children of each node. That description alone implies the above result.

glyn commented 2 years ago

I agree it sounds complicated. Perhaps, along the lines you describe, it is best to describe the possible orderings of descendants and then say how particular descendant selectors are apply to those descendants.

danielaparker commented 2 years ago

I agree it sounds complicated. Perhaps, along the lines you describe, it is best to describe the possible orderings of descendants and then say how particular descendant selectors are apply to those descendants.

It may be helpful to think of the recursion operator .. as an operation that maps the current result list into a new result list according to the following rules:

First, create a new empty result list. Then

This becomes the new current result list.

So if the current result list contains the root value

[
 {"c":"cc1","d":"dd1","e":"ee1"},
 {"c": "cc2", "child": {"d": "dd2"}},
 {"c": "cc3"},
 {"d": "dd4"},
 {"child": {"c": "cc5"}}
]

the recursive operation produces

[
 [
  {"c":"cc1","d":"dd1","e":"ee1"},
  {"c": "cc2", "child": {"d": "dd2"}},
  {"c": "cc3"},
  {"d": "dd4"},
  {"child": {"c": "cc5"}}
 ],
 {"c":"cc1","d":"dd1","e":"ee1"},
 {"c": "cc2", "child": {"d": "dd2"}},
 {"d": "dd2"},
 {"c": "cc3"},
 {"d": "dd4"},
 {"child": {"c": "cc5"}},
{"c": "cc5"}
]

Then applying the selector ['c', 'd'] to each element in the new current result list produces

[
 "cc1",
 "dd1",
 "cc2",
 "dd2",
 "cc3",
 "dd4",
 "cc5"
]
glyn commented 2 years ago

I'd prefer to define the orderings of the descendants in terms of a postcondition rather than an algorithm because that avoids implementation bias - a useful property for a spec. Sometimes the postcondition form really doesn't work and we have to resort to an algorithm, for example in the slice semantics. Let's see how the postcondition works out in a PR...

glyn commented 2 years ago

Oh and I would say that describing the .. part of the descendant selector as a recursion operator is another example of implementation bias (since implementations need not use recursion). The spec talks about descendant selectors because this term describes what the selectors are rather than how they are implemented.