CompPath (Composite-objects path) Expressions

As initially discussed in issue #341, we were exploring different ways to provide an XPath-like language for traversing composite objects such as maps and arrays in depth, and for selecting their members at any depth. While working on this, the idea of an XPath-like language for composite items started to emerge, and here we present this idea in a more or less crystallized form.

1. Root Component

Any CompPath expression must start from a composite item: a map, an array, or some other composite item type that may be added in the future (maybe a set?). This can be a literal composite item or a reference to a variable whose value is a composite item.

Examples:


```
(: Literal composite items: :)
[1, 2, 3]

[1, [2, 3]]?2

{"x": 1, "y": map{"z": 2}}

{"x": 1, "y": map{"z": 2}}?y

(: Variables containing composite items: :)
let $comp1 := [1, [2, 3]],
    $comp2 := $comp1?2,
    $comp3 := {"x": 1, "y": map{"z": 2}},
    $comp4 := $comp3?y
```

In the above examples all literal expressions and all variables ($comp1, $comp2, $comp3, $comp4) may serve as the root component for a CompPath expression.

2. The component-path operator (\)

The component-path operator "\" is used to build expressions that locate members at any depth within component trees. Its left-hand side expression must return a result that is a composite item; otherwise, the result is represented as one by wrapping it in an array.

The operator returns an array whose members are either composite items themselves or non-composite "leaf" values of the root-component tree.

Each operation E1\E2 is evaluated as follows. Expression E1 is evaluated, and the result is wrapped in an array A1. If any member of A1 is not a composite item, a type error is raised. Each member of A1 in turn provides an inner "composite focus" for the evaluation of E2, which consists of:
- the member itself, as the "composite-context-item" (written .);
- its index in A1, as the "composite-context-position" (written index());
- the set of keys of the composite-context-item, as the "composite-keyset" (written keys());
- the size of the member, as the "composite-context-size" (written as one of size(), array-size() or key-size()).

The result of each evaluation of E2, if it isn't a single composite item, is wrapped in a single array. The arrays resulting from all the evaluations of E2 are wrapped in a single array, and this array is the result of the evaluation.

E2 is typically a function over the composite focus. Its results become the next composite-context-items (used as the left-hand side of the next composite-step-expression in the chain, see below), or, if this is the last composite-step-expression in the chain, they are the final result of the evaluation.
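To make the evaluation rule concrete, here is a small Python model of E1\E2. It is only a sketch under stated assumptions: XDM maps are modelled as Python dicts, arrays as lists, sequences as tuples, and E2 as a Python function that receives the composite focus. The names `CompositeFocus`, `keys_of` and `comp_path` are hypothetical and not part of any specification. The sketch follows the rule that a non-composite member of A1 raises a type error.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


def is_composite(item: Any) -> bool:
    # Assumption: XDM maps are modelled as dicts and XDM arrays as lists.
    return isinstance(item, (dict, list))


@dataclass
class CompositeFocus:
    item: Any        # the composite-context-item (".")
    position: int    # index(): 1-based position of the item in A1
    keys: List[Any]  # keys(): map keys, or the indexes 1..n of an array
    size: int        # size(): key-size() for maps, array-size() for arrays


def keys_of(item: Any) -> List[Any]:
    if isinstance(item, dict):
        return list(item.keys())
    return list(range(1, len(item) + 1))


def comp_path(e1_result: Iterable[Any], e2: Callable[[CompositeFocus], Any]) -> list:
    # A1 is array{E1}: one member per item returned by E1.
    a1 = list(e1_result)
    results = []
    for pos, member in enumerate(a1, start=1):
        if not is_composite(member):
            raise TypeError(f"member {pos} of A1 is not a composite item")
        r = e2(CompositeFocus(member, pos, keys_of(member), len(member)))
        if is_composite(r):
            results.append(r)        # a single composite item is kept as-is
        elif isinstance(r, tuple):
            results.append(list(r))  # a sequence of items is wrapped in one array
        else:
            results.append([r])      # a single non-composite item is wrapped too
    return results                   # all results, wrapped in a single array


print(comp_path([[10, 20], {"a": 1}], lambda f: f.size))  # [[2], [1]]
```

Modelling E2 as a function of the focus mirrors the description above: E2 never sees A1 as a whole, only one composite-context-item at a time.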

3. Composite-Steps

A composite-step is a part of a composite-path-expression that generates an array and filters its members by zero or more predicates. A composite-step-expression is either a CompositeAxisStep or a CompositePostfixExpression.

4. Composite-Axes

The following axes are defined for traversing a composite-item tree:

The child-member:: axis contains the members of the composite-context-item.
The value-member:: axis contains the members of the composite-context-item that are not composite themselves.
The node-member:: axis contains the members of the composite-context-item that are nodes.
The descendant-member:: axis is defined as the transitive closure of the child-member:: axis; it contains the descendant members of the composite-context-item (the child members of the composite-context-item, their child members, and so on).
The self:: axis contains just the composite-context-item.
The descendant-member-or-self:: axis contains the composite-context-item and all of its descendant members.
The following-sibling-member:: axis contains the members of the immediate container of the composite-context-item that follow it. For any two members mem1 and mem2 of a composite item Comp, by definition mem2 follows mem1 if and only if either Comp is an array and the index of mem2 in Comp is greater than that of mem1, or Comp is a map and the key of mem2 is greater than that of mem1.
The preceding-sibling-member:: axis contains the members of the immediate container of the composite-context-item that precede it. For any two members mem1 and mem2 of a composite item Comp, by definition mem1 precedes mem2 if and only if mem2 follows mem1.

For example, following-sibling-member::5 means all members of the composite-context-item with an index greater than 5, and preceding-sibling-member::5 means all members of the composite-context-item with an index less than 5.

Note: If the immediate container of the composite-context-item is a map whose keys cannot be ordered, then specifying either the following-sibling-member:: or the preceding-sibling-member:: axis on this composite-context-item must raise a type error. (These two axes are meaningful only for composite items whose members are ordered, such as arrays.)

If the composite-axis name is omitted from a composite-axis step, the default axis is the child-member:: axis.
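The axes can be modelled as plain functions over nested Python containers. The following sketch is an illustration only, assuming maps are dicts, arrays are lists addressed by 1-based indexes, and nodes are `xml.etree.ElementTree.Element` objects; all function names are hypothetical. As in the example above, the sibling axes take a member identifier and compare indexes or keys, and map keys that cannot be ordered surface as a Python `TypeError`.

```python
import xml.etree.ElementTree as ET
from typing import Any, List, Tuple


def is_composite(x: Any) -> bool:
    # Assumption: maps are dicts, arrays are lists; everything else is a leaf value.
    return isinstance(x, (dict, list))


def members(comp: Any) -> List[Tuple[Any, Any]]:
    # (identifier, value) pairs: keys for maps, 1-based indexes for arrays.
    if isinstance(comp, dict):
        return list(comp.items())
    return list(enumerate(comp, start=1))


def child_member(comp: Any) -> List[Any]:
    return [v for _, v in members(comp)]


def value_member(comp: Any) -> List[Any]:
    return [v for v in child_member(comp) if not is_composite(v)]


def node_member(comp: Any) -> List[Any]:
    return [v for v in child_member(comp) if isinstance(v, ET.Element)]


def descendant_member(comp: Any) -> List[Any]:
    # Transitive closure of child-member::, in depth-first order.
    out = []
    for v in child_member(comp):
        out.append(v)
        if is_composite(v):
            out.extend(descendant_member(v))
    return out


def descendant_member_or_self(comp: Any) -> List[Any]:
    return [comp] + descendant_member(comp)


def _ordered(comp: Any) -> List[Tuple[Any, Any]]:
    # sorted() raises TypeError when map keys cannot be ordered (cf. the Note above).
    return sorted(members(comp), key=lambda kv: kv[0])


def following_sibling_member(comp: Any, ident: Any) -> List[Any]:
    # Members whose index (or key) is greater than ident.
    return [v for k, v in _ordered(comp) if k > ident]


def preceding_sibling_member(comp: Any, ident: Any) -> List[Any]:
    # Members whose index (or key) is less than ident.
    return [v for k, v in _ordered(comp) if k < ident]
```

For instance, `descendant_member([1, [2, 3], {"x": 4}])` visits every member depth-first and yields `[1, [2, 3], 2, 3, {"x": 4}, 4]`.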

5. Composite Axis Steps

A composite axis step closely resembles the ordinary axis step in XPath. It consists of three parts:
1. The composite axis (child-member::, descendant-member::, value-member::, node-member::, following-sibling-member::, preceding-sibling-member::, self::, or the descendant-member-or-self:: axis)
2. The member test
3. The composite-predicates

6. Member Tests

A member test is a condition on the key name, index, or kind (composite, map, array, value, node, or any member) of a member. A member test determines which of the members contained in a composite axis are selected by a composite-step.

As such, a member test is either an identifier test (key name or index) or a kind test (composite, map, array, value, node, or member).

Examples of member identifiers:
A string specifies the name of a key whose value will be selected. For example, \child-member::X selects from the composite-context-item the value corresponding to its key named "X".
\child-member::3 selects from the composite-context-item the value of its 3rd member if it is an array, or the value corresponding to its key 3 if it is a map.
following-sibling-member::3 selects from the composite-context-item (which is most likely an array) all of its members having an index greater than 3.
preceding-sibling-member::3 selects from the composite-context-item (which is most likely an array) all of its members having an index less than 3.
\descendant-member-or-self::X selects from the composite-context-item (which must be a map) and from all of its descendant members the values corresponding to their key named "X", if these descendants have a key named "X".
Similarly \5 is equivalent to \child-member::5 and selects from the composite-context-item that is an array the value of its 5th member. This will also select the value corresponding to the key 5 from the composite-context-item if it is a map, because on the child-member:: axis both maps and arrays may be selected.
\X is equivalent to \child-member::X and selects from the composite-context-item (that must be a map), the value corresponding to its key which has the name "X".
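The dual reading of an identifier (a key for maps, a position for arrays) can be sketched as a single Python function. This is a hypothetical model, assuming maps are dicts and arrays are lists with 1-based positions; `identifier_test` is an illustrative name. The proposal does not say whether a missing key or index is an error, so the sketch assumes it simply selects nothing.

```python
from typing import Any, List


def identifier_test(comp: Any, ident: Any) -> List[Any]:
    # Hypothetical model of \child-member::ident (abbreviated \ident).
    # Maps: select the value of key `ident` (a string or any other atomic key).
    if isinstance(comp, dict):
        return [comp[ident]] if ident in comp else []
    # Arrays: an integer identifier is a 1-based index.
    if isinstance(comp, list) and isinstance(ident, int) and 1 <= ident <= len(comp):
        return [comp[ident - 1]]
    # Assumption: a missing key or index selects nothing rather than raising an error.
    return []


print(identifier_test({"X": 1, 3: "three"}, 3))  # ['three']
print(identifier_test(["a", "b", "c"], 3))       # ['c']
```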

There is also the pseudo-operator \\ . This is an abbreviation for:

\descendant-member-or-self::member()\

Thus, \\X means: "(Deep) select all members of the root-component tree that are the values of keys equal to 'X'".
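The expansion of \\X can be approximated by a recursive walk. This Python sketch assumes maps are dicts and arrays are lists (with an integer identifier read as a 1-based index); `deep_select` is a hypothetical name. It also assumes that non-composite members are skipped, since they cannot have a key "X", rather than raising the type error that a literal application of E1\E2 to them would produce.

```python
from typing import Any, List


def deep_select(item: Any, ident: Any) -> List[Any]:
    # Hypothetical model of \\ident, i.e. \descendant-member-or-self::member()\ident.
    out: List[Any] = []
    if isinstance(item, dict):
        if ident in item:
            out.append(item[ident])
        children = list(item.values())
    elif isinstance(item, list):
        if isinstance(ident, int) and 1 <= ident <= len(item):
            out.append(item[ident - 1])
        children = item
    else:
        # Assumption: leaf values cannot have a key, so they are skipped
        # instead of raising a type error.
        return out
    for child in children:
        out.extend(deep_select(child, ident))
    return out


tree = {"X": 1, "a": [{"X": 2}, {"b": {"X": 3}}], "c": "leaf"}
print(deep_select(tree, "X"))  # [1, 2, 3]
```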
We may also use a kind test as the member test, if we want to select only a specific kind of members of the composite-context-item:
\array() In this example, although we are on the child-member:: axis, we want to select only members of the composite-context-item that are arrays.
\map() In this example, although we are on the child-member:: axis, we want to select only members of the composite-context-item that are maps.
\value() In this example we want to select only members of the composite-context-item that are not composite items themselves.
\node() In this example we want to select only members of the composite-context-item that are nodes.
\member() In this example we want to select all members of the composite-context-item, regardless of whether they are maps, arrays, or values.
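Each kind test is just a predicate on a member. The following Python sketch is illustrative only: it assumes maps are dicts, arrays are lists, and nodes are `xml.etree.ElementTree.Element` objects, and `KIND_TESTS` and `child_kind` are hypothetical names. Note that a node is also a value, because it is not a composite item.

```python
import xml.etree.ElementTree as ET
from typing import Any, Callable, Dict, List

# Hypothetical kind tests; maps are dicts, arrays are lists, nodes are ET.Element.
KIND_TESTS: Dict[str, Callable[[Any], bool]] = {
    "composite": lambda v: isinstance(v, (dict, list)),
    "map": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "value": lambda v: not isinstance(v, (dict, list)),
    "node": lambda v: isinstance(v, ET.Element),
    "member": lambda v: True,
}


def child_kind(comp: Any, kind: str) -> List[Any]:
    # \array(), \map(), \value(), \node() or \member() on the child-member:: axis.
    values = comp.values() if isinstance(comp, dict) else comp
    return [v for v in values if KIND_TESTS[kind](v)]


book = ET.Element("book")
print(child_kind([1, [2], {"a": 3}, book], "array"))  # [[2]]
```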

6.1 Wildcards

The * wildcard can be used instead of a member identifier. It selects all existing members of the composite-context-item on the specified axis, limited by any specified kind test.

Examples:
\* (: (Shallow) Selects all members of the composite-context-item :)
\map()\* (: Selects from the composite-context-item all values that correspond to a key of any map-member of the composite-context-item :)
\array()\* (: Selects from the composite-context-item all members of all its members that are arrays :)
\\* (: (Deep) Select all members of the composite tree rooted by the root-component :)
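The four wildcard examples can be mirrored directly in Python. This is a sketch under the same assumptions as elsewhere (maps are dicts, arrays are lists); the function names are hypothetical and only stand in for the CompPath forms given in the comments.

```python
from typing import Any, List


def children(comp: Any) -> List[Any]:
    # Assumption: maps are dicts, arrays are lists.
    return list(comp.values()) if isinstance(comp, dict) else list(comp)


def star(comp: Any) -> List[Any]:  # \*
    return children(comp)


def map_star(comp: Any) -> List[Any]:  # \map()\*
    return [v for m in children(comp) if isinstance(m, dict) for v in m.values()]


def array_star(comp: Any) -> List[Any]:  # \array()\*
    return [v for a in children(comp) if isinstance(a, list) for v in a]


def deep_star(root: Any) -> List[Any]:  # \\*
    out = []
    for v in children(root):
        out.append(v)
        if isinstance(v, (dict, list)):
            out.extend(deep_star(v))
    return out


comp = [1, {"a": 2, "b": [3]}, [4, 5]]
print(map_star(comp))  # [2, [3]]
```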

7. Predicates

As defined above, a composite-step has three parts: composite-axis (can be omitted and then a default axis is used), member test, and an optional list of composite-predicates.

A composite-predicate in a composite-step is an expression used as a filter applied on the members of the composite-context-item that are already selected by the axis and member tests of the axis step, and not filtered out by any preceding composite-predicates in the composite-predicates-list. The composite-predicate may be any XPath expression and is written within double square brackets.

Examples:

\*[[3]] (: Selects any member of the composite-context-item that is an array and has a 3rd member, or that is a map and has a key 3 :) This is a shorthand for: \*[[array-size() ge 3 or 3 = keys()]]
\array()[[3]] (: Selects those array members of the composite-context-item that have a 3rd member :) This is a shorthand for: \array()[[size() ge 3]]
\*[[size() eq 7]] (: Selects those members whose array-size() or key-size() is exactly 7 :) This is a shorthand for: \composite()[[self::map() and key-size() eq 7 or self::array() and array-size() eq 7]]
\*[[X]] (: Selects any member of the composite-context-item that is a map and has a key X :)
\map()[[X]] (: Selects any map member of the composite-context-item that has a key X :) The above two expressions are shorthands for \*[['X' = keys()]] and \map()[['X' = keys()]], respectively.
\value()[[. gt 0]] (: Selects any value (non-composite member) of the composite-context-item that is a positive number :)
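The filtering order of predicates (each predicate sees only what the member test and the earlier predicates kept) can be sketched as follows. This is a hypothetical Python model, assuming maps are dicts and arrays are lists; `has_member` stands for the [[ident]] shorthand and `step` for a child-member:: step with a member test and a predicate list. Neither name comes from the proposal.

```python
from typing import Any, Callable, List


def is_composite(v: Any) -> bool:
    return isinstance(v, (dict, list))


def has_member(ident: Any) -> Callable[[Any], bool]:
    # Shorthand [[ident]]: a map with key ident, or an array with at least ident members.
    def pred(v: Any) -> bool:
        if isinstance(v, dict):
            return ident in v
        if isinstance(v, list):
            return isinstance(ident, int) and len(v) >= ident
        return False
    return pred


def step(comp: Any, member_test: Callable[[Any], bool],
         *predicates: Callable[[Any], bool]) -> List[Any]:
    # child-member:: axis plus member test; each predicate filters what the previous ones kept.
    values = comp.values() if isinstance(comp, dict) else comp
    selected = [v for v in values if member_test(v)]
    for p in predicates:
        selected = [v for v in selected if p(v)]
    return selected


comp = [[1, 2, 3], [1], {3: "x"}, {"X": 0}, 5, -2]
print(step(comp, lambda v: True, has_member(3)))  # \*[[3]] -> [[1, 2, 3], {3: 'x'}]
```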

8. Mixing CompPath and XPath expressions

CompPath and XPath expressions can be used as parts of a single expression:
- A CompPath expression may be appended at the end of any XPath expression that produces a composite object.
- An XPath expression may be appended at the end of any CompPath expression. When doing this,
  
  CompPathExpr / XPathExpr
  
  is equivalent to:
  
  CompPathExpr\node-member::* / XPathExpr   (: Note: "/" also sorts the resulting nodes into document order and removes duplicates :)
  
  And this:
  
  CompPathExpr ! XPathExpr
  
  is equivalent to:
  
  CompPathExpr\value-member::* ! XPathExpr   (: Note: "!" does no ordering or deduplication, and can be applied to any item, not just to nodes :)
- A CompPath expression may be substituted for the expected argument of any XPath expression, for example: count(MyCompPathExpr)
- Any XPath expression that produces a composite item can be used as the composite-root for any CompPath expression
Example:
```xquery
let $myBooks :=
  <books>
    <book name="Tom Sawyer">
      <author>Mark Twain</author>
    </book>
    <book name="Wuthering Heights">
      <author>Emily Brontë</author>
    </book>
    <book name="Jane Eyre">
      <author>Charlotte Brontë</author>
    </book>
    <book name="Adventures of Huckleberry Finn">
      <author>Mark Twain</author>
    </book>
  </books>,
  $map1 := map {"science-works": map {"Einstein": "Special Theory of relativity",
                                      "Darwin": "On the Origin of Species"},
                "literature": map {"19th Century": $myBooks}}
return
  $map1\literature\\*/book[author eq 'Mark Twain']
```
Evaluating this mixed CompPath and XPath expression produces the expected result:

```
<book name="Tom Sawyer">
  <author>Mark Twain</author>
</book>
<book name="Adventures of Huckleberry Finn">
  <author>Mark Twain</author>
</book>
```
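Since CompPath is only a proposal, the example above cannot be run in any XQuery processor. As a rough cross-check of the expected result, here is a Python approximation. It assumes maps are dicts and parses the XML with the standard `xml.etree.ElementTree` module; `deep_members` is a hypothetical helper standing in for \\*, and ElementTree's limited XPath subset stands in for the final /book[...] step.

```python
import xml.etree.ElementTree as ET

my_books = ET.fromstring("""<books>
  <book name="Tom Sawyer"><author>Mark Twain</author></book>
  <book name="Wuthering Heights"><author>Emily Brontë</author></book>
  <book name="Jane Eyre"><author>Charlotte Brontë</author></book>
  <book name="Adventures of Huckleberry Finn"><author>Mark Twain</author></book>
</books>""")

map1 = {
    "science-works": {"Einstein": "Special Theory of relativity",
                      "Darwin": "On the Origin of Species"},
    "literature": {"19th Century": my_books},
}


def deep_members(item):
    # Stand-in for \\*: every member at any depth, in depth-first order.
    if isinstance(item, dict):
        values = list(item.values())
    elif isinstance(item, list):
        values = item
    else:
        values = []
    for v in values:
        yield v
        yield from deep_members(v)


# $map1\literature\\*, keeping only node members before "/" is applied.
nodes = [v for v in deep_members(map1["literature"]) if isinstance(v, ET.Element)]
# .../book[author eq 'Mark Twain']: ElementTree supports the [tag='text'] predicate.
result = [book for n in nodes for book in n.findall("book[author='Mark Twain']")]
print([book.get("name") for book in result])
```

Running it prints `['Tom Sawyer', 'Adventures of Huckleberry Finn']`, matching the two book elements shown above.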

Thanks for this. I haven't studied it in detail but I've been thinking along fairly similar lines, in particular the idea of introducing axes into the picture.

I'm pretty sure some stronger foundations for this would be needed in the data model: you can't ask what the preceding-sibling of 17 is, except by introducing something that distinguishes that particular 17 from other 17s that exist elsewhere.

Actually,

following-sibling-member::17

means all array members whose index is greater than 17

17 here is the index (identifier) of the member, not the value.

I updated the document and now this is understandable. Thanks for raising this!

What is the gain using the new $map1\literature\\*/book[author eq 'Mark Twain'] over the existing $map1?literature?*/book[author eq 'Mark Twain']? At least in that sample data and for that particular selection/query it seems the existing map lookup operator does the job. I see that some more powerful features are in the \ or \\ operator but shouldn't the examples then show the benefits/improved power of the new operator versus existing ones?

@martin-honnen Correct, I will update with a better example, maybe this weekend. The best kind of example involves a deep search over members of mixed types (some composite, and some arrays of non-composite values).

This will also show the case when a current XPath expression will raise an error, while the ComPath (pronounced "compassionate" 😄 ) will work.

@dnovatchev The proposal looks pretty elaborated, thanks.

If the following rule is evaluated, …

Each operation E1\E2 is evaluated as follows: Expression E1 is evaluated, and the result is wrapped in an array A1.

…wouldn't E1\E2/step raise an error (path operations are only defined for nodes), or are results of composite path expressions to be unwrapped again after the evaluation of the last step?

And I believe the following rule…

Its left-hand side expression must return a result that is a composite item or else this result is represented as such by wrapping it into an array.

…introduces too much implicit magic. I would prefer to get an error if the LHS has an unsupported type.

It would also be interesting to assess if the proposal can be combined with the existing lookup operator and a potential descendant lookup operator ?? (https://github.com/qt4cg/qtspecs/issues/297).

@ChristianGruen Good observation. We need to make a more precise rule for appending "/" to the end of a CompPath expression:

CompPathExpr / XPathExpr

is expanded to:

CompPathExpr\value:: / XPathExpr

And if any of the values on the left-hand side of "/" is not a node, this results in a type error.

Or, we could have an axis that is even more restrictive than "value" -- let it be the "node::" axis, which selects only value members that are nodes, and then we define:

CompPathExpr / XPathExpr

to be equivalent to:

CompPathExpr\node:: / XPathExpr

This proposal is "more or less crystalized" 😄 which means that it is a base for discussion and further improvements.

So, thank you for signaling this, as we now have such an improvement! I will update the proposal with the new axis.

@ChristianGruen I updated the proposal with the "node::" axis and the node() kind-test.

Also defined that

CompPathExpr / XPathExpr

is equivalent to:

CompPathExpr\node:: / XPathExpr

Please, take a look.

What about unwrapping the final result? Are there specific advantages for the wrapped representation?

CompPathExpr\node::* / XPathExpr

Would this be equivalent?

CompPathExpr?* / XPathExpr

I wanted to have the left-hand side of \ always as a single component object -- thus the wrapping. Maybe you are right and this is not necessary. Let us keep this open.

CompPathExpr\node::* / XPathExpr
Would this be equivalent?
CompPathExpr?* / XPathExpr

No, because the former gets us only the values that are nodes, but the latter gives us all values.

Thus, the former will not result in type errors due to the left-hand side of / being a non-node, while the latter will raise an error in any such case.

I agree that the wrapping both seems to be necessary, and causes its own problems. It also fails to solve another problem which I'm keen to address, namely reverse (ancestor) navigation. I think it's possible to combine the ideas in this proposal with the ideas in issue #334 (transient properties) to produce something better. Here's a sketch (it's certainly not fully worked out).

First, we add something to the data model. Any value can have the property of being "pinned". (Think of placing a pin in a map, to identify a location in the map). Unless otherwise specified, the pin makes no difference to the result of any operation on the value, for example 2+2 evaluates to 4 regardless whether either of the operands is pinned. If a value is pinned, then the property holds information about how and where the value was found, and further operations (axis navigation) are available to take advantage of this.

For example, consider the expression $A?3 where $A is an array. If $A is pinned, we make a couple of changes to the behaviour of this expression: firstly, its result will also be pinned, and secondly, the selection will be error-free (if $A is not an array with 3 or more elements, it returns an empty sequence).

Suppose $A is ["x", "y", "z"], then $A?3 is "z". Except that the "z" is not just an ordinary "z" that can be used like any other string, it is a pinned "z" which gives it extra power. For example we can write let $val := $A?3 return $val¶index (we'll choose punctuation symbols later: perhaps $val::index, or $val?index::*) which returns 3, the index at which the value was found; and we can write let $val := $A?3 return $val¶owner which returns ["x", "y", "z"], the array in which it was found. And we could say that $val¶prior returns "y", the previous member of the array - again as a pinned value.

Similarly with maps, if $M is map{'x':5, 'y':6, 'z':7}, and is pinned, then $M?z returns a pinned 7, enabling let $val := $M?z return $val¶key to return "z", the key by which it was found, or let $val := $M?z return $val¶owner to return the original map. If we use ?? for map:find(), then $M??price[[¶owner?code='Z7890']] gives the (pinned) prices of all descendant objects whose ?code is 'Z7890'.

A pinned value also has a ¶path, representing the route by which it was selected.

Non-failing "?" operations will be allowed on any pinned value, even (say) a string or an integer, so if an operation like ?* returns a mix of arrays, maps, strings, and integers, we can still apply further "?" operations to it, without failure.

So basically, I'm proposing that rather than wrapping the result of the operation in an array, we "pin" it, and unlike wrapping in an array, this doesn't change the operations you can perform: for example if the pinned value is a node, you can use it on the LHS of "/".
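To make the pinning idea tangible, here is a rough Python sketch, assuming maps are dicts and arrays are lists with 1-based indexes. `Pinned`, `lookup` and `prior` are hypothetical names standing in for the ¶owner, ¶key/¶index, ¶prior and ¶path properties and the non-failing "?" described above. Here the pinned value is a wrapper purely for illustration, whereas the proposal treats the pin as a property that leaves ordinary operations on the value unaffected.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class Pinned:
    # Hypothetical model of a "pinned" value: the value plus where it was found.
    value: Any
    owner: Any = None           # ¶owner: the map or array containing the value
    key: Any = None             # ¶key for maps, ¶index (1-based) for arrays
    path: Tuple[Any, ...] = ()  # ¶path: the route by which the value was selected

    def lookup(self, k: Any) -> Optional["Pinned"]:
        # Non-failing "?": None stands in for the empty sequence instead of an error.
        v = self.value
        if isinstance(v, dict) and k in v:
            return Pinned(v[k], v, k, self.path + (k,))
        if isinstance(v, list) and isinstance(k, int) and 1 <= k <= len(v):
            return Pinned(v[k - 1], v, k, self.path + (k,))
        return None

    def prior(self) -> Optional["Pinned"]:
        # ¶prior: the previous member of the owning array, itself pinned.
        if isinstance(self.owner, list) and isinstance(self.key, int) and self.key > 1:
            i = self.key - 1
            return Pinned(self.owner[i - 1], self.owner, i, self.path[:-1] + (i,))
        return None


A = Pinned(["x", "y", "z"])
val = A.lookup(3)
print(val.value, val.key, val.prior().value)  # z 3 y
```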

A value (the same value, or a variable reference such as $xxx) can be a member of many different maps and arrays. It will then have many different owners, indexes, and paths.

I am not sure how we can select "the right one" of these many values. Or would we regard a single literal value (like 7) or a node $myNode as several literals and several nodes, one for each of its different owners?

People who are not programmers get really confused about the difference between \ and /. This seems pretty common, to such an extent that Windows now accepts both in paths, and people in the UK refer to / as forward slash to try and reduce the confusion.

I really don't want to have to teach this. Can we make / work?

Not without breaking backwards compatibility and needing to rewrite most of the XPath spec... 😢

We should study and learn from JSONPath (https://datatracker.ietf.org/doc/draft-ietf-jsonpath-base/13/). We can't adopt it "as is", because our data model is a superset of the JSON data model, but it does have some ideas that we could adopt rather than re-inventing.

It seems to me that a key idea in JSONPath is that a query selects a set of "nodes", where a node is defined as a value together with its location, the location being essentially the path by which it was reached. Attaching the location to the result values essentially makes it possible to navigate from the value to other values. The "location" thus provides an equivalent to the XDM notion of "node identity".

(Having said that, as far as I can see, JSONPath only offers "child" and "descendant" navigation from a node, so the concept appears to be under-exploited).
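A JSONPath-style "node" (a value together with its location) can be sketched in a few lines of Python. This is an illustration only, assuming JSON-like data held in dicts and lists; `nodes_with_location` is a hypothetical name, and the location strings imitate JSONPath normalized paths such as $['a'][0] without handling escaping. The point is that two equal values, such as two 17s, remain distinguishable by their locations.

```python
from typing import Any, Iterator, Tuple


def nodes_with_location(value: Any, location: str = "$") -> Iterator[Tuple[str, Any]]:
    # JSONPath-style "nodes": each value paired with the path by which it was reached.
    # Simplification: member names are not escaped.
    yield location, value
    if isinstance(value, dict):
        for k, v in value.items():
            yield from nodes_with_location(v, f"{location}['{k}']")
    elif isinstance(value, list):
        for i, v in enumerate(value):  # JSONPath array indexes are 0-based
            yield from nodes_with_location(v, f"{location}[{i}]")


doc = {"a": [17, {"b": 17}]}
print([loc for loc, v in nodes_with_location(doc) if v == 17])
# ["$['a'][0]", "$['a'][1]['b']"]
```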

Some of these ideas are now in the spec; this proposal needs to be revised in terms of the features now in the language (assuming the related pending PRs are accepted).

qt4cg / qtspecs