qt4cg / qtspecs

QT4 specifications
https://qt4cg.org/

CompPath (Composite-objects path) Expressions #350

Open dnovatchev opened 1 year ago

dnovatchev commented 1 year ago

CompPath (Composite-objects path) Expressions

As initially discussed in issue #341, we were exploring different ways to provide an XPath-like language for traversing composite objects such as maps and arrays in depth and selecting their members at any depth. While working on this, the idea of an XPath-like language for composite items started to emerge, and here we present this idea in a more or less crystalized form.

1. Root Component

Any CompPath expression must start from a composite item (a map or an array, or some future composite item type, maybe a set?). This can be a literal composite item or a reference to a variable whose value is a composite item.

Examples:


(: Literal composite items: :)
[1, 2, 3]

[1, [2,  3]]?2

{"x":1, "y" : map{ "z": 2}}

{"x":1, "y" : map{ "z": 2}} ?y

(: Variables containing composite items: :)
let $comp1 := [1, [2, 3]],
 $comp2 := $comp1 ?2,
 $comp3 := {"x":1, "y" : map{ "z": 2}},
 $comp4 := $comp3 ?y

In the above examples all literal expressions and all variables ($comp1, $comp2, $comp3, $comp4) may serve as the root component for a CompPath expression.

2. The component-path operator (\)

The component-path operator "\" is used to build expressions for locating members at any depth within component trees. Its left-hand side expression must return a result that is a composite item or else this result is represented as such by wrapping it into an array.

The operator returns an array whose members are either composite items themselves or non-composite "leaves" of the root-component tree.

Each operation E1\E2 is evaluated as follows: Expression E1 is evaluated, and the result is wrapped in an array A1. If any member of A1 is not a composite item, a type error is raised. Each member of A1 serves in turn to provide an inner "composite-focus" for the evaluation of E2. The composite-focus consists of: the member itself as the "composite-context-item" (or .); its index in A1 as the "composite-context-position" (or index()); the set of keys of the composite-context-item as the "composite-keyset" (or keys()); and the size of this member as the "composite-context-size" (specified as one of size(), array-size() or key-size()). The result of each evaluation of E2, if it isn't a single composite item, is wrapped in a single array. The arrays resulting from all the evaluations of E2 are wrapped in a single array, and this single array is the result of the evaluation.

E2 is typically a function over the composite-focus. Its results become the composite-context-items of the next step, that is, they are used as the left-hand side of the next composite-step-expression in the chain (see below). If this is the last composite-step-expression in the chain, they are the final results of the evaluation.
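The evaluation rule for E1\E2 can be sketched in Python. This is only an illustration under stated assumptions, not part of the proposal: arrays are modelled as Python lists, maps as dicts, and an XDM sequence as a tuple; the names `CompositeFocus` and `comp_path` are hypothetical; and the keyset of an array is assumed to be its 1-based indexes. The sketch follows the stricter reading, in which a non-composite member of A1 raises a type error.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class CompositeFocus:
    """Hypothetical model of the composite-focus handed to E2."""
    item: Any       # composite-context-item (".")
    position: int   # composite-context-position (index()), 1-based
    keys: tuple     # composite-keyset (keys())
    size: int       # composite-context-size (size())


def is_composite(item: Any) -> bool:
    # Assumption: lists stand in for arrays, dicts for maps.
    return isinstance(item, (list, dict))


def as_sequence(value: Any) -> tuple:
    # Assumption: a tuple models a sequence; anything else is a single item.
    return value if isinstance(value, tuple) else (value,)


def comp_path(e1_result: Any, e2: Callable[[CompositeFocus], Any]) -> list:
    # Wrap the result of E1 in an array A1.
    a1 = list(as_sequence(e1_result))
    for member in a1:
        if not is_composite(member):
            raise TypeError(f"member of A1 is not a composite item: {member!r}")

    results = []
    for position, member in enumerate(a1, start=1):
        if isinstance(member, dict):
            keys = tuple(member.keys())
        else:
            # Assumption: an array's keyset is its 1-based indexes.
            keys = tuple(range(1, len(member) + 1))
        focus = CompositeFocus(member, position, keys, len(member))
        result = e2(focus)
        # Anything that is not a single composite item is wrapped in an array.
        if not is_composite(result):
            result = list(as_sequence(result))
        results.append(result)
    # All per-member results are wrapped in one array.
    return results


# {"x":1, "y": map{"z": 2}} \ (select member "y")
print(comp_path({"x": 1, "y": {"z": 2}}, lambda f: f.item["y"]))
```

Because the result is itself an array of composite items or arrays, it can be passed as `e1_result` to the next step in a chain.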

3. Composite-Steps

A composite-step is a part of a composite-path-expression that generates an array and filters its members by zero or more predicates. A composite-step-expression is either a CompositeAxisStep or a CompositePostfixExpression.

4. Composite-Axes

The following axes are defined for traversing a composite-item tree:

7. Predicates

As defined above, a composite-step has three parts: composite-axis (can be omitted and then a default axis is used), member test, and an optional list of composite-predicates.

A composite-predicate in a composite-step is an expression used as a filter applied on the members of the composite-context-item that are already selected by the axis and member tests of the axis step, and not filtered out by any preceding composite-predicates in the composite-predicates-list. The composite-predicate may be any XPath expression and is written within double square brackets.

Examples:

<book name="Tom Sawyer">
  <author>Mark Twain</author>
</book>
<book name="Adventures of Huckleberry Finn">
  <author>Mark Twain</author>
</book>

michaelhkay commented 1 year ago

Thanks for this. I haven't studied it in detail but I've been thinking along fairly similar lines, in particular the idea of introducing axes into the picture.

I'm pretty sure some stronger foundations for this would be needed in the data model: you can't ask what the preceding-sibling of 17 is, except by introducing something that distinguishes that particular 17 from other 17s that exist elsewhere.

dnovatchev commented 1 year ago

I'm pretty sure some stronger foundations for this would be needed in the data model: you can't ask what the preceding-sibling of 17 is, except by introducing something that distinguishes that particular 17 from other 17s that exist elsewhere.

Actually,

following-sibling-member::17

means all array members whose index is greater than 17.

Here 17 is the index (identifier) of the member, not its value.
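A small Python sketch of this reading, assuming arrays are modelled as lists with 1-based member indexes; the function name `following_sibling_members` is hypothetical:

```python
def following_sibling_members(array: list, index: int) -> list:
    """Return the members of `array` whose 1-based index is greater than `index`."""
    return array[max(index, 0):]


# The value 17 occurs twice, but the selection is driven by the index only.
members = [17, "a", 17, "b"]
print(following_sibling_members(members, 1))   # members at indexes 2, 3, 4
print(following_sibling_members(members, 17))  # no member has an index above 17
```

Since the argument is a position rather than a value, the two occurrences of the value 17 cause no ambiguity here.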

I have updated the document to make this clear. Thanks for raising this!

martin-honnen commented 1 year ago

What is the gain using the new $map1\literature\\*/book[author eq 'Mark Twain'] over the existing $map1?literature?*/book[author eq 'Mark Twain']? At least in that sample data and for that particular selection/query it seems the existing map lookup operator does the job. I see that some more powerful features are in the \ or \\ operator but shouldn't the examples then show the benefits/improved power of the new operator versus existing ones?

dnovatchev commented 1 year ago

What is the gain using the new $map1\literature\\*/book[author eq 'Mark Twain'] over the existing $map1?literature?*/book[author eq 'Mark Twain']? At least in that sample data and for that particular selection/query it seems the existing map lookup operator does the job. I see that some more powerful features are in the \ or \\ operator but shouldn't the examples then show the benefits/improved power of the new operator versus existing ones?

@martin-honnen Correct, I will update with a better example, maybe this weekend. The best kind of example is a deep search over members of mixed types (some composite and some arrays of non-composite items).

This will also show a case where a current XPath expression raises an error, while the ComPath (pronounced "compassionate" 😄 ) expression works.

ChristianGruen commented 1 year ago

@dnovatchev The proposal looks quite elaborate, thanks.

If the following rule is evaluated, …

Each operation E1\E2 is evaluated as follows: Expression E1 is evaluated, and the result is wrapped in an array A1.

…wouldn't E1\E2/step raise an error (path operations are only defined for nodes), or are results of composite path expressions to be unwrapped again after the evaluation of the last step?

And I believe the following rule…

Its left-hand side expression must return a result that is a composite item or else this result is represented as such by wrapping it into an array.

…introduces too much implicit magic. I would prefer to get an error if the LHS has an unsupported type.

It would also be interesting to assess if the proposal can be combined with the existing lookup operator and a potential descendant lookup operator ?? (https://github.com/qt4cg/qtspecs/issues/297).

dnovatchev commented 1 year ago

Each operation E1\E2 is evaluated as follows: Expression E1 is evaluated, and the result is wrapped in an array A1.

…wouldn't E1\E2/step raise an error (path operations are only defined for nodes), or are results of composite path expressions to be unwrapped again after the evaluation of the last step?

@ChristianGruen Good observation. We need to make a more precise rule for appending "/" to the end of a CompPath expression:

CompPathExpr / XPathExpr

is expanded to:

CompPathExpr\value:: / XPathExpr

And if any of the values on the left-hand side of "/" is not a node, a type error is raised.

Or, we could have an axis that is even more restrictive than "value" -- let it be the "node::" axis, which selects only value members that are nodes, and then we define:

CompPathExpr / XPathExpr

to be equivalent to:

CompPathExpr\node:: / XPathExpr

This proposal is "more or less crystalized" 😄 which means that it is a base for discussion and further improvements.

So, thank you for signaling this, as we now have such an improvement! I will update the proposal with the new axis.

dnovatchev commented 1 year ago

@ChristianGruen I updated the proposal with the "node::" axis and the node() kind-test.

Also defined that

CompPathExpr / XPathExpr

is equivalent to:

CompPathExpr\node:: / XPathExpr

Please, take a look.

ChristianGruen commented 1 year ago

What about unwrapping the final result? Are there specific advantages for the wrapped representation?

CompPathExpr\node::* / XPathExpr

Would this be equivalent?

CompPathExpr?* / XPathExpr

dnovatchev commented 1 year ago

What about unwrapping the final result? Are there specific advantages for the wrapped representation?

I wanted to have the left-hand side of \ always as a single component object -- thus the wrapping. Maybe you are right and this is not necessary. Let us keep this open.

CompPathExpr\node::* / XPathExpr

Would this be equivalent?

CompPathExpr?* / XPathExpr

No, because the former gets us only the values that are nodes, but the latter gives us all values.

Thus, the former will not result in type errors due to the left-hand side of / being a non-node, while the latter will raise an error in any such case.
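The difference can be illustrated with a rough Python sketch. It assumes nodes are modelled as `xml.etree.ElementTree.Element` objects and the "/" step as a child lookup; `node_axis` and `path_step` are hypothetical helper names, not anything defined by the proposal or by XPath:

```python
import xml.etree.ElementTree as ET


def node_axis(members: list) -> list:
    """Model of \\node::* : keep only the members that are nodes."""
    return [m for m in members if isinstance(m, ET.Element)]


def path_step(items: list, child_name: str) -> list:
    """Model of "/" : defined for nodes only, so a non-node raises a type error."""
    for item in items:
        if not isinstance(item, ET.Element):
            raise TypeError(f"left-hand side of '/' is not a node: {item!r}")
    return [child for item in items for child in item.findall(child_name)]


book = ET.fromstring("<book><author>Mark Twain</author></book>")
members = [book, 42, "text"]  # mixed node and non-node values

# CompPathExpr\node::* / author : non-nodes are filtered out first.
print([a.text for a in path_step(node_axis(members), "author")])

# CompPathExpr?* / author : every value reaches "/", so this fails.
try:
    path_step(members, "author")
except TypeError as err:
    print("type error:", err)
```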

michaelhkay commented 1 year ago

I agree that the wrapping both seems to be necessary, and causes its own problems. It also fails to solve another problem which I'm keen to address, namely reverse (ancestor) navigation. I think it's possible to combine the ideas in this proposal with the ideas in issue #334 (transient properties) to produce something better. Here's a sketch (it's certainly not fully worked out).

First, we add something to the data model. Any value can have the property of being "pinned". (Think of placing a pin in a map, to identify a location in the map). Unless otherwise specified, the pin makes no difference to the result of any operation on the value, for example 2+2 evaluates to 4 regardless whether either of the operands is pinned. If a value is pinned, then the property holds information about how and where the value was found, and further operations (axis navigation) are available to take advantage of this.

For example, consider the expression $A?3 where $A is an array. If $A is pinned, we make a couple of changes to the behaviour of this expression: firstly, its result will also be pinned, and secondly, the selection will be error-free (if $A is not an array with 3 or more elements, it returns an empty sequence).

Suppose $A is ["x", "y", "z"], then $A?3 is "z". Except that the "z" is not just an ordinary "z" that can be used like any other string, it is a pinned "z" which gives it extra power. For example we can write let $val := $A?3 return $val¶index (we'll choose punctuation symbols later: perhaps $val::index, or $val?index::*) which returns 3, the index at which the value was found; and we can write let $val := $A?3 return $val¶owner which returns ["x", "y", "z"], the array in which it was found. And we could say that $val¶prior returns "y", the previous member of the array - again as a pinned value.

Similarly with maps, if $M is map{'x':5, 'y':6, 'z':7}, and is pinned, then $M?z returns a pinned 7, enabling let $val := $M?z return $val¶key to return "z", the key by which it was found, or let $val := $M?z return $val¶owner to return the original map. If we use ?? for map:find(), then $M??price[[¶owner?code='Z7890']] gives the (pinned) prices of all descendant objects whose ?code is 'Z7890'.

A pinned value also has a ¶path, representing the route by which it was selected.

Non-failing "?" operations will be allowed on any pinned value, even (say) a string or an integer, so if an operation like ?* returns a mix of arrays, maps, strings, and integers, we can still apply further "?" operations to it, without failure.

So basically, I'm proposing that rather than wrapping the result of the operation in an array, we "pin" it, and unlike wrapping in an array, this doesn't change the operations you can perform: for example if the pinned value is a node, you can use it on the LHS of "/".
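The pinning idea can be sketched in Python as a thin wrapper that remembers where a value was found. This is only a sketch under assumptions: the class name `Pinned`, its fields, and the `lookup` and `prior` methods are hypothetical stand-ins for the "?" operator and the ¶index, ¶key, ¶owner, ¶prior and ¶path properties; arrays are lists with 1-based indexes, maps are dicts, and an empty tuple models the empty sequence.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Pinned:
    """Hypothetical pinned value: the value plus how and where it was found."""
    value: Any
    owner: Optional["Pinned"] = None  # models ¶owner (itself pinned)
    selector: Any = None              # models ¶index for arrays, ¶key for maps

    def lookup(self, selector: Any) -> tuple:
        """Non-failing "?" lookup: returns () instead of raising an error."""
        v = self.value
        if isinstance(v, list) and isinstance(selector, int) and 1 <= selector <= len(v):
            return (Pinned(v[selector - 1], self, selector),)
        if isinstance(v, dict) and selector in v:
            return (Pinned(v[selector], self, selector),)
        return ()  # also covers strings, integers and other non-composite values

    def prior(self) -> tuple:
        """Models ¶prior: the previous member of the owning array, also pinned."""
        if self.owner is not None and isinstance(self.owner.value, list):
            return self.owner.lookup(self.selector - 1)
        return ()

    @property
    def path(self) -> tuple:
        """Models ¶path: the selectors followed from the pinned root."""
        steps = []
        current = self
        while current.owner is not None:
            steps.append(current.selector)
            current = current.owner
        return tuple(reversed(steps))


a = Pinned(["x", "y", "z"])
(z,) = a.lookup(3)
print(z.value, z.selector, z.owner.value, z.prior()[0].value)
print(a.lookup(4), Pinned("just a string").lookup(1))
```

In this model the pin belongs to the result of a particular selection rather than to the underlying value, and an operation that ignores the pin would simply read `.value`.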

dnovatchev commented 1 year ago

If a value is pinned, then the property holds information about how and where the value was found, and further operations (axis navigation) are available to take advantage of this.

A value (the same value or $xxx (variable reference) ) can be a member of many, different maps and arrays. Then it will have many different owners, indexes and paths.

I am not sure how we can select "the right one" of these many values. Or would we regard a single literal value (like 7) or node $myNode, as several literals and several nodes, depending on each of its different owners?

liamquin commented 1 year ago

People who are not programmers get really confused about the difference between \ and /. This seems pretty common, to such an extent that Windows now accepts both in paths, and people in the UK refer to / as forward slash to try and reduce the confusion.

I really don't want to have to teach this. Can we make / work?

dnovatchev commented 1 year ago

People who are not programmers get really confused about the difference between \ and /. This seems pretty common, to such an extent that Windows now accepts both in paths, and people in the UK refer to / as forward slash to try and reduce the confusion.

I really don't want to have to teach this. Can we make / work?

Not without breaking backwards compatibility and needing to rewrite most of the XPath spec... 😢

michaelhkay commented 1 year ago

We should study and learn from JSONPath (https://datatracker.ietf.org/doc/draft-ietf-jsonpath-base/13/). We can't adopt it "as is", because our data model is a superset of the JSON data model, but it does have some ideas that we could adopt rather than re-inventing.

It seems to me that a key idea in JSONPath is that a query selects a set of "nodes", where a node is defined as a value together with its location, the location being essentially the path by which it was reached. Attaching the location to the result values essentially makes it possible to navigate from the value to other values. The "location" thus provides an equivalent to the XDM notion of "node identity".
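A minimal Python sketch of the "value together with its location" idea, assuming JSON-like data modelled as dicts and lists. The generator name `descendants` is hypothetical, and indexes are 0-based as in JSONPath:

```python
from typing import Any, Iterator, Tuple


def descendants(value: Any, location: tuple = ()) -> Iterator[Tuple[tuple, Any]]:
    """Yield (location, value) pairs for every descendant, in document order."""
    if isinstance(value, dict):
        children = value.items()
    elif isinstance(value, list):
        children = enumerate(value)
    else:
        return  # non-composite values have no descendants
    for selector, child in children:
        child_location = location + (selector,)
        yield child_location, child
        yield from descendants(child, child_location)


data = {"a": [17, {"b": 17}]}

# Two equal values, told apart by the location through which each was reached.
print([loc for loc, val in descendants(data) if val == 17])
```

Both occurrences of 17 are equal as values, but each result carries a distinct location, which is what makes further navigation from a particular 17 well defined.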

(Having said that, as far as I can see, JSONPath only offers "child" and "descendant" navigation from a node, so the concept appears to be under-exploited).

ndw commented 3 months ago

Some of these ideas are now in the spec; this proposal needs to be revised in terms of the features now in the language (assuming related pending PRs are accepted).