ietf-wg-jsonpath / draft-ietf-jsonpath-base

Development of a JSONPath internet draft
https://ietf-wg-jsonpath.github.io/draft-ietf-jsonpath-base/

Clarification on Terminology #66

Closed fiestajetsam closed 3 years ago

fiestajetsam commented 3 years ago

(This was originally raised in #54 and is now being split out.)

My own view is that the terminology should stay consistent with RFC 8259, and that the word "object" should not be used for items that are not JSON objects in the sense of RFC 8259. (Daniel P)

To Carsten's point about what we call things, the number of distinguished terms per RFC8259 is pretty small: JSON text, value, object, array, number, string. Having spent quite a bit of time specifying JSON DSLs, I find that using just those terms doesn't seem to get in the way or cause problems, so I'd argue that we should stick to them (and build up to higher-level constructs as required for JSONPath).

… oh, and I forgot the very useful "member". (Tim Bray)

… and “element” (the things in arrays). (Carsten Bormann)

The problem with JSON value is that it also can be quite confusing due to the usual use of that term. Pointing to a tree and saying “the values inside that tree” is not going to be felt as equivalent to “the set of all subtrees of that tree, including the tree itself”. But if JSON value is the only term we have, it has to be. Hence my preference to talk about data items when I mean the items themselves and not their “value”. (Carsten Bormann)

I think the key difficulty is whether each (key, value) pair in an object is "a thing" that can be identified and manipulated and potentially returned. (If we're talking analogies, then it's analogous to an attribute node in the XDM model). (Michael Kay)

ECMA-404 uses "name/value pair", which is what I understand the term "member" to mean (Douglas Crockford uses "member"). (Daniel P)

I think the term “union” is poor. If we think of it as concatenation of results, then the result is as expected. (Glyn Normington)

I understand that within RFC 8259 we have JSON values of different types. How they are structured is not of much interest here.

But when querying that structure with JSONPath, it is vitally important to treat that hierarchical structure as a tree. So in fact we are building a higher-level construct here. We also need some name for "the things" in the tree. I was able to identify

but could not see an agreement here.

I agree with Glyn that the term "union" is poor (see below).

gregsdennis commented 3 years ago

Also relates to #21.

danielaparker commented 3 years ago

Until the early discussions on the jsonpath list, I had thought mostly about value selection; I hadn't really thought much about nodes, or appreciated the importance of paths for identity (duplicate removal) and for sorting, if the user wanted that. But having changed my thinking based on those conversations, I'd like to understand better what exactly a node represents in JSONPath.

In XML, we have the DOM, and DOM nodes. But in JSON we don't really have anything like that. I don't find it natural to think about a "tree of nodes" in the context of JSONPath. The implementations I know about don't have a tree, they don't identify location by the position in a tree, rather, they operate on path/value pairs.

For example, looking at the slice operator code in David Chester's Node.js implementation

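'subscript-child-slice': function(component, partial) {
  // partial holds the current value together with its path
  if (is_array(partial.value)) {
    // parse the "start:end:step" parts of the slice expression
    var args = component.expression.value.split(':').map(_parse_nullable_int);
    // pair each element with the current path extended by its index
    var values = partial.value.map(function(v, i) { return { value: v, path: partial.path.concat(i) } });
    return slice.apply(null, [values].concat(args));
  }
}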

There is no tree, just a selected value and an updated path. That's typical, I think, of many implementations.

Would it be better to talk about a node as a path-value pair, rather than a node in a tree?
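To make the path/value idea concrete, here is a self-contained sketch of the same slice step. It is not the code from David Chester's library: `sliceChildren` is a hypothetical name, a node is assumed to be a plain `{ value, path }` object, and only positive steps are handled.

```javascript
// Hypothetical sketch: a "node" is just a { value, path } pair, where path is
// the array of member names and indices leading from the root value.
// Applies a slice [start:end:step] (positive steps only) to an array value.
function sliceChildren(node, start = 0, end, step = 1) {
  if (!Array.isArray(node.value) || step <= 0) return [];
  const len = node.value.length;
  // Negative bounds count from the end; out-of-range bounds are clamped.
  const clamp = (i) => (i < 0 ? Math.max(len + i, 0) : Math.min(i, len));
  const lo = clamp(start);
  const hi = end === undefined ? len : clamp(end);
  const out = [];
  for (let i = lo; i < hi; i += step) {
    // No tree is consulted: each result is the child value plus an extended path.
    out.push({ value: node.value[i], path: node.path.concat(i) });
  }
  return out;
}

const root = { value: ["a", "b", "c", "d"], path: [] };
console.log(JSON.stringify(sliceChildren(root, 1, 3)));
// prints [{"value":"b","path":[1]},{"value":"c","path":[2]}]
```

The path travels with each value, so a later step can report where a result came from without any parent links.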

glyn commented 3 years ago

For me, regarding a JSON value as a tree isn't an implementation consideration - it's an observation about the abstract structure of the data. Similarly, the notion of the position (or perhaps location) of a node in the tree is just an abstract way of identifying where the node is in the tree.

A path is one implementation of the abstract notion of position, but not the only one. I think in the spec we should try to avoid implementation bias and speak of positions rather than paths, references, or pointers, all of which bring to mind implementations.

It's unfortunate that (general) trees are also common data structures and maybe that's part of the problem here. But in reality, we also have binary trees, AVL trees, splay trees, B-trees, etc., so I try to think of a general tree data structure as just one way of implementing an abstract tree.

Does that help at all?

danielaparker commented 3 years ago

@glyn wrote:

For me, regarding a JSON value as a tree isn't an implementation consideration - it's an observation about the abstract structure of the data. Similarly, the notion of the position (or perhaps location) of a node in the tree is just an abstract way of identifying where the node is in the tree.

Okay. So we want to talk about abstract input, abstract output. Tree of nodes in, tree of nodes out. It would be really helpful if you could answer two questions.

Question 1: What is a node in JSONPath? What are its properties?

"node" appears 60 times in the draft, but I only found two mentions of what a node is. In section 3.2, it says "Each node holds a JSON value" (emphasis added). And then there is "root node which is the input document" (emphasis added.)

I'm assuming (please correct me if I'm wrong), that a JSON value as defined in RFC8259 cannot itself be considered a "tree of nodes". It's a hierarchy of values, not nodes that contain values.

What are the properties of a node? We're told it holds a JSON value. What else does it hold?

Question 2: How do we interpret "root node which is the input document"

The term "input document", is that a real thing? Actual JSON data? No nodes?

Does the draft really mean that the root node is the input document? Or that it represents the input document?

Suppose the input document is

[{"foo":1, "bar":2}]

What value does the root node contain? [{"foo":1, "bar":2}]?

Apologies for questions that must seem self evident to you, but I don't have any experience with nodes in the context of JSON.

Thanks, Daniel

danielaparker commented 3 years ago

The draft defines Position, In section 1.1, as follows:

"A JSON data item identical to or nested within the JSON data item to which the query is applied to, expressed either by the value of that data item or by providing a Normalized Path Expression as a JSONPath Output Path."

I think this needs to be clearer.

It would be easier to read if the phrase "the JSON data item to which the query is applied to" was replaced by a single term defined earlier, perhaps root, or whatever term the authors choose to use consistently that means the same thing.

The statement that position can be expressed "by the value of that data item" is not clear. What is its intended meaning? A "data item" is said earlier to be the same as a "JSON value". I don't understand how "position" can be expressed as the value of that item. Is it its location within the data item? This is particularly puzzling when the alternative is "a Normalized Path Expression as a JSONPath Output Path".

I think it would be easier to read the sentence if "Normalized Path Expression as a JSONPath Output Path" was replaced by "Output Path", which is defined earlier.

goessner commented 3 years ago

Question 1: What is a node in JSONPath? What are its properties?

"node" appears 60 times in the draft, but I only found two mentions of what a node is. In section 3.2, it says "Each node holds a JSON value" (emphasis added). And then there is "root node which is the input document" (emphasis added.) ... Question 2: How do we interpret "root node which is the input document"

I agree, "node" is an important term to define clearly. Not being a computer scientist, I'll pragmatically quote Wikipedia:

A node is a structure which may contain a value or condition, or represent a separate data structure (which could be a tree of its own).

With this, for me "node" is either a primitive JSON value or a JSON container – array or object – as a possibly empty subtree. Nodes are either named or indexed; thus have a unique tree location. Leaf nodes are either primitive values or empty containers. There is always a root node.
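One way to read this definition, as a hedged sketch: walk a parsed JSON value, treat every primitive and every container as a node, and record the unique sequence of names and indices that reaches it. The generator name `nodes` and the `{ location, value, isLeaf }` shape are illustrative assumptions, not draft terminology.

```javascript
// Yields every node of a parsed JSON value in pre-order. A node's location is
// the list of member names / array indices from the root; the root's is [].
function* nodes(value, location = []) {
  const entries = Array.isArray(value)
    ? value.map((v, i) => [i, v])
    : value !== null && typeof value === "object"
      ? Object.entries(value)
      : [];
  // Leaves are primitives or empty containers: both have no entries.
  yield { location, value, isLeaf: entries.length === 0 };
  for (const [key, child] of entries) {
    yield* nodes(child, location.concat(key));
  }
}

const doc = { a: [1, {}], b: "x" };
for (const n of nodes(doc)) {
  console.log(JSON.stringify(n.location), n.isLeaf);
}
// prints:
// [] false
// ["a"] false
// ["a",0] true
// ["a",1] true
// ["b"] true
```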

What do you think – I'm not an expert with "abstract data types".

cabo commented 3 years ago

A node is an item (a term we have defined), with an emphasis on this item possibly being a specific part of a bigger item. (Add same instance vs. equal value discussion here.)

goessner commented 3 years ago

Oh well: "(Data) item" == "JSON value" == "node". Maybe we should make that triple equivalence explicit.

danielaparker commented 3 years ago

Oh well: "(Data) item" == "JSON value" == "node". Maybe we should make that triple equivalence explicit.

Data item and JSON value, as defined in the draft, are crystal clear.

But it isn't stated in the text of the draft that "node" == "JSON value". It's stated that "Each node holds a JSON value", and "the JSON value held by a node" (emphasis added). My reading of that text is that a node boxes a JSON value (and possibly has additional attributes, such as location information if the JSON value is a descendant of the root).

Also consider that a reader seeing the term "node" may have a priori understanding of the term by analogy to DOM nodes, where it's possible to navigate back to the root from any child node.

Note that my concern is with what the text says, what the authors intended, and the alignment thereof. I'm less concerned with any particular resolution.

goessner commented 3 years ago

So we now have (bold terms added):

| Term | Description |
|---|---|
| Data Item | A structure complying to the generic data model of JSON, i.e., composed of containers, namely JSON objects and arrays, and of atomic data, namely null, true, false, numbers, and text strings. Also called a JSON value. |
| Object | A JSON object as defined in {{-json}}. Never used in its generic sense, e.g., for programming language objects. |
| Array | A JSON array as defined in {{-json}}. Never used in its generic sense, e.g., for programming language arrays. |
| Member | A name/value pair in a JSON object. (Not itself a JSON value.) |
| Name | The name in a name/value pair constituting a member. (Also known as "key", "tag", or "label".) |
| Element | An item in an array. (Also used with a distinct meaning in XML context for XML elements.) |
| Query | Short name for JSONPath expression. |
| Argument | Short name for the JSON data item a JSONPath expression is applied to. |
| Output Path | A simple form of JSONPath expression that identifies a Position by providing a query that results in exactly that position. Similar to, but syntactically different from, a JSON Pointer {{-pointer}}. |
| Position | A JSON data item identical to or nested within the JSON data item to which the query is applied to, expressed either by the value of that data item or by providing a Normalized Path Expression as a JSONPath Output Path. |
| Normalized Path Expression | A query in a normalized form that identifies exactly one Position in an Argument; see {{overview}}. |

Living table ... Please edit.

goessner commented 3 years ago

Why not simply "Normalized Path" instead of "Normalized Path Expression", since an Output Path is always a Normalized Path?

danielaparker commented 3 years ago

@goessner wrote:

With this, for me "node" is either a primitive JSON value or a JSON container – array or object – as a possibly empty subtree. Nodes are either named or indexed; thus have a unique tree location. Leaf nodes are either primitive values or empty containers. There is always a root node.

What do you think – I'm not an expert with "abstract data types".

What my question really comes down to, considering both "JSON value" and "node" as abstract entities, is this:

Is this abstract operation defined in one or both cases:

(1) parent(JSON-value)

(2) parent(node)

For (1), my expectation is "no", since JSON values are generally regarded in JSON specifications as self-contained. Using RFC 6901 terminology, a "specific value" extracted from a "JSON document" has no connection anymore to the original "JSON document".

For (2), my expectation is "yes", by analogy to the DOM and DOM nodes.

More generally, are there any abstract operations defined on "node" that are not also defined on "JSON value"?

If the answer is no, then introducing the term "node" serves no purpose.
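The distinction between (1) and (2) can be illustrated with a small sketch. Assume, hypothetically, that a node is represented as a `{ root, path }` pair: then `parent` is definable by dropping the last path step, whereas an extracted value on its own (a string, say) carries no such information. The names `valueAt` and `parentNode` are illustrative only.

```javascript
// Resolves a path (array of member names / indices) against a root value.
function valueAt(root, path) {
  return path.reduce((v, key) => v[key], root);
}

// parent(node): definable because the node remembers its root and path.
function parentNode(node) {
  if (node.path.length === 0) return undefined; // the root has no parent
  return { root: node.root, path: node.path.slice(0, -1) };
}

const doc = { store: { book: [{ title: "A" }] } };
const titleNode = { root: doc, path: ["store", "book", 0, "title"] };
// parent(JSON-value) has nothing to work with: this string knows nothing of doc.
const titleValue = valueAt(doc, titleNode.path);

const parent = parentNode(titleNode);
console.log(JSON.stringify(parent.path), JSON.stringify(valueAt(doc, parent.path)));
// prints ["store","book",0] {"title":"A"}
```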

cabo commented 3 years ago

I don't think we should use "JSON value"; we have the term "item" for that. "node" means exactly the same, so we don't need it.
In the same way that you don't need both the term "flophouse" and "hotel". I believe "node" is a good term when the emphasis is on the position that item has on a tree.

glyn commented 3 years ago

@glyn wrote:

For me, regarding a JSON value as a tree isn't an implementation consideration - it's an observation about the abstract structure of the data. Similarly, the notion of the position (or perhaps location) of a node in the tree is just an abstract way of identifying where the node is in the tree.

Okay. So we want to talk about abstract input, abstract output. Tree of nodes in, tree of nodes out.

For the whole selector, the input is a JSON value which can be thought of abstractly as a tree of nodes, but the output is a sequence of JSON values each of which can be thought of as a subtree of the input tree.

It would be really helpful if you could answer two questions.

Question 1: What is a node in JSONPath? What are its properties?

If you're not familiar with trees and nodes, Wikipedia gives a reasonable definition of [trees](https://en.wikipedia.org/wiki/Tree_(data_structure)); see the terminology section for "node". However, that page is about data structures, so is not particularly abstract. The page on Tree structure may be more helpful.

"node" appears 60 times in the draft, but I only found two mentions of what a node is. In section 3.2, it says "Each node holds a JSON value" (emphasis added). And then there is "root node which is the input document" (emphasis added.)

I'm assuming (please correct me if I'm wrong), that a JSON value as defined in RFC8259 cannot itself be considered a "tree of nodes". It's a hierarchy of values, not nodes that contain values.

A JSON value can be considered as a tree of nodes. A JSON literal corresponds to a single node with value equal to the literal. A sequence corresponds to a node whose children (indexed by position in the sequence) correspond to the values of the sequence. An object corresponds to a node whose children (indexed by name in the object) correspond to the values of the object.

What are the properties of a node? We're told it holds a JSON value. What else does it hold?

I think it's probably better to talk of a JSON value corresponding to the tree rooted in a node.

For literals, the node holds the type and value of the literal. For sequences or objects, the node holds just the number of children or their names, respectively.

* Does a node provide access to its parent node, like a DOM Node does?

Every node except the root node has a parent. I don't think it "provides access" to anything, because this is an abstraction rather than a programming construct.

* Does a node provide access to its associated key, or index?

No, a node doesn't know its key or index - that's the concern of its parent.

Question 2: How do we interpret "root node which is the input document"

The term "input document", is that a real thing? Actual JSON data? No nodes?

The input document is JSON data and is also a JSON value. It corresponds to a tree of nodes as outlined above.

Does the draft really mean that the root node is the input document? Or that it represents the input document?

Probably "represents" is closest. I used the word "corresponds" above because I have in mind a mapping from the JSON value to a tree of nodes.

Suppose the input document is

[{"foo":1, "bar":2}]

What value does the root node contain? [{"foo":1, "bar":2}]?

The tree in this case is something like this:

                                   sequence
                                      | 0
                                    object
                             foo  /      \  bar
                               1           2
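Following the mapping described in this comment, a sketch of turning a parsed JSON value into an explicit tree might look like this. It is illustrative only: `toTree`, `render`, and the `kind`/`children` fields are hypothetical. Literal nodes hold a type and value, array ("sequence") nodes hold children by index, object nodes hold children by name, and no node stores its own key or a reference to its parent.

```javascript
// Maps a parsed JSON value to an explicit tree of nodes.
function toTree(value) {
  if (Array.isArray(value)) {
    return { kind: "array", children: value.map((v) => toTree(v)) };
  }
  if (value !== null && typeof value === "object") {
    const children = new Map(); // names are kept by the parent, not the child
    for (const [name, v] of Object.entries(value)) children.set(name, toTree(v));
    return { kind: "object", children };
  }
  return { kind: "literal", type: value === null ? "null" : typeof value, value };
}

// Prints one line per node; the edge label (index or name) comes from the parent.
function render(node, label = "$", depth = 0) {
  const pad = "  ".repeat(depth);
  const text = node.kind === "literal" ? JSON.stringify(node.value) : node.kind;
  const lines = [`${pad}${label}: ${text}`];
  const edges =
    node.kind === "array" ? node.children.map((c, i) => [i, c])
    : node.kind === "object" ? [...node.children]
    : [];
  for (const [key, child] of edges) {
    lines.push(...render(child, String(key), depth + 1));
  }
  return lines;
}

console.log(render(toTree(JSON.parse('[{"foo":1, "bar":2}]'))).join("\n"));
// prints:
// $: array
//   0: object
//     foo: 1
//     bar: 2
```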

Apologies for questions that must seem self evident to you, but I don't have any experience with nodes in the context of JSON.

Thanks, Daniel

goessner commented 3 years ago

I don't think we should use "JSON value"; we have the term "item" for that.

Is this a hint to rename "data item" to "item"?

"node" means exactly the same, so we don't need it. In the same way that you don't need both the term "flophouse" and "hotel". I believe "node" is a good term when the emphasis is on the position that item has on a tree.

ok ... "node" is self-explaining in the tree context, so I will delete it from the terms list.

cabo commented 3 years ago

On 2021-03-11, at 17:41, Stefan Goessner wrote:

I don't think we should use "JSON value"; we have the term "item" for that.

Is this a hint to rename "data item" to "item"?

Data item is just the long form.

"node" means exactly the same, so we don't need it. In the same way that you don't need both the term "flophouse" and "hotel". I believe "node" is a good term when the emphasis is on the position that item has on a tree.

ok ... "node" is self-explaining in the tree context, so I will delete it from the terms list.

I still think we should have a terminology entry (or add it to the “item” entry).

Grüße, Carsten

danielaparker commented 3 years ago

@glyn wrote:

* Does a node provide access to its parent node, like a DOM Node does?

Every node except the root node has a parent. I don't think it "provides access" to anything, because this is an abstraction rather than a programming construct.

Just to note, abstractions are defined by their properties. It's meaningless to talk about an abstraction without enumerating its properties, much as it would be meaningless to talk about an Abelian Group or a Hilbert Space without enumerating their properties. So I think asking whether a node supports a property "parent" is meaningful. My own view is that it would be helpful to define "node" succinctly in the terminology section, with its properties briefly mentioned. Then those properties can be referred to later in the draft when specifying how to go from a node to a Normalized Path or an item.

I don't think it's necessary to talk about a "tree of nodes", which is a less abstract concept; it is enough to define node and identify its properties. "Tree of nodes" suggests an arrangement of data that is never found in actual implementations; "node" by itself doesn't have that connotation. Its properties are all that matter.

glyn commented 3 years ago

I don't think it's necessary for a node to "know" about its parent.

I personally think it's easier to talk about a tree of nodes rather than a node in isolation. The tree structure is implicit in the way JSON is structured.

Anyway, I feel we are talking somewhat at cross-purposes, but do you think we are converging on a common understanding yet?

gregsdennis commented 3 years ago

The implementations I know about don't have a tree, they don't identify location by the position in a tree - @danielaparker

This is precisely how every .NET solution I've seen/created works.

Also, seconding everything that @glyn said. (I was building such a response, but upon reading it, I figured the horse was dead enough.)

Regarding access to parents, I'd like to call out that the Newtonsoft parser results in a tree of items that do have such access, while the newer in-built System.Text.Json parser results in a tree of items that do not have access.

NOTE: These are represented by .NET classes internally, so naturally they have some extended properties besides the JSON value. The point here is that one points back to the parent while the other doesn't. Therefore it's completely up to the implementation/parser to determine how it wants to model the JSON internally. JSON doesn't define how or whether such access should be supported. It's just a data structure.

Additionally, @handrews (he's on sabbatical, so I don't want to tag him) has released Relative JSON Pointers as an extension to the base JSON Pointers spec where the first action is to navigate up the tree before dereferencing the rest of the pointer. This spec assumes nothing of "nodes" that contain any metadata or their navigability. It's just considering the structure as a tree of values.

timbray commented 3 years ago

I just read the current draft carefully and was left unconvinced that the term "node" needs to be used. In particular, consider the nice example in section 3.5. I went through that and it seems to me that every instance of "node" could be replaced by "JSON Value" and be accurate, because I think every bit of the instance discussed there is a JSON value.

(If we were going to do this, we should probably have a terminology note that "Value" means "JSON Value", just for brevity and readability.)

Are there other places in the spec where the use of the term "node" is necessary for accuracy or clarity?

To test my hypothesis, I'll create a PR that updates section 3.5 to (hopefully) support my point. But if someone has a strong argument why we need to invent new terminology, I'd be happy to avoid the effort.

cabo commented 3 years ago

There is no doubt that the draft could be made to work by calling the things "mugs". One can adapt to anything.

Value is a particularly bad term to use here because it focuses on the valueness, i.e., comparison by value. Nodes focus on the place, the position, in the tree, i.e., comparison by identity. Again, the draft can be written in a way that makes the reader make that distinction on their own, but that is not a good way to do tech doc.

Why don't the terms from 8259 suffice? 8259 defines a JSON text, not its processing. It is rather natural that once processing is added, new terms are added, too.

cabo commented 3 years ago

(Of course, I don't have a strong argument for why we need to "invent" "new" terminology; I think the term "node" has been used before. Jeez.)

danielaparker commented 3 years ago

@cabo wrote:

There is no doubt that the draft could be made to work by calling the things "mugs". One can adapt to anything.

Value is a particularly bad term to use here because it focuses on the valueness, i.e., comparison by value. Nodes focus on the place, the position, in the tree, i.e., comparison by identity. Again, the draft can be written in a way that makes the reader make that distinction on their own, but that is not a good way to do tech doc.

Consider the evaluation of a JSON document that produces a JSON array of JSON values,

JSON = jsonPath(JSON, JSONPath-expression, value-option)

This is the most common case, and the entirety of the cburgmer JSONPath Comparisons is devoted to it.

I believe an implementation could match all of the consensus results, and where no consensus exists, the largest minority result, by doing nothing more than starting with a JSON value and working down through its children, without accumulating paths, and without concerning itself with identity. I note that the consensus is duplicates allowed (32 out of 41 implementations); with duplicates, there are issues with identity, but without, there are not.

Now consider the evaluation of a JSON document that produces a JSON array of Normalized Paths,

JSON = jsonPath(JSON, JSONPath-expression, path-option)

In terms of the processing model, when duplicates are allowed, not much is different, except that in addition to selecting a value at each step, a corresponding path needs to be accumulated. The paths are simply accumulated: we don't do anything with them except compute them along with the values and return them as a JSON array. At every step, the "current value" corresponds to an accumulated path; the two exist as a pair, so I think it makes sense to refer to that pair as a "node".
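A rough sketch of the two options for one fixed query, `$..*` (every descendant of the root value): each step carries a `{ value, path }` pair, and the only difference between the options is which half of the pair is returned. The function names, the pre-order result order, and the `$['name'][index]` rendering of Normalized Paths are assumptions for illustration; member-name escaping is ignored.

```javascript
// Renders a path as a Normalized Path, e.g. ["a", 0] -> $['a'][0].
// Assumption: member names contain no quotes or backslashes needing escapes.
function normalizedPath(path) {
  return "$" + path.map((k) => (typeof k === "number" ? `[${k}]` : `['${k}']`)).join("");
}

// Collects every descendant of `value` in pre-order as { value, path } pairs.
function descendants(value, path = [], out = []) {
  const entries = Array.isArray(value)
    ? value.map((v, i) => [i, v])
    : value !== null && typeof value === "object"
      ? Object.entries(value)
      : [];
  for (const [key, child] of entries) {
    const childPath = path.concat(key);
    out.push({ value: child, path: childPath }); // path accumulated alongside value
    descendants(child, childPath, out);
  }
  return out;
}

// Evaluates `$..*` once; the option only selects which half of each pair to return.
function queryAllDescendants(root, option) {
  const pairs = descendants(root);
  return option === "path"
    ? pairs.map((p) => normalizedPath(p.path))
    : pairs.map((p) => p.value);
}

const doc = { a: [1, 2], b: "x" };
console.log(JSON.stringify(queryAllDescendants(doc, "value")));
// prints [[1,2],1,2,"x"]
console.log(JSON.stringify(queryAllDescendants(doc, "path")));
// prints ["$['a']","$['a'][0]","$['a'][1]","$['b']"]
```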

In the output, a path will be represented as a "string". In the processing model, it need not be, we can talk about a "context position" if you like, but conceptually it's a path.

For either the value or path option, it's in the minority position of "no duplicates", where identity matters, that things become interesting. But identity can be determined entirely by paths. Duplicate path-value pairs are straightforward to exclude.
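Treating each selected item as a path/value pair, as described above, excluding duplicates can be sketched as keeping the first pair seen for each path. `selectIndices` (a union of array indices, roughly `$[0,0,1]`) and `dedupeByPath` are hypothetical helpers. De-duplicating by value instead would wrongly merge the two 5s in `[5, 5]`, which sit at different locations.

```javascript
// Hypothetical union of array indices, roughly `$[0,0,1]`; duplicates allowed.
function selectIndices(node, indices) {
  const out = [];
  for (const i of indices) {
    if (Array.isArray(node.value) && Number.isInteger(i) && i >= 0 && i < node.value.length) {
      out.push({ value: node.value[i], path: node.path.concat(i) });
    }
  }
  return out;
}

// Identity is decided by path alone; the first occurrence of each path is kept.
function dedupeByPath(pairs) {
  const seen = new Set();
  return pairs.filter((p) => {
    const key = JSON.stringify(p.path); // [0] and ["0"] stay distinct
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const root = { value: [5, 5], path: [] };
const selected = selectIndices(root, [0, 0, 1]);
console.log(selected.length, dedupeByPath(selected).length);
// prints 3 2
```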

Of course, other processing models that result in no duplicates are interesting to look at also.

Why don't the terms from 8259 suffice? 8259 defines a JSON text, not its processing. It is rather natural that once processing is added, new terms are added, too.

Yes, but it seems to me that the items upon which the processing model operate should be from 8259. New terms shouldn't be introduced gratuitously.

Daniel

cabo commented 3 years ago

Yes, but it seems to me that the items upon which the processing model operate should be from 8259.

So why are you talking about JSON documents, paths, children, etc.?

New terms shouldn't be introduced gratuitously.

I completely agree on this one, but I think you made a great demonstration that a few new terms do help.

timbray commented 3 years ago

On Mon, Mar 22, 2021 at 1:06 PM cabo wrote:

There is no doubt that the draft could be made to work by calling the things "mugs". One can adapt to anything.

Value is a particularly bad term to use here because it focuses on the valueness, i.e., comparison by value. Nodes focus on the place, the position, in the tree, i.e., comparison by identity. Again, the draft can be written in a way that makes the reader make that distinction on their own, but that is not a good way to do tech doc.

Well, you could call them mugs or nodes, but what they actually are is JSON Values, as defined in 8259. I'm having trouble parsing the phrase "the place, the position, in the tree, i.e. comparison by identity".

I'll go ahead and do a PR to update 3.5 without introducing a new term and let's see if we like how it reads.

danielaparker commented 3 years ago

@cabo wrote:

Yes, but it seems to me that the items upon which the processing model operate should be from 8259.

So why are you talking about JSON documents, paths, children, etc.?

Point taken :-) One does feel the need to call that first thing provided something other than a JSON value: JSON Pointer uses "JSON document", JSON Schema uses "JSON instance", and Goessner uses "root". I think paths are required. "Children" can be dispensed with.

cabo commented 3 years ago

#72 uses "argument" for the argument of the query.

danielaparker commented 3 years ago

@cabo wrote:

I completely agree on this one, but I think you made a great demonstration that a few new terms do help.

Incidentally, specific language issues aside, if you're thinking about processing models for preserving identity and removing duplicates, which is the impression I got from watching the last meeting video, I'm actually quite interested in that.

Daniel

timbray commented 3 years ago

On Mon, Mar 22, 2021 at 2:48 PM Daniel Parker wrote:

Yes, but it seems to me that the items upon which the processing model operate should be from 8259.

So why are you talking about JSON documents, paths, children, etc.?

Point taken :-) One does feel the need to call that first thing provided something other than a JSON value, JSONPointer uses "JSON document", JSON Schema uses "JSON instance", Goessner uses "root". I think paths are required. "Children" can be dispensed with.

In 8259, the top-level production is " ws value ws ". Anyone who's read that RFC is not going to have any trouble with the phrase "root value". 8259 does not define or use the term "child", so if we do need it (not obvious to me) we can define it as meaning "elements, if the value is an array, member values if it's an object".

I note that the current draft does not actually define the term "node". I think that if you did, the definition would be something like "synonym of JSON value". So…

cabo commented 3 years ago

Please read #72 at some point.

danielaparker commented 3 years ago

@timbray wrote:

I note that the current draft does not actually define the term "node".

Right

I think that if you did, the definition would be something like "synonym of JSON value". So…

Still, when talking about "@", I prefer to talk about it as a "current node" rather than a "current value". "Current value" isn't enough to fully describe the JSONPath processing model: when producing Normalized Paths, the path associated with the current value matters too, and it matters even more when we talk about no duplicates. I think @cabo is right to emphasize that aspect. Talking about @ as representing both the location in the root value and the value itself, and using "node" to mean that, seems reasonable to me.
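As a hedged sketch of that reading of `@`: a filter step can bind `@` to the current node, modeled here as a `{ value, path }` pair, so the predicate inspects the value while each result keeps its location for Normalized Path output or duplicate removal. `filterChildren` and the sample data are hypothetical.

```javascript
// Hypothetical filter step: each child of `node` becomes a candidate current
// node { value, path }, and `predicate` plays the role of the filter
// expression, receiving that current node as its `@`.
function filterChildren(node, predicate) {
  const entries = Array.isArray(node.value)
    ? node.value.map((v, i) => [i, v])
    : node.value !== null && typeof node.value === "object"
      ? Object.entries(node.value)
      : [];
  return entries
    .map(([key, v]) => ({ value: v, path: node.path.concat(key) }))
    .filter((current) => predicate(current));
}

// Roughly `$.store.book[?(@.price < 10)]`, starting from the book array's node.
const books = { value: [{ price: 8 }, { price: 12 }, { price: 5 }], path: ["store", "book"] };
const cheap = filterChildren(books, (at) => at.value.price < 10);
console.log(JSON.stringify(cheap.map((n) => n.path)));
// prints [["store","book",0],["store","book",2]]
```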

glyn commented 3 years ago

I note that the current draft does not actually define the term "node". I think that if you did, the definition would be something like "synonym of JSON value".

I think it makes more sense to talk about the location of a node rather than the location of a JSON value because a JSON value could be present at multiple nodes whereas a node has a unique location.

goessner commented 3 years ago

@glyn: thanks for bringing it to the point pragmatically. In

{a:{c:5},b:{d:5}}

we have five JSON values, of which two are 5, i.e. identical. They belong to two different name/value pairs (members) according to 8259. Now consider

[5,5]

with three JSON values, of which again two are 5, i.e. identical. They belong to two different ... hmm ... 8259 does not tell. But JSONPath needs to identify them uniquely by their location.
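The counts in these two examples can be checked mechanically. The sketch below uses hypothetical `walk` and `summarize` helpers, and quotes the keys, since `{a:{c:5},b:{d:5}}` is JavaScript shorthand rather than strict JSON. It counts every JSON value, the root included, and lists the locations at which the value 5 occurs: the two 5s compare equal by value but sit at distinct locations.

```javascript
// Visits every value in pre-order, passing its location (names / indices).
function walk(value, path, visit) {
  visit(value, path);
  if (Array.isArray(value)) {
    value.forEach((v, i) => walk(v, path.concat(i), visit));
  } else if (value !== null && typeof value === "object") {
    for (const [k, v] of Object.entries(value)) walk(v, path.concat(k), visit);
  }
}

// Counts all JSON values and records where a given primitive appears.
function summarize(doc, needle) {
  let count = 0;
  const locations = [];
  walk(doc, [], (v, path) => {
    count += 1;
    if (v === needle) locations.push(path);
  });
  return { count, locations };
}

console.log(JSON.stringify(summarize({ "a": { "c": 5 }, "b": { "d": 5 } }, 5)));
// prints {"count":5,"locations":[["a","c"],["b","d"]]}
console.log(JSON.stringify(summarize([5, 5], 5)));
// prints {"count":3,"locations":[[0],[1]]}
```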

Looking at JSON Pointer 6901 we read

Evaluation of a JSON Pointer begins with a reference to the root
value of a JSON document and completes with a reference to some value
within the document. Each reference token in the JSON Pointer is
evaluated sequentially.

So JSON Pointer also needs to identify things uniquely by their location and uses ...

... for addressing members in objects and elements in arrays by a single term, since value alone simply does not work.

For us in fact two additional – location specifying – terms would suffice

or skipping root, then only one.

cabo commented 3 years ago

JSON pointer is about references, so they might have that word in their terminology; I don't think we need that. node and item are in #72. root is interesting, because that is an argument to the query — I called that argument in #72. We can always change these terms more if we want to, but at some point we need to enable ourselves to work on the content of the draft.

danielaparker commented 3 years ago

@glyn wrote:

I note that the current draft does not actually define the term "node". I think that if you did, the definition would be something like "synonym of JSON value".

I think it makes more sense to talk about the location of a node rather than the location of a JSON value because a JSON value could be present at multiple nodes whereas a node has a unique location.

But it only makes sense to talk about the location of a "node" if the term "node" is defined, which I don't believe it is in the current draft. This is what "node" means in XPath 3.1, but what does it mean in JSONPath? It is meaningless to talk about "node" unless "node" is defined, and its properties enumerated, as they are in the XPath 3.1 reference. I think terms are being introduced into the draft that are suggestive of terms in XPath, such as "item" and "node", but surely the one thing that we can all agree on is that we do not want the JSONPath spec to look anything like XPath, with its horrendously complicated data model.

It's also unnecessary to introduce "node" to fully describe JSONPath, including selection of values or references from a root value with duplicates, selection of Normalized Paths from a root value with duplicates, and both of these operations without duplicates (when identity matters). It is enough to have the notion of path/value pairs, where "path" describes the location of the value in the root value. Nothing else is needed.
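The path/value-pair model can be sketched in a few lines of Python. This is an illustrative sketch, not the draft's definition. The function names are hypothetical, and the path syntax is a simplified bracket notation rather than the draft's exact Normalized Path escaping rules. The sketch enumerates the root value and every value nested in it, each paired with its location. It also demonstrates the point quoted from @glyn above: the same value (here `1`) can occur at several locations, while each location is unique.

```python
from typing import Any, Iterator, Tuple

def normalized_path(segments: Tuple) -> str:
    # Simplified bracket notation, e.g. $['b'][0] (hypothetical rendering)
    parts = []
    for seg in segments:
        if isinstance(seg, int):
            parts.append(f"[{seg}]")  # array element index
        else:
            escaped = seg.replace("\\", "\\\\").replace("'", "\\'")
            parts.append(f"['{escaped}']")  # object member name
    return "$" + "".join(parts)

def path_value_pairs(value: Any, segments: Tuple = ()) -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for the root value and every nested value, in document order."""
    yield normalized_path(segments), value
    if isinstance(value, dict):
        for name, member_value in value.items():
            yield from path_value_pairs(member_value, segments + (name,))
    elif isinstance(value, list):
        for index, element in enumerate(value):
            yield from path_value_pairs(element, segments + (index,))
```

For the root value `{"a": 1, "b": [1, 2]}`, the paths come out as `$`, `$['a']`, `$['b']`, `$['b'][0]` and `$['b'][1]`. The value `1` appears at both `$['a']` and `$['b'][0]`, so identity by value and identity by location differ.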

Daniel

goessner commented 3 years ago

@cabo ... this was merely a response to Tim's wish to restrict ourselves to RFC 8259 terms exclusively. @danielaparker ... so you are proposing to use root value and value location?

cabo commented 3 years ago

@goessner: yes, I explained why limiting ourselves to RFC 8259 terminology is misguided.

We can certainly invent more and more complex terminology, or we can accept that "node" is the term of art here.

danielaparker commented 3 years ago

@goessner wrote:

@danielaparker ... so you are proposing to use root value and value location?

@timbray proposed root value here, so I'll stick to that term in my commentary, unless another term is agreed to. I'm broadly in favour of Tim's wish to adhere to 8259 terms to the extent possible.

I'm happy with "value location", since it seems to be nicely symmetrical with the other terms. I believe all operations in the JSONPath processing model can be fully described with just the notions of "root value", "value" and "value location".

Daniel

goessner commented 3 years ago

Table: Terminology compared

| Aspect | JSONPath | XPath | CSS Selectors | JSON Pointer | JMESPath |
|---|---|---|---|---|---|
| Title | JSONPath: Query expressions for JSON | XML Path Language | Selectors | JavaScript Object Notation (JSON) Pointer | A query language for JSON |
| What is it? | string syntax for identifying values within a JSON document | an expression language that allows the processing of values conforming to the data model defined in XDM 3.0 | patterns that match against elements in a tree | string syntax for identifying a specific value within a JSON document | A query language for JSON |
| primary purpose | selection | selection, addressing, matching, manipulating | matching | identifying | selection, matching, manipulating |
| document root | root value | context node | document root | root value | JSON value |
| models root as ... | root value | tree of nodes | tree of elements | JSON document? | JSON value |
| primary syntactic construct | expression, value location | expression, location path | selector | JSON Pointer | expression |
| minimal selector unit | selector (?) | location step | simple selector | reference token | index-expression, identifier, sub-expression |
| minimal target | JSON value, node | node, value | element | referenced value | JSON value |
| expression evaluates to | list of nodes, array of values, array of output paths | node-set | elements | JSON value | JSON value |
| what to do on errors | return empty node list? | errors specified ... but no corresponding behavior | rule using selector is dropped | not defined by specification | Raise syntax, wrong type, and arity errors detected during evaluation |

Comparison table updated from #53. Thanks @danielaparker for adding the JMESPath column.

danielaparker commented 3 years ago

@cabo Regarding the terminology comparison table: I'm not sure where I can comment on it, as the issue has been closed and I don't see it in a pull request.

"Sequences replace node-sets from XPath 1.0. In XPath 1.0, node-sets do not contain duplicates. In generalizing node-sets to sequences in XPath 3.0 and XPath 3.1, duplicate removal is provided by functions on node sequences."
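The XPath 3.1 distinction quoted above (sequences that may contain duplicates, with duplicate removal as a separate function) can be mirrored for JSONPath results represented as path/value pairs. Below is a hypothetical Python sketch; the helper names are assumptions, not draft terminology. Results are combined by plain concatenation, which is the reading of "union" that Glyn Normington suggested. An explicit, optional step then removes duplicates by location and keeps the first occurrence.

```python
from typing import Any, List, Tuple

PathValue = Tuple[str, Any]  # (location, value) pair, as in the path/value model

def concatenate(*results: List[PathValue]) -> List[PathValue]:
    # "Union" read as concatenation: keep every result, in order, including repeats
    combined: List[PathValue] = []
    for result in results:
        combined.extend(result)
    return combined

def remove_duplicate_locations(results: List[PathValue]) -> List[PathValue]:
    # Separate duplicate-removal step: keep the first occurrence of each location
    seen = set()
    unique: List[PathValue] = []
    for path, value in results:
        if path not in seen:
            seen.add(path)
            unique.append((path, value))
    return unique
```

Concatenating `[("$['a']", 1)]` with `[("$['a']", 1), ("$['b']", 2)]` keeps `$['a']` twice, like a sequence. Applying `remove_duplicate_locations` afterwards gives set-like behaviour without changing what concatenation itself means.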

cabo commented 3 years ago

Hi Daniel,

This way of reporting things is fine, but raising an issue is even better; yes, even small editorial fixes are best handled as issues (you may want to label them "editorial").

Re XPath versions, I'm not sure how much we need to dwell on them in this document. I will try to come up with a pull request that moves some of this material into an appendix, as we said, but I'll focus on #79 for the next couple of days.