ietf-wg-jsonpath / draft-ietf-jsonpath-base

Development of a JSONPath internet draft
https://ietf-wg-jsonpath.github.io/draft-ietf-jsonpath-base/
Other
59 stars 20 forks source link

Motivate Normalized Paths #289

Closed glyn closed 1 year ago

goessner commented 1 year ago

I welcome this ... thanks.

sthagen commented 1 year ago

From the side line: Do we need to add description of behavior (with a may qualifier) to motivate the truly welcome attributes of the Normalized Path?

glyn commented 1 year ago

From the side line: Do we need to add description of behavior (with a may qualifier) to motivate the truly welcome attributes of the Normalized Path?

A Normalized Path is just a particular kind of JSONPath query. Its only behaviour is to identify a node within the tree of nodes in a value. Or did you have some other behaviour in mind?

timbray commented 1 year ago

A Normalized Path is just a particular kind of JSONPath query

Oh, of course, but that formulation had never occurred to me, and it's useful. If it's not in the spec, it should be.

sthagen commented 1 year ago

From the side line: Do we need to add description of behavior (with a may qualifier) to motivate the truly welcome attributes of the Normalized Path?

A Normalized Path is just a particular kind of JSONPath query. Its only behaviour is to identify a node within the tree of nodes in a value. Or did you have some other behaviour in mind?

I meant the first of the two Consequently, ... sentences:

Consequently, a JSONPath implementation may output Normalized Paths instead of, or in addition to, the values identified by these paths.

Here we describe the behavior of an implementation (output as response to an unspecified request that triggers the former).

I just wonder if this section is a good place to describe a part of a protocol, when the topics are a) the "projection" itself and b) potential benefits (all from a data perspective).

cabo commented 1 year ago

Here we describe the behavior of an implementation (output as response to an unspecified request that triggers the former).

Well, we describe the nature of what it is to implement JSONPath. Generally, in a protocol, you not only need to prescribe what is on the wire, but also what it means. That is only ever in the implementation. That doesn't mean that it describes the behavior of an implementation.

E.g., in TCP, the byte stream that is transported by the protocol only ever happens in the implementation. Nothing on the wire reflects this byte stream. But describing that byte stream as the underlying model of TCP is rightly not criticized as "describing the behavior of an implementation".

glyn commented 1 year ago

From the side line: Do we need to add description of behavior (with a may qualifier) to motivate the truly welcome attributes of the Normalized Path?

The penny has just dropped: you are questioning the need for such a description rather than asking for more of the same! I agree with @cabo's response.

gregsdennis commented 1 year ago

I think an important aspect that this is missing is that normalized paths are contexted to the JSON data that was queried; whereas singular paths exist independently, as does any other path.

For example, the path $[-3] (used in the text) is singular and can rightfully exist on its own. However it can't be normalized without querying JSON data. Once queried, the resulting normalized paths can be different based on the data.

gregsdennis commented 1 year ago

Consequently, a JSONPath implementation may output Normalized Paths instead of, or in addition to, the values identified by these paths.

Here we describe the behavior of an implementation (output as response to an unspecified request that triggers the former).

This text isn't getting into implementation behavior at all. It's specifying output and allowing for options.

sthagen commented 1 year ago

This text isn't getting into implementation behavior at all. It's specifying output and allowing for options.

Well, unforced ambiguous behavior then. Specifying output and allowing for (unrequested) options.

For instance when the SomeTool processor outputs a number 42 and the options are that it represents any base in which the symbols are generally valid (per consensus) to me would rather raise questions than answer them. I would expect these complication (extra dimensions) less inside a format but more in a "processor" role or protocol description. The latter describing expected behavior within the dynamics of some interchange.

I used the term implementation without caution. Sorry for that. In my reading of the sentence we discuss about, the consumer of such a result has to look elsewhere to find out what the processor provides as result, right?

That is our normal task as tool users.

However, I think we better try to not bring in such ambiguities without noting the reason outside of our mandate to change. Maybe note, that existing processors (or however we like to call these providers of jsonpath responses) display mixed behaviors (what you call options).

Makes sense?

If not, sorry for the noise.

cabo commented 1 year ago

I think an important aspect that this is missing is that normalized paths are contexted to the JSON data that was queried; whereas singular paths exist independently, as does any other path.

Well, we do say:

Note that a Normalized Path represents the identity of a node _in a specific value_.

I wouldn't go so far to say that the Normalized Path is "contexted". You need the value to compute the Normalized Path from the query, but after that, it doesn't require context to apply it (well, they are meant to be applied to that value, but they are regular JSONPath queries).

With IETF 115 raging and Glyn being offline for a couple of days, we'll probably not respond very fast the next few days; I think good points are being made here even it is not always clear how to modify the text further to fully reflect them.

glyn commented 1 year ago

I think an important aspect that this is missing is that normalized paths are contexted to the JSON data that was queried; whereas singular paths exist independently, as does any other path.

For example, the path $[-3] (used in the text) is singular and can rightfully exist on its own. However it can't be normalized without querying JSON data. Once queried, the resulting normalized paths can be different based on the data.

* `[ 0, 1, 2, 3, 4 ]` yields a normalized path of `$[2]` for the value `2`

* `[ 0, 1, 2, 3, 4, 5, 6 ]` yields a normalized path of `$[4]` for the value `4`.

I believe the PR now makes this property of Normalized Paths clear. (I'd prefer not to labour the example.)

glyn commented 1 year ago

This text isn't getting into implementation behavior at all. It's specifying output and allowing for options.

Well, unforced ambiguous behavior then. Specifying output and allowing for (unrequested) options.

For instance when the SomeTool processor outputs a number 42 and the options are that it represents any base in which the symbols are generally valid (per consensus) to me would rather raise questions than answer them. I would expect these complication (extra dimensions) less inside a format but more in a "processor" role or protocol description. The latter describing expected behavior within the dynamics of some interchange.

I used the term implementation without caution. Sorry for that. In my reading of the sentence we discuss about, the consumer of such a result has to look elsewhere to find out what the processor provides as result, right?

That is our normal task as tool users.

However, I think we better try to not bring in such ambiguities without noting the reason outside of our mandate to change. Maybe note, that existing processors (or however we like to call these providers of jsonpath responses) display mixed behaviors (what you call options).

Makes sense?

If not, sorry for the noise.

@sthagen I have attempted to address your concern by making the following general statement (suggested by @cabo):

A canonical representation of a nodelist is as a JSON arrays of strings, where the strings are Normalized Paths.

A nodelist is defined as follows (in the Terminology section):

A list of nodes. The output of applying a query to an argument is manifested as a list of nodes. While this list can be represented in JSON, e.g. as an array, this specification does not require or assume any particular representation.

From this, it is clear that an implementation of JSONPath may output a nodelist as a JSON array of strings, where the strings are Normalized Paths. So I have deleted the following statements, which seemed to be causing confusion:

A JSONPath implementation may output Normalized Paths instead of, or in addition to, the values identified by these paths. For example, the JSONPath expression $.book[?(@.price<10)] could select two values and an implementation could output the Normalized Paths $['book'][3] and $['book'][5].

(I'm not sure whether this will be acceptable to the other reviewers, so I'm going to leave the PR open until they have had a chance to review the latest version.)

danielaparker commented 1 year ago

I think an important aspect that this is missing is that normalized paths are contexted to the JSON data that was queried; whereas singular paths exist independently, as does any other path. For example, the path $[-3] (used in the text) is singular and can rightfully exist on its own. However it can't be normalized without querying JSON data.

Normalizing paths? What does that mean? There is no concept of "normalizing" arbitrary paths, it's pointless, there is no suggestion of that in Goessner's original article, or anywhere else that I'm aware of.

Once queried, the resulting normalized paths can be different based on the data.

* `[ 0, 1, 2, 3, 4 ]` yields a normalized path of `$[2]` for the value `2`

* `[ 0, 1, 2, 3, 4, 5, 6 ]` yields a normalized path of `$[4]` for the value `4`.

Normalized paths are generally produced from a specific JSON value, but they can be applied to other JSON values, and in fact this is not an uncommon use case. Much like a JSONPointer, except a JSONPointer cannot be fully interpreted without reference to the data, an integer appearing in a JSONPointer could be an array index or it could be an object key, whereas a normalized JSONPath, once produced, stands on its own. Think about a JSON array with a gigabyte worth of more or less heterogenous elements. It's not uncommon to produce normalized paths for the first element, and then apply selected ones to all elements.

glyn commented 1 year ago

@cabo Not sure if you've seen the final form of this PR, but I'm going to merge it to close off the discussion threads. We can adjust the section later if necessary.