Motivate or remove Normalized Paths

timbray commented 1 year ago

Why do they exist? The spec says nothing on the subject. If we don't have something convincing to say, why not remove them? I know we discussed this before, but as I was reading this with fresh eyes I was thinking "There's a lot of stuff here but why bother?"

gregsdennis commented 1 year ago

The normalized path was to be used in the output. I don't think there was any other use for them.

We wanted to ensure that the output was consistent (primarily for testing/comparison purposes), so we devised the "normalized path" to ensure that paths were output in a known format.

After the refactor, it basically just means that the path doesn't use the shorthand notations; all of the segments are bracketed.

glyn commented 1 year ago

> Why do they exist? The spec says nothing on the subject. If we don't have something convincing to say, why not remove them? I know we discussed this before, but as I was reading this with fresh eyes I was thinking "There's a lot of stuff here but why bother?"

The following sentence in the Normalized Paths section provides some motivation:

A JSONPath implementation may output Normalized Paths instead of, or in addition to, the values identified by these paths.

but maybe we need to expand it along the lines of @gregsdennis's comment above.

glyn commented 1 year ago

> The normalized path was to be used in the output. I don't think there was any other use for them.
>
> We wanted to ensure that the output was consistent (primarily for testing/comparison purposes), so we devised the "normalized path" to ensure that paths were output in a known format.

Yes, that's my recollection.

> After the refactor, it basically just means that the path doesn't use the shorthand notations; all of the segments are bracketed.

Yes, Normalized Paths use only bracket notation, but bracket notation can also produce paths which are not Normalized, such as $['book'][?(@.price<10)] and $[-3].
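To illustrate the distinction drawn above, here is a minimal Python sketch of a check for bracket-only paths made of quoted names and non-negative indices. The helper name `is_normalized_path` and the regular expressions are assumptions made for illustration; in particular, the rule for which characters must be escaped inside a name is deliberately loose and does not reproduce the spec's grammar.

```python
import re

# Simplified pattern for a Normalized Path: "$" followed by zero or more
# segments, each either ['name'] or [index].
# Assumption: names are single-quoted with backslash escapes; this sketch
# does not check which characters are required to be escaped.
_NAME_SEGMENT = r"\['(?:[^'\\]|\\.)*'\]"
_INDEX_SEGMENT = r"\[(?:0|[1-9][0-9]*)\]"  # non-negative, no leading zeros
_NORMALIZED_PATH = re.compile(rf"\$(?:{_NAME_SEGMENT}|{_INDEX_SEGMENT})*")


def is_normalized_path(path: str) -> bool:
    """Return True if `path` looks like a Normalized Path (hypothetical helper)."""
    return _NORMALIZED_PATH.fullmatch(path) is not None


for candidate in ["$['a']['b'][1]", "$['book'][?(@.price<10)]", "$[-3]", "$.a.b[1]"]:
    print(candidate, is_normalized_path(candidate))
```

Only the first candidate passes: the filter expression and the negative index use bracket notation but are rejected, and the last candidate uses the dot shorthand.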

timbray commented 1 year ago

I repeat my basic question: What are these things for and what benefit do they offer, and to whom? I'm not sure what "used in the output" even means. It seems to me that more motivation is needed. I'd also be OK with just removing the whole section.

glyn commented 1 year ago

> I repeat my basic question: What are these things for and what benefit do they offer, and to whom? I'm not sure what "used in the output" even means. It seems to me that more motivation is needed. I'd also be OK with just removing the whole section.

Here goes...

A Normalized Path plays the same role as a JSON Pointer in situations where JSONPath syntax is required instead of JSON Pointer. A Normalized Path identifies a node in a given JSON value.

Given a JSONPath expression which identifies a particular node in a given JSON value, there is a unique Normalized Path which identifies the same node in the given JSON value.

Software (applications, libraries, JSONPath implementations, etc.) which outputs Singular Paths identifying nodes in a JSON value can output the equivalent Normalized Paths instead. A benefit of doing this is that the behaviour of the software is then more predictable and more easily tested (because the tests need to accommodate fewer possibilities). If there are multiple implementations of the software, sharing a single test suite is made more tractable.

Who does this benefit? Human or programmatic users of the software observe more predictable behaviour. People writing tests have less work to do. Sharing a test suite avoids people duplicating effort.
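The testing benefit described above can be sketched in Python. The function `to_normalized_path` and the idea of passing a location as a list of member names and array indices are hypothetical, introduced only for this example; the escaping is simplified to backslash and single quote, whereas a complete implementation would also need to handle control characters.

```python
from typing import Sequence, Union

Step = Union[str, int]  # an object member name or an array index


def escape_name(name: str) -> str:
    # Assumption: only backslash and single quote are escaped in this sketch.
    return name.replace("\\", "\\\\").replace("'", "\\'")


def to_normalized_path(steps: Sequence[Step]) -> str:
    """Render a location (a hypothetical list of names/indices) in bracket-only form."""
    parts = ["$"]
    for step in steps:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(step, bool) or not isinstance(step, (str, int)):
            raise TypeError(f"unsupported step: {step!r}")
        if isinstance(step, int):
            if step < 0:
                raise ValueError("indices in a Normalized Path are non-negative")
            parts.append(f"[{step}]")
        else:
            parts.append(f"['{escape_name(step)}']")
    return "".join(parts)


# A test can compare against one exact string rather than accepting every
# equivalent spelling such as $.a.b[1] or $["a"]["b"][1].
assert to_normalized_path(["a", "b", 1]) == "$['a']['b'][1]"
```

Because every location has exactly one rendering, two implementations that both emit this form can share expected-output files without any path-equivalence logic in the test harness.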

danielaparker commented 1 year ago

Normalized paths are introduced in Stefan Goessner's original JSONPath article, and are implemented in the two most important implementations, Goessner JavaScript JSONPath, and Jayway Java JSONPath. I assume Stefan Goessner can speak to the original motivation, and if you want to know how they are used in practice, I suggest surveying users, perhaps starting on the Jayway site. Among other uses, they are helpful for diagnostics, for discovering what a perhaps non-obvious query resolves to in terms of the individual nodes selected. They have other uses for selecting specific nodes given a much larger query.

timbray commented 1 year ago

I think the spec needs a statement with a MUST/SHOULD/MAY about supporting these things and a clear suggestion as to what useful purpose they're designed to serve.

Otherwise I seriously suggest removing them because to be honest I can't think of a single scenario where I'd use them.

gregsdennis commented 1 year ago

> I can't think of a single scenario where I'd use them

I think they're intended not to be written but consumed.

I agree, in that I doubt anyone would write out $['a']['b'][1] except maybe in the context of a Stack Overflow question or something. They'd just use $.a.b[1].

The point of this, though, is to be able to declare a preferred syntax that implementations should use to identify a known location in a document. It's just saying that we prefer $['a']['b'][1] over $.a.b[1]. (One advantage of using the bracketed syntax for normalization is that any location in a document can be written in this syntax, whereas there are restrictions on the dot syntax.)
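As a rough illustration of the preference described above, the following Python sketch rewrites a simple path that mixes dot shorthand and bracket notation into the bracket-only form. The function `normalize` is hypothetical, and the token grammar is a deliberate simplification: dot names are limited to ASCII letters, digits and underscores, and bracketed names may not contain quotes or backslashes.

```python
import re

# Tokens of a simplified singular path: .name, ['name'], or [index].
# Assumption: this is a much narrower grammar than the spec's.
_TOKEN = re.compile(
    r"\.(?P<dot>[A-Za-z_][A-Za-z0-9_]*)"
    r"|\['(?P<name>[^'\\]*)'\]"
    r"|\[(?P<index>0|[1-9][0-9]*)\]"
)


def normalize(path: str) -> str:
    """Rewrite a simple singular path using only bracketed segments."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    out, pos = ["$"], 1
    while pos < len(path):
        m = _TOKEN.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at offset {pos}")
        if m.group("index") is not None:
            out.append(f"[{m.group('index')}]")
        else:
            name = m.group("dot") if m.group("dot") is not None else m.group("name")
            out.append(f"['{name}']")
        pos = m.end()
    return "".join(out)


print(normalize("$.a.b[1]"))      # same location, dot shorthand
print(normalize("$['a'].b[1]"))   # same location, mixed notation
print(normalize("$['a b'][0]"))   # a name the dot shorthand cannot express
```

The first two calls produce the same string, which is the point of declaring one preferred spelling. The third shows a member name containing a space, which has no dot-shorthand form at all.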

Again, I think this is most apparent in output when the implementation is returning paths to locations instead of / along with the values.

glyn commented 1 year ago

Another consideration is the spec's definition of nodelist as the output of applying a query:

Nodelist: A list of nodes. The output of applying a query to an argument is manifested as a list of nodes. While this list can be represented in JSON, e.g. as an array, the nodelist is an abstract concept unrelated to JSON values.

The following statement in the Normalized Paths section:

A JSONPath implementation may output Normalized Paths instead of, or in addition to, the values identified by these paths.

steps into implementation detail. In the rest of the spec we studiously avoid prescribing implementation details.

One way to reconcile these would be to put some non-normative text near the start of the spec which talks about implementation options for nodelist. This could then include language such as "If a nodelist is represented as a list of JSONPath expressions, these SHOULD be Normalized Paths." (But would it make sense to use "SHOULD" in non-normative text?)

gregsdennis commented 1 year ago

> steps into implementation detail

I don't think that's implementation detail. It's defining what the output may be.

We can talk about this in another issue if it concerns you, though.

glyn commented 1 year ago

I would say it's a statement about the representation of the abstract nodelist. A list of Normalized Paths is a string representation. Other possible representations are lists of JSON values in string form, a programming language list of nodes, etc. The choice of representation is an implementation decision. Feel free to raise a separate issue if you'd like to change the current text.
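The idea that one abstract nodelist can have several concrete representations can be sketched in Python. The generator `nodes` is a hypothetical helper that walks a parsed JSON value and pairs each node with a bracket-only path; it stands in for the result of a query selecting the root and all its descendants, and it escapes only backslash and single quote in names.

```python
import json
from typing import Any, Iterator, Tuple


def nodes(value: Any, path: str = "$") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for `value` and all of its descendants."""
    yield path, value
    if isinstance(value, dict):
        for name, child in value.items():
            # Assumption: simplified escaping, not the spec's full rules.
            escaped = name.replace("\\", "\\\\").replace("'", "\\'")
            yield from nodes(child, f"{path}['{escaped}']")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from nodes(child, f"{path}[{index}]")


document = json.loads('{"a": {"b": [10, 20]}}')
nodelist = list(nodes(document))             # paths together with values
paths_only = [p for p, _ in nodelist]        # a string representation
values_only = [v for _, v in nodelist]       # a list of JSON values

print(paths_only)
```

All three lists describe the same nodes in the same order; which one an implementation returns is the representation choice under discussion.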

goessner commented 1 year ago

Normalized Paths ...

are a concrete string presentation of nodelists (JSON text).
are already used in the spec within every example table (Result Paths).
are meant as an additional / alternative output of a JSONPath query.
enhance predictability and testing.
might (will) be used as input for subsequent analysis / actions (removal of duplicates?). Think of piping in Unix.

Especially given the last point, standardizing them is essential.
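The piping scenario in the last point can be sketched in Python: paths produced by one step are de-duplicated by plain string comparison and then fed to a second step that looks each node up again. Both `dedupe` and `resolve` are hypothetical helpers, and the segment pattern assumes names without quotes or backslashes.

```python
import re
from typing import Any, Iterable, List

# Assumption: simplified segments, either ['name'] or [index].
_SEGMENT = re.compile(r"\['(?P<name>[^'\\]*)'\]|\[(?P<index>0|[1-9][0-9]*)\]")


def dedupe(paths: Iterable[str]) -> List[str]:
    """Drop repeated paths, keeping first-seen order.

    String equality is enough here only because each node has exactly one
    Normalized Path; with free-form paths, $.a and $['a'] would both survive.
    """
    return list(dict.fromkeys(paths))


def resolve(path: str, document: Any) -> Any:
    """Follow a bracket-only path through a parsed JSON value."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    node, pos = document, 1
    while pos < len(path):
        m = _SEGMENT.match(path, pos)
        if m is None:
            raise ValueError(f"not a simple bracketed segment at offset {pos}")
        if m.group("index") is not None:
            node = node[int(m.group("index"))]
        else:
            node = node[m.group("name")]
        pos = m.end()
    return node


document = {"a": {"b": [10, 20]}}
first_stage = ["$['a']['b'][1]", "$['a']['b'][0]", "$['a']['b'][1]"]
unique = dedupe(first_stage)
print([(p, resolve(p, document)) for p in unique])
```

The second stage needs no knowledge of the query that produced the paths, which is what makes a standardized form useful as an interchange format between tools.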

timbray commented 1 year ago

Could we include a bit of Stefan's argument somewhere in the introduction to the section?

glyn commented 1 year ago

Done - see https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/pull/289.

ietf-wg-jsonpath / draft-ietf-jsonpath-base

Motivate or remove Normalized Paths #276