ietf-wg-jsonpath / draft-ietf-jsonpath-base

Development of a JSONPath internet draft
https://ietf-wg-jsonpath.github.io/draft-ietf-jsonpath-base/

IETF Fwd: Some Comments ... #54

Closed: goessner closed this issue 3 years ago

goessner commented 3 years ago

Hello List,

It was important to go through the threads on this list carefully. In fact, I should have done that first. Now I understand the current draft much better and appreciate the work already done.

I have collected some quotations that I consider important, together with my comments, already formatted in Markdown.

Title of the specification

JSONPath: A query language for JSON data. (Carsten Bormann)

I think I’d slightly prefer the term “syntax” to “language” because “query language” has a smell of various things that end with the letters “Q” and “L”. But not passionate about that. (Tim Bray)

JSONPath: A query syntax for JSON. Another wild-card idea: JSONPath: Query expressions for JSON (Tim Bray)

The beauty of this is that the plural form “query expressions” implies a set of expressions, so it implies “language”. It’s indeed more than the grammar/syntax of those, so why not talk about the expressions as a whole. This also makes it possible to just use “for JSON”, without going into detail what these query expressions operate on. (Carsten Bormann)

There seems to be agreement on "JSONPath: Query expressions for JSON". I like that, too.

Terminology

My own view is that the terminology should stay consistent with RFC 8259, and that the word "object" should not be used for items that are not JSON objects in the sense of RFC 8259. (Daniel P)

To Carsten's point about what we call things, the number of distinguished terms per RFC8259 is pretty small: JSON text, value, object, array, number, string. Having spent quite a bit of time specifying JSON DSLs, I find that using just those terms doesn't seem to get in the way or cause problems, so I'd argue that we should stick to them (and build up to higher-level constructs as required for JSONPath).

… oh, and I forgot the very useful "member". (Tim Bray)

… and “element” (the things in arrays). (Carsten Bormann)

The problem with JSON value is that it also can be quite confusing due to the usual use of that term. Pointing to a tree and saying “the values inside that tree” is not going to be felt as equivalent to “the set of all subtrees of that tree, including the tree itself”. But if JSON value is the only term we have, it has to be. Hence my preference to talk about data items when I mean the items themselves and not their “value”. (Carsten Bormann)

I think the key difficulty is whether each (key, value) pair in an object is "a thing" that can be identified and manipulated and potentially returned. (If we're talking analogies, then it's analogous to an attribute node in the XDM model). (Michael Kay)

ECMA-404 uses "name/value pair", which is what I understand the term "member" to mean (Douglas Crockford uses "member"). (Daniel P)

I think the term “union” is poor. If we think of it as concatenation of results, then the result is as expected. (Glyn Normington)

I understand that within RFC 8259 we have JSON values of different types. How exactly they are structured is not of much interest here.

But when querying that structure with JSONPath, it is vitally important to identify the hierarchical structure as a tree. So in fact we are building up a higher-level construct here. We also need names for "the things" in the tree. I was able to identify

but could not see any agreement here.

I agree with Glyn that the term "union" is poor (see below).

Differentiation from JSON Pointer (JSONPath draft charter)

I anticipate being asked "Why is JSON Pointer not sufficient?" Indeed its abstract says:

JSON Pointer defines a string syntax for identifying a specific value within a JavaScript Object Notation (JSON) document.

... which sounds awfully similar. If we could include a sentence about that, or a link to an answer, that might be helpful. (Murray S. Kucherawy)

No - it's not similar in concept, they're separate things. If you really wanted to mention JSON Pointer, you could say something like "Note that while JSON Pointer (RFC xxxx) is already standardised, it is designed to provide a reference to a single, specific part of a JSON document, whereas JSONPath provides the ability to query a document and potentially return multiple values." (Mark Nottingham)

The short answer is that JSON pointer is good if you already know the structure of the JSON data item you want to point into, and you want to point to exactly one position in there. If you need to do something that is closer to a “search” (which might also result in multiple positions), JSONPath gives you more rope. (Carsten Bormann)

+1
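To make the distinction concrete, here is a minimal Python sketch of resolving an RFC 6901 JSON Pointer, which always identifies exactly one position, next to a list comprehension standing in for a JSONPath-style search such as $.store.book[*].title. The helper name `resolve_pointer` and the sample document are hypothetical; this is an illustration, not a complete implementation of either specification.

```python
import re

# RFC 6901 array indices: "0" or a decimal number without leading zeros.
ARRAY_INDEX = re.compile(r"0|[1-9][0-9]*")

def resolve_pointer(document, pointer):
    """Resolve a JSON Pointer to exactly one value; raise if it does not exist."""
    if pointer == "":
        return document  # the empty pointer identifies the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape in the order RFC 6901 requires: "~1" -> "/", then "~0" -> "~".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            if not ARRAY_INDEX.fullmatch(token):
                raise ValueError(f"invalid array index: {token!r}")
            current = current[int(token)]  # IndexError if out of range
        elif isinstance(current, dict):
            current = current[token]  # KeyError if the member is missing
        else:
            raise KeyError(token)
    return current

doc = {"store": {"book": [{"title": "A"}, {"title": "B"}]}}

# JSON Pointer: one known position.
print(resolve_pointer(doc, "/store/book/1/title"))  # B

# Stand-in for $.store.book[*].title: a search that may return many values.
print([book["title"] for book in doc["store"]["book"]])  # ['A', 'B']
```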

References to XPath

I wonder if the analogies between XPath and JSONPath are going to be helpful, or whether they're actually dangerous by implying equivalences between constructs that are in fact somewhat different? (Michael Kay)

I tend to agree. Although JSONPath was inspired by XPath, I wouldn't want to confuse the JSONPath spec by going into detailed comparisons at the risk of contradicting the normative text. (Glyn Normington)

Someone on StackOverflow today asked a question about JSONPath; they called it (and tagged it) XPath, we really don't want that kind of confusion.

In addition, the reference to the XPath specification in 6.2 is out of date, and the comparison with XPath in Table 2 is very approximate and the terminology inaccurate: for example there is a mention of "node sets", which exist in XPath 1.0 but not in XPath 2.0, yet the citation is to XPath 2.0. For someone who knows the semantics of XPath the comparison raises all sorts of questions about sorting of results into document order, elimination of duplicates etc, which are complications this spec can well do without. (Though some answers are needed, for example if ..store..price matches the same price in more than one way, do you get more than one result? And if not, what does "the same price" actually mean?) (Michael Kay)

The comparison seemed important in 2007, when I was arguing for something like XPath for JSON. Since the terminology has changed significantly with XPath 2.0 and 3.0, we had better leave comparison Table 2 out. I don't feel strongly about this.

Array Slice Operator

Thanks! The ABNF for an array slice in that reference

integer = [%x2D] (%x30 / (%x31-39 *%x30-39))

array-slice = [ integer ] ws %x3A ws [ integer ]
                    [ ws %x3A ws [ integer ] ]
                             ; start:end or start:end:step

is consistent with JMESPath, Python, and my understanding of ECMAScript 4. (Daniel P)
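The slice ABNF quoted above is regular, so it can be transcribed directly into a regular expression. The sketch below does that in Python. The rule `ws` is not defined in the excerpt, so treating it as optional blank space (space, tab, LF, CR) is an assumption, and `parse_slice` is a hypothetical helper that only parses the text between the brackets.

```python
import re

# integer = [%x2D] (%x30 / (%x31-39 *%x30-39))
INTEGER = r"-?(?:0|[1-9][0-9]*)"
# Assumption: ws is optional blank space; the excerpt does not define it.
WS = r"[ \t\n\r]*"
# array-slice = [ integer ] ws %x3A ws [ integer ] [ ws %x3A ws [ integer ] ]
SLICE_RE = re.compile(
    rf"(?P<start>{INTEGER})?{WS}:{WS}(?P<end>{INTEGER})?"
    rf"(?:{WS}:{WS}(?P<step>{INTEGER})?)?"
)

def parse_slice(text):
    """Return (start, end, step), using None for any omitted part."""
    match = SLICE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not an array slice: {text!r}")

    def to_int(name):
        value = match.group(name)
        return None if value is None else int(value)

    return to_int("start"), to_int("end"), to_int("step")

print(parse_slice("1:5:2"))  # (1, 5, 2)
print(parse_slice("::0"))    # (None, None, 0)
```

Inputs such as "01:2" (leading zero) or "1" (no colon) are rejected with a ValueError, as the ABNF requires.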

Did anyone else have an opinion on the behaviour of slices such as [::0]? The current draft allows this and says it returns an empty array, but there is good reason to say it should error so that the slice operation is then consistent with Python slicing. See below for more context. (Glyn Normington)

It was good to read this thread; I now understand the current draft much better. I like the decision to be consistent with Python, and also to get an empty selection when step=0.
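The behaviour described here, Python slice semantics except that step=0 yields an empty selection instead of an error, can be sketched in a few lines of Python. `select_slice` is a hypothetical helper; it relies on the built-in `slice.indices`, which normalises negative and out-of-range bounds the same way Python list slicing does.

```python
def select_slice(array, start=None, end=None, step=None):
    """Return the elements of a list selected by start:end:step."""
    if step == 0:
        # Deviation from Python, where [::0] raises ValueError: select nothing.
        return []
    # slice.indices() clamps and normalises the bounds exactly as Python does.
    return [array[i] for i in range(*slice(start, end, step).indices(len(array)))]

print(select_slice([10, 20, 30, 40], 1, None, 2))    # [20, 40]
print(select_slice([10, 20, 30], None, None, -1))    # [30, 20, 10]
print(select_slice([10, 20, 30], None, None, 0))     # []
```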

FYI: there is a recent proposal for adding slice notation syntax to JavaScript, currently at stage 1 of the TC39 process.

https://github.com/tc39/proposal-slice-notation

Interestingly it won't have a step argument ...

https://github.com/tc39/proposal-slice-notation#why-doesnt-this-include-a-step-argument-like-python-does

... because of syntax collision with the new this-binding syntax proposal ::

https://github.com/tc39/proposal-bind-operator

However, we should not let this influence us.

Unions

I don't think any implementation would remove duplicates from a path such as "$.store.book". I believe this is only somewhat controversial in the context of unions [,]. The name "union" suggests that distinct values be returned, compare with SQL unions. But Stefan Goessner's implementation doesn't do that, it concatenates all results that meet each criterion. There are a few JSONPath implementations that produce real unions with no duplicates instead of concatenated results, but I don't think that's the consensus. (Daniel P)

I think the term “union” is poor. If we think of it as concatenation of results, then the result is as expected. (Glyn Normington)

I agree with that comment, but it's partly because I'm used to SQL UNION, which is different. I prefer the JMESPath term for an analogous construct, MultiSelect List, https://jmespath.org/specification.html#multiselect-list. (Daniel P)

The union operator [,] was simply meant as an analogue of XPath's '|' operator. I cannot tell whether that was a simple combination of node sets in XPath 1.0 or a true union without duplicates. I obviously was not aware of that subtle (essential?) characteristic of unions.

So I fully agree with Glyn Normington's '... the term “union” is poor' statement. Are there better alternative terms, perhaps 'multi-index operator', 'index list', 'subscript list', etc.?
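The difference between concatenating results and an SQL-style union can be shown with a member-name subscript list such as $['a','b','c']. In this Python sketch, `union_concat` and `union_distinct` are hypothetical names, and deciding that two values are "the same" by comparing their key-sorted JSON text is an assumption made only for illustration.

```python
import json

def union_concat(obj, names):
    """Concatenate the values selected by each subscript, in subscript order."""
    return [obj[name] for name in names if name in obj]

def union_distinct(obj, names):
    """SQL-UNION-like variant: keep only the first occurrence of each distinct value."""
    seen = set()
    result = []
    for value in union_concat(obj, names):
        # Assumption: values are "the same" if their key-sorted JSON texts are equal.
        key = json.dumps(value, sort_keys=True)
        if key not in seen:
            seen.add(key)
            result.append(value)
    return result

prices = {"a": 8.95, "b": 8.95, "c": 12.99}
print(union_concat(prices, ["a", "b", "c"]))    # [8.95, 8.95, 12.99]
print(union_distinct(prices, ["a", "b", "c"]))  # [8.95, 12.99]
```

Under the concatenation reading, nothing is lost: the two equal prices come from different members, so both appear.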

Duplicates and Ordering

It was my impression that we were talking about duplicated nodes not duplicated values:

Given the array [10,20,30]

$..[0,1,0]

would yield only two results: [10, 20]

(Not that I'm advocating for removing duplicates, personally I think we shouldn't) (Marko Mikulicic)

You’re framing this as “removing duplicates”. Another view is that [10, 20, 10] would be “adding duplicates” (copies of the same node). Related are ordering issues:

$..[1,0] ➔ [20, 10] or [10, 20]

I would expect the spec to leave implementations some leeway here, but that should be based on an examination of existing implementations. (Carsten Bormann)

The mental model that leads to omitting duplicate nodes in the output is "selection": if you take an input array and select nodes with index 0,1 or 0, you get only 2 results (since selecting an index twice has no effect).

OTOH, if you opt for a "collect" model, whenever you encounter a node that matches that query you add it to the result stream, thus the same nodes can be present multiple times in the result.

I have a slight preference for the "collect" model, because the general case in jsonpath is to collect things that appear at various points in the json tree. For example:

Given {"a": {"b": 1, "c": 2}, "d": 3}, $.a.b yields [1], not {"a":{"b":1}}

(i.e. jsonpath is not a filter and view operation but a pick and gather operation) (Marko Mikulicic)
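The two mental models can be contrasted on a single array and a subscript list like [0,1,0]. In this Python sketch, `collect` and `select` are hypothetical names; returning selected nodes in array (document) order under the selection model is an assumption, since the ordering question above is still open.

```python
def _in_range(index, length):
    """Normalise a possibly negative index; return None if it is out of range."""
    if index < 0:
        index += length
    return index if 0 <= index < length else None

def collect(array, indices):
    """'Collect' model: append every match in subscript order, keeping repeats."""
    result = []
    for i in indices:
        j = _in_range(i, len(array))
        if j is not None:
            result.append(array[j])
    return result

def select(array, indices):
    """'Selection' model: each node is chosen at most once, returned in array order."""
    chosen = {_in_range(i, len(array)) for i in indices} - {None}
    return [array[j] for j in sorted(chosen)]

print(collect([10, 20, 30], [0, 1, 0]))  # [10, 20, 10]
print(select([10, 20, 30], [0, 1, 0]))   # [10, 20]
print(collect([10, 20, 30], [1, 0]))     # [20, 10]
print(select([10, 20, 30], [1, 0]))      # [10, 20]
```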

In implementations that support paths (the majority don't), the query function takes a parameter that selects values or paths. In both cases the query returns a JSON array of JSON values; in the latter case, these are normalized paths. (Daniel P)
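A minimal Python sketch of a query that can return either values or normalized paths, applying an index list to every array in the tree in the spirit of $..[0,1,0]. The function names and the `want_paths` flag are hypothetical, not any implementation's actual API; the path format follows the $['name'][index] style, and member names are assumed not to need escaping.

```python
def walk(value, path="$"):
    """Yield (normalized path, value) for value and all descendants, in document order."""
    yield path, value
    if isinstance(value, dict):
        for name, child in value.items():
            # Simplification: names are assumed to contain no quotes or backslashes.
            yield from walk(child, f"{path}['{name}']")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from walk(child, f"{path}[{index}]")

def query_descendant_indices(root, indices, want_paths=False):
    """Apply a list of non-negative indices to every array; collect values or paths."""
    results = []
    for path, value in walk(root):
        if isinstance(value, list):
            for i in indices:
                if 0 <= i < len(value):  # negative indices are ignored in this sketch
                    results.append(f"{path}[{i}]" if want_paths else value[i])
    return results

print(query_descendant_indices([10, 20, 30], [0, 1, 0]))                   # [10, 20, 10]
print(query_descendant_indices([10, 20, 30], [0, 1, 0], want_paths=True))  # ['$[0]', '$[1]', '$[0]']
```

Only the path form reveals that the first and third results are the same node.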

I must confess to never having thought about duplicates, let alone wanting to eliminate them. So I like Marko's comparison of the 'selection' model vs. the 'collect' model a lot, and I would opt for the latter. In this sense, the result of a 'JSONPath query expression' should be termed a 'collection'.

Regarding ordering, I see something like a 'natural ordering', according to which the example above would yield

$..[0,1] ➔ [10, 20]
$..[1,0] ➔ [20, 10]

I do understand the use cases for reordering, duplicate removal, filtering, etc. But these can always be treated as a post-processing step on the resulting collection, by handing it over to accompanying tools (think of a pipe operator).

Of course, this cannot work on a result collection of values alone (see duplicate nodes vs. duplicate values above); it requires a collection of (normalized) paths. In this sense, I like this view:

In my opinion the right balance between powerfulness and enabling simple implementations has been so far one of the key factors that made JSONPath popular over other alternatives, even if it lacks support for aggregation functions. (Davide Bettio)
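The post-processing idea can be illustrated with a small Python step that removes duplicate nodes from a collection of (normalized path, value) pairs. `dedupe_by_path` is a hypothetical helper, and the pair format is an assumption. Deduplicating by value instead would wrongly merge two distinct nodes that happen to hold equal values.

```python
def dedupe_by_path(results):
    """Post-processing: drop repeated nodes, identified by their normalized paths."""
    seen = set()
    unique = []
    for path, value in results:
        if path not in seen:
            seen.add(path)
            unique.append((path, value))
    return unique

# Two distinct nodes holding the same value, plus one node collected twice.
results = [("$[0]", 10), ("$[1]", 10), ("$[0]", 10)]
print(dedupe_by_path(results))  # [('$[0]', 10), ('$[1]', 10)]
```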

Filter Expressions

Related to that, it would be helpful to determine if JSONPath filters apply to both JSON objects and arrays, or only to JSON arrays. (Daniel P)

I would support restricting filters to arrays, if others agree. (Glyn Normington)

I tend to let implementations and their "normative force of the factual" decide here; in case of doubt, I agree with Glyn's restriction to arrays.

I am very unhappy with the confusing $..book[(@.length-1)], where '@' refers to the array itself and implies that the array has a length property. In the filter expression examples, '@' more consistently refers to the current array element.

The invocation of 'the underlying scripting engine' wasn't meant as a serious normative aspect, but rather as a quick-and-dirty solution for JavaScript and PHP implementations at that time.
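A Python sketch of a filter restricted to arrays, with '@' bound to each array element in turn and the filter expression supplied as a callable rather than handed to a scripting engine. `filter_array` is a hypothetical name, and returning an empty selection for non-arrays is an assumption in line with the forgiving model discussed under error handling.

```python
def filter_array(value, predicate):
    """Sketch of [?(...)] restricted to arrays; '@' is bound to each element in turn."""
    if not isinstance(value, list):
        # Assumed forgiving behaviour: a filter applied to a non-array selects nothing.
        return []
    return [element for element in value if predicate(element)]

books = [
    {"title": "A", "price": 8.95},
    {"title": "B", "price": 12.99},
]

# Stand-in for $.book[?(@.price < 10)]; 'at' plays the role of '@'.
cheap = filter_array(books, lambda at: isinstance(at, dict) and "price" in at and at["price"] < 10)
print(cheap)                                          # [{'title': 'A', 'price': 8.95}]
print(filter_array({"title": "A"}, lambda at: True))  # []
```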

Corner Case

Consider this perfectly legal JSON object

{ "ab": 0, "a.b": 1, "a-b": 2, "a": { "b": 3 } }

So $.ab is 0, $.a.b is 3, $['a.b'] is 1, $['a-b'] is 2. You'd like to say $.a-b but lots of libraries will refuse it because "a-b" is not a legal JavaScript "name" construct, that's why you have to say $['a-b'].

But suppose your library would accept $.a-b. Then $.a-b and $['a-b'] would be synonyms, but $.a.b and $['a.b'] wouldn't. (Tim Bray)

Hmm ... this seems to be a hint that we had better exclude '-' from the dot-child selector syntax. I think I have read more discussion about that, but currently don't know where.
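A small Python sketch of the corner case: if dot-notation names are restricted to identifier-like strings (an assumption here, roughly following JavaScript identifiers), then '.' always separates names and '-' can only be reached via bracket notation. `parse_dot_path` and `lookup` are hypothetical helpers that handle dot notation only.

```python
import re

# Assumption: dot-notation names look like JavaScript identifiers, so '-' is excluded.
DOT_NAME = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

def parse_dot_path(path):
    """Split a dot-only path such as $.a.b into member names."""
    if path != "$" and not path.startswith("$."):
        raise ValueError(f"unsupported path: {path!r}")
    names = path[2:].split(".") if path != "$" else []
    for name in names:
        if not DOT_NAME.fullmatch(name):
            raise ValueError(f"{name!r} is not allowed in dot notation")
    return names

def lookup(document, names):
    """Follow member names from the root; bracket notation is just document[name]."""
    for name in names:
        document = document[name]
    return document

doc = {"ab": 0, "a.b": 1, "a-b": 2, "a": {"b": 3}}
print(lookup(doc, parse_dot_path("$.ab")))   # 0
print(lookup(doc, parse_dot_path("$.a.b")))  # 3, because '.' separates names
print(doc["a.b"], doc["a-b"])                # 1 2, what $['a.b'] and $['a-b'] select
```

With this rule, parse_dot_path("$.a-b") raises a ValueError, so $['a-b'] is the only way to reach that member.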

Respect Implementations

As I mentioned in the session, I think there's a non-trivial amount of risk here that some implementations won't be willing or able to move away from their current behaviours, even if interoperability would improve if they did so. However, there are ways to mitigate that (e.g., a separate 'rfcxxxx compliant' mode). Even so, it will be important to get good participation from as many current implementers as possible. (Mark Nottingham)

The WG will develop a standards-track JSONPath specification that is technically sound and complete, based on the common semantics and other aspects of existing implementations. Where there are differences, the working group will analyze those differences and make choices that rough consensus considers technically best, with an aim toward minimizing disruption among the different JSONPath implementations. (Barry Leiba)

I'm OK with this, but for context: I've been a pretty intense JSONPath user in recent years, and AFAIK the spec, and the implementations, are mostly OK, so the choice between "make JSONPath good" and "don't invalidate implementations" is unlikely to come up. If it did, my predisposition would be to err on the side of not breaking implementations, but I don't think that's inconsistent with Barry's text. (Tim Bray)

+1 to all.

Error Handling

My mental model at the moment is that a JSONPath expression can be valid or erroneous; application of a valid expression yields a result (which may be empty), but does not raise errors. That may not be the right model for all applications. (Carsten Bormann)

The general approach that I've seen several times (including my Elixir implementation) is that an error is raised when there is a syntax error, therefore an invalid expression (e.g. $.foo[[5]) raises an error. Conversely a valid expression applied to a bogus input never raises an error (e.g. $.foo.bar on "test" evals as []). (Davide Bettio)

On the whole I think JSONPath is designed to be "forgiving", i.e. such things aren't errors, e.g. I think I read in the spec that filtering a non-array isn't an error, it's some kind of no-op. That approach isn't always best for everyone, but it's important to be consistent. (Michael Kay)

I would expect one component of this policy to be:

Whether a JSONPath query is valid or not does not depend on the arguments it is applied to.

I.e., you can look at the query and find out independently, without knowing any data, whether it is valid or not. (Carsten Bormann)

I like, and totally agree with, the forgiving mental model: there are only syntax errors, and they do not depend on the data.
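The forgiving model can be sketched as two separate steps: a compile step that decides validity from the query text alone, and an evaluation step that never raises because of the data. In this Python sketch, `compile_query`, `evaluate` and `JSONPathSyntaxError` are hypothetical names, and only dot notation is supported, so any bracket (including the malformed $.foo[[5]) is rejected at compile time.

```python
import re

NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

class JSONPathSyntaxError(ValueError):
    """Raised for malformed queries only, never because of the data."""

def compile_query(query):
    """Validate a dot-notation query such as $.foo.bar without looking at any data."""
    if query == "$":
        return []
    if not query.startswith("$."):
        raise JSONPathSyntaxError(f"invalid query: {query!r}")
    names = query[2:].split(".")
    for name in names:
        if not NAME.fullmatch(name):
            raise JSONPathSyntaxError(f"invalid query: {query!r}")
    return names

def evaluate(names, data):
    """Apply a compiled query; anything that does not match yields [] instead of an error."""
    current = data
    for name in names:
        if not isinstance(current, dict) or name not in current:
            return []
        current = current[name]
    return [current]

query = compile_query("$.foo.bar")           # validity is decided here, once
print(evaluate(query, {"foo": {"bar": 1}}))  # [1]
print(evaluate(query, "test"))               # []
```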

Thanks

-- sg

fiestajetsam commented 3 years ago

I have created new issues based on the agreed actions during the IETF 110 meeting - just to clarify those which are not already new issues:

This issue can be closed now, however please comment if there is any issue.