ietf-wg-jsonpath / draft-ietf-jsonpath-base

Development of a JSONPath internet draft
https://ietf-wg-jsonpath.github.io/draft-ietf-jsonpath-base/
Other
60 stars 20 forks source link

key missing vs key present with null #118

Closed gregsdennis closed 2 years ago

gregsdennis commented 3 years ago

A new StackOverflow question highlights the need to define whether these two cases are the same or different:

I would expect that Javascript would handle these cases differently, but .Net generally wouldn't (it seems Newtonsoft has some special handling for this).

Posed in a more direct, testable way, given the data

[
  {
      "id": 2
  }
]

would

$[?(@.foo==null)]

return any results?

(related to #64 and #47)

cburgmer commented 3 years ago

I think the corresponding test in the comparison project is https://cburgmer.github.io/json-path-comparison/results/filter_expression_with_equals_null.html. Currently no consensus exists.

danielaparker commented 3 years ago

I think this is related to #74. Or, what does the left hand side of @.foo==null mean? It means different things in different implementations.

For JSONPath implementations in interpreted languages such as JavaScript, Python or PHP, and whose filter expressions are these languages, @.foo in the case of no match may mean undefined, null, false, or a type error.

In Javascript, @.foo in the case of no match means undefined. So in Goessner Javascript, we have:

$[?(@.foo==null)] returns [ { "id": 2 }] $[?(@.foo==false)] returns [ { "id": 2 }] $[?(@.foo===null)] returns false $[?(@.foo===false)] returns false $[?(@.foo===undefined)] returns [ { "id": 2 }]

For implementations such as Jayway, jsoncons, and JsonCons.JsonPath, where @.foo means a JSONPath expression that returns a single JSON value (rather than an array of values), it depends on what the implementation specifies @.foo to evaluate to when there is no match. For some, it's null, for others false, and for others it's a type error. In Jayway, by default it's a type error, and the item being evaluated is omitted from the results (Jayway has an option to evaluate to a null value instead.)

In JMESPath, for comparison, evaluating an object with a missing key is specified to return a null value, so

[?foo==null]

returns

[  { "id": 2 }]

In JSONiq, for another comparison, evaluating an object with a missing key results in an empty sequence, viz.

let $doc := [
  {
      "id": 2
  }
]

for $x in $doc[]
where $x."foo" eq null

return $x
gregsdennis commented 3 years ago

Yes, it does relate to that as well! I interpret literals as JSON values.

danielaparker commented 3 years ago

Yes, it does relate to that as well! I interpret literals as JSON values.

That and https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/issues/64#issuecomment-831755108

danielaparker commented 3 years ago

Yes, it does relate to that as well! I interpret literals as JSON values.

That and #64 (comment)

In any case, it's impossible to talk about what

?(@.foo==null)

means without first defining what @.foo means (including when foo is missing), and what == means. Different implementations define them differently, the draft doesn't currently define them at all.

Daniel

gregsdennis commented 3 years ago

@danielaparker you're getting a little ahead of yourself. Operators and relative paths in expressions are discussed in other issues that deal with expressions themselves. These are quite well-defined in those issues.

danielaparker commented 3 years ago

@gregsdennis, just noting that the answer to this issue you raised follows immediately once the meaning of @.foo and == have been defined. Nothing more.

cabo commented 3 years ago

@gregsdennis, just noting that the answer to this issue you raised follows immediately once the meaning of @.foo and == have been defined. Nothing more.

Indeed, so the discussion here should inform the decisions we make for empty @.foo and for ==.

goessner commented 3 years ago

The latest draft seems to be pretty clear about that.

https://ietf-wg-jsonpath.github.io/draft-ietf-jsonpath-base/draft-ietf-jsonpath-base.html#name-filter-selector

A member or element value by itself is falsy only, if it does not exist. Otherwise it is truthy, resulting in its value. To be more specific explicit comparisons are necessary. This existence test -- as an exception of the general rule -- also works with complex values.

Problem here is, that JSON lacks undefined value (from JavaScript). So we decided to use standalone path expressions in filters as existence tests

Using new in operator also allows test for existence now as with [?('foo' in @)].

Most of discussions were done in this pull request

https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/pull/111

Am I missing something here?

-- sg

danielaparker commented 3 years ago

The latest draft seems to be pretty clear about that.

https://ietf-wg-jsonpath.github.io/draft-ietf-jsonpath-base/draft-ietf-jsonpath-base.html#name-filter-selector

Most of discussions were done in this pull request

111

Am I missing something here?

What's missing, and what's missing generally in the draft, is any survey and analysis of prior experience in the great physical tarpit that is JSONPath. And if the purpose of this IETF effort really is to standardize existing practice, I think that's important. Of course, if that's not the purpose, than it doesn't matter, but everybody says that's the purpose.

It's not that the rules you've sketched here cannot be motivated by some prior experience, see e.g my analysis here. 11 of 42 implementations, including two that may be considered important, do follow this rule:

[?(@.foo)] ... selected, if member foo exists.

But Goessner Javascript doesn't follow that rule. Nor do a significant number of others.

If the draft really is about standardizing existing behaviour, it seems to me that it needs to justify its choices with reference to prior experience. Not just with what "we" (a handful of people) "agreed".

goessner commented 3 years ago

hmm ... in my original proposal I included (as already someone mentioned):

//book[isbn]   | $..book[?(@.isbn)]   |  filter all books with isbn number |

It seems to be, this was of no significant relevance for existing implementations.

gregsdennis commented 3 years ago

@goessner would your implementation match on isbn: null?

goessner commented 3 years ago

of course it would ... as I reused the Javascript engine for simplicity then.

danielaparker commented 3 years ago

@goessner, maybe I've misunderstood Greg's question, but I think you meant to reply "of course it wouldn't". Using Goessner Javascript, and given the JSON document

{
    "store": {
        "book": [
            {
                "isbn" : null
            }
    ]
  }
}

the query

$..[?(@.isbn)]

returns nothing. This is what we expect from JavaScript, and is consistent with the results reported here.

Jayway (Java), on other other hand, returns

[
   {
      "isbn" : null
   }
]
danielaparker commented 3 years ago

hmm ... in my original proposal I included (as already someone mentioned):

//book[isbn]   | $..book[?(@.isbn)]   |  filter all books with isbn number |

It seems to be, this was of no significant relevance for existing implementations.

I don't think any of the JSONPath implementations that use a dynamic language for evaluation, whether JavaScript, PHP, Python, Ruby or Lua, evaluate ?(@.isbn) that way. They all evaluate @.isbn first and then apply "not falsey" rules of one sort or another to the result. It's the textbook way for interpreting an expression. That's a lot of implementations, including the original.

On the other hand a significant number of (not all) implementations that implemented their own expression evaluators do evaluate ?(@.isbn) that way, including Jayway JsonPath. It's messier to implement it this way because it's not following the textbook evaluation patterns. Jayway does it by treating @.isbn as a recoverable error if isbn is missing, and handling that error to allow other terms being evaluated to continue.

cabo commented 3 years ago

If the draft really is about standardizing existing behaviour, it seems to me that it needs to justify its choices with reference to prior experience. Not just with what "we" (a handful of people) "agreed".

The charter is quite specific:

The WG will develop a standards-track JSONPath specification that is technically sound and complete, based on the common semantics and other aspects of existing implementations. Where there are differences, the working group will analyze those differences and make choices that rough consensus considers technically best, with an aim toward minimizing disruption among the different JSONPath implementations.

Since we do have those differences here, we are asked to consider what is "technically best", but "with an aim toward minimizing disruption". This seems clear to me, and I think our charter writers did a very good job here.

danielaparker commented 3 years ago

Since we do have those differences here, we are asked to consider what is "technically best", but "with an aim toward minimizing disruption". This seems clear to me, and I think our charter writers did a very good job here.

I think it's hard to make a case that the draft is doing much to minimize disruption.

On the one hand, it's becoming incompatible with all of the implementations that use a dynamic language such as Javascript, PHP, Python, Ruby or Lua for evaluating expressions. For users of such implementations, having an expression language in their familiar language, with all of the functions that that language provides, is part of the appeal of these implementations. The users of these implementations all have a normalize-unicode function equivalent for comparing strings, and much else. These users don't care about interoperability, they care about familiarity and expressiveness.

On the other hand, the WG is sketching out an expression language that is different from the expression languages that were developed by other implementations mostly following Jayway Java. These expression languages specified @.relative-path as a JSONPath expression, with rules for "ExecuteSingle" to produce a single value for use in comparisons. @. means something different in the draft, where it's not a JSONPath expression. But Jayway Java is integrated into legacy Java enterprise software, and cannot change.

It seems to me that the draft is departing from both streams of JSONPath that evolved from the original article. I don't see anything in the draft that suggests that the editors consider prior experience important. There is no text in the draft that discusses the important implementations, and which features were motivated by one, which from another, and which are new or refinements.

Anyway, I realize that nobody here is interested in this topic, no doubt for the obvious reason :-)

glyn commented 3 years ago

Daniel wrote:

I don't see anything in the draft that suggests that the editors consider prior experience important. There is no text in the draft that discusses the important implementations, and which features were motivated by one, which from another, and which are new or refinements.

I consider prior experience to be very important and I'm sure Stefan does too. However, I am not sure to what extent the spec should reference particular implementations since such references will not necessarily stand the test of time. Perhaps an appendix is the way to do it, in which case, please feel free to submit a PR to help lead the way forward. Editors need not be the only authors.

Anyway, I realize that nobody here is interested in this topic, no doubt for the obvious reason :-)

Not so.

On Thu, 16 Sept 2021 at 20:48, Daniel Parker @.***> wrote:

Since we do have those differences here, we are asked to consider what is "technically best", but "with an aim toward minimizing disruption". This seems clear to me, and I think our charter writers did a very good job here.

I think it's hard to make a case that the draft is doing much to minimize disruption.

On the one hand, it's becoming incompatible with all of the implementations that use a dynamic language such as Javascript, PHP, Python, Ruby or Lua for evaluating expressions. For users of such implementations, having an expression language in their familiar language, with all of the functions that that language provides, is part of the appeal of these implementations.

On the other hand, the WG is sketching out an expression language that is different from the expression languages that were developed by other implementations mostly following Jayway. These expression languages specified @.relative-path as a JSONPath expression, with rules for "ExecuteSingle" to produce a single value for use in comparisons. @. means something different in the draft, where it's not a JSONPath expression.

It seems to me that the draft is departing from both streams of JSONPath that evolved from the original article. I don't see anything in the draft that suggests that the editors consider prior experience important. There is no text in the draft that discusses the important implementations, and which features were motivated by one, which from another, and which are new or refinements.

Anyway, I realize that nobody here is interested in this topic, no doubt for the obvious reason :-)

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/issues/118#issuecomment-921196950, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAXF2OMWFKBT5DNTHL3FULUCJCXHANCNFSM5DPUZ5FA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

gregsdennis commented 3 years ago

I agree that such references do not belong in the main text of the spec. An appendix is okay. I'd I had my way, if have a separate companion doc that summarized individual decisions and the reasoning behind them. Such a document would be a natural place to include references to behaviors in existing implementations.

danielaparker commented 3 years ago

@glyn wrote

Perhaps an appendix is the way to do it, in which case, please feel free to submit a PR to help lead the way forward.

If the purpose of the WG is to standardize existing behavior, as they say it is, I think it's up to the authors to justify how their choices relate to prior experience.

For example, there is plenty of prior experience with specifying filter expressions, starting with Jayway, the second most important and influential implementation after Goessner. Jayway, for example, has answers to all the questions like https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/issues/118, https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/issues/119 and https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/issues/123. But the WG take on expressions appears to be inconsistent with that prior experience in significant respects. I think the authors should explain why. That is, if the authors really mean it when they say that their intent is to standardize existing behavior.

cabo commented 3 years ago

Sent from mobile, sorry for terse

On 24. Sep 2021, at 14:53, Daniel Parker @.***> wrote:

If the purpose of the WG is to standardize existing behavior

It’s not. Please reread the charter.

RFC 2418 talks more about the role of authors (ie to implement the directions of the WG), and I’m certain the authors will do that. If the WG wants to have a big rationale section to document their decisions beyond what feels natural for the authors, all I can say is “send text”…

danielaparker commented 3 years ago

Sent from mobile, sorry for terse On 24. Sep 2021, at 14:53, Daniel Parker @.***> wrote: If the purpose of the WG is to standardize existing behavior It’s not.

Thanks for the clarification.

cabo commented 2 years ago

Fixed in 76789d8

gregsdennis commented 2 years ago

Can you summarize the resolution, please?

cabo commented 2 years ago

Since 76789d8, Section 3.5.9 now says:

   *  A member or element value by itself in a Boolean context is
      interpreted as false only if it does not exist.  Otherwise it is
      interpreted as true.  To be more specific about the actual value,
      explicit comparisons are necessary.  This existence test -- as an
      exception to the general rule -- also works with structured
      values.