ietf-wg-jsonpath / draft-ietf-jsonpath-base

Development of a JSONPath internet draft
https://ietf-wg-jsonpath.github.io/draft-ietf-jsonpath-base/
Other
59 stars 20 forks source link

Behavior for semantically invalid expressions #212

Closed gregsdennis closed 2 years ago

gregsdennis commented 2 years ago

What is the expected behavior for expressions which are not semantically valid?

For example, < is only defined for numbers. So what if someone does

$[?(@.foo < 'bar')]

Is this a parse error? Does it just evaluate to false for all potential @.foo values?

This kind of thing will be important to define if/when we decide to support more complex expressions (e.g. with mathematic operators) or inline JSON values such as objects or arrays.

cburgmer commented 2 years ago

Here you can see the current behaviour of the various implementations for a similar case: https://cburgmer.github.io/json-path-comparison/#filter_expression_with_greater_than

gregsdennis commented 2 years ago

Thanks @cburgmer. But for this issue, I'd like to explore what we want for the specification.

Also, for some context, I'm updating my library to operate in a "specification compliant" mode by default. Optionally, users will be able to configure for extended support like math operations and inline JSON (as I mentioned above). But until that stuff is actually included in the spec, I'd like it turned off by default.

glyn commented 2 years ago

The spec says:

The well-formedness and the validity of JSONPath queries are independent of the JSON value the query is applied to; no further errors can be raised during application of the query to a value.

and the grammar in the spec includes $[?(@.foo < 'bar')] as syntactically valid. (We could tighten up the grammar to exclude cases where non-numeric literals appear in ordered comparisons, but I don't think there's a great benefit in doing so.)

The spec also says:

Comparisons using one of the operators <, <=, >, and >= are between numeric values only. Using these operators to compare other types of values produces a "false" comparison result.

So, @.foo < 'bar' evaluates to "false".

danielaparker commented 2 years ago

Comparisons using one of the operators <, <=, >, and >= are between numeric values only. Using these operators to compare other types of values produces a "false" comparison result.

Those two sentences contradict each other. You have in fact defined these operators for other values with the second sentence.

So, @.foo < 'bar' evaluates to "false".

glyn commented 2 years ago

@danielaparker You're quite right. I noticed that the other day and fixed it, but I quoted the previous spec before the fix was merged. Apologies. Please see the revised wording in the filter semantics section.

graza commented 2 years ago

I don't understand numeric only semantic constraint. Is there a reason behind it?

JSON has very few value types and lexical ordering of strings is well defined in Unicode. Standards like ISO8601, commonly used for representing date-time values in JSON, give consideration to lexical ordering. So a useful filter for me could be to return JSON objects that have a date greater than (or equal) to a date taken from somewhere else in the same instance.

Having the comparison evaluate to false could also be confusing. For { "a" : 6, "b" : "c" }, what should ?!(@ >= 5) yield for $["b"]? Should it be true? If so such an expression would return [ "c" ] which is a non-numeric result even though a numeric comparison was performed. Or should breaking the semantic rule about numerics cause the entire predicate to yield false, regardless?

gregsdennis commented 2 years ago

Or should breaking the semantic rule about numerics cause the entire predicate to yield false, regardless?

I'd like to be sure that we keep to the topic here and not get distracted by my choice of example. Yes, comparisons with strings is widely well-defined, but this issue is about comparisons that don't align with what's in the spec currently. Currently, as defined by the spec, strings are not comparable.

Maybe my question could be better served with the boolean example mentioned by @graza. What happens in this case?

$[?(@.foo < true)]
danielaparker commented 2 years ago

What happens in this case?

$[?(@.foo < true)]

It's hard to tell from the draft.

But consider something that is stated in the draft,

"Note that comparisons between structured values, even if the values are equal, produce a "false" comparison result."

So, the draft considers comparisons of structured values to be well formed, but always evaluates the comparison to false. Consequently, given element-for-element equal arrays v and w, v == w and v != w both evaluate to false, and so presumably !(v == w) and !(v !=w) both evaluate to true. This suggests to me that the draft's authors intend comparisons that may be regarded as "invalid" to evaluate to false.

Would the draft also have @.foo < true gives false and @.foo > true gives false?

Regardless, the draft's rules for comparing structured values appear to be incompatible with all existing JSONPath implementations, no existing implementation follows these rules, as evidenced in the Comparisons by Filter expression with equals array or equals true and Filter expression with not equals array or equals true. None appear to evaluate both v == w and v != w to false. These include both implementations in which the expression v == w is well defined, and implementations in which the expression is a type error. For example, the Javascript implementations evaluate v == w as false, and v != w as true, because Javascript is comparing references to arrays (rather than values.) Implementations in which the expression is a type error either report an error or return an empty result list.

I don't think you'd find precedence for these rules in any comparable query language, whether JSONPath, JMESPath, JSONAta, or XPATH 3.1. There is priori experience for defining comparisons for all JSON values (see e.g. JAVA Jayway), just as there is priori experience for treating some comparisons as type errors and taking those terms out of the evaluation (see e.g. JMESPath.) But the draft's approach to comparisons is different. It does not seem to follow any prior experience.

glyn commented 2 years ago

What the draft spec says about $[?(@.foo < true)] comes down to the semantics of the comparison @.foo < true. The Filter Selector Semantics section says:

  • comparisons using one of the operators <, <=, >, or >= produce a "true" comparison result if and only if the comparison is between numeric values which satisfy the comparison.

Since true is non-numeric, the comparison @.foo < true produces a "false" result. Therefore, $[?(@.foo < true)] always results in an empty nodelist.

gregsdennis commented 2 years ago

Since true is non-numeric, the comparison @.foo < true produces a "false" result.

But the spec doesn't say that the result is false, and it's incorrect to assume this behavior. It's equally valid to assume that such a comparison is a syntax error.

This is the crux of this issue.

glyn commented 2 years ago

The spec describes comparisons as boolean expressions which therefore have the value true or false. The difficulty of using the terms true and false in that context would be possible confusion with the JSON values true and false.

The spec is currently inconsistent in describing the results of boolean expressions as true/false or "true"/"false", but the fix there is to be consistent.

As for for the syntax side, the spec's grammar implies that @.foo < true is syntactically valid.

gregsdennis commented 2 years ago

the spec's grammar implies

I feel it would be better to be explicit.

danielaparker commented 2 years ago

Since true is non-numeric, the comparison @.foo < true produces a "false" result. Therefore, $[?(@.foo < true)] always results in an empty nodelist.

And $[?(!(@.foo < true))] always returns a non-empty result list. As does $[?(!(@.foo >= true))].

If the idea is that expressions involving unsupported constructions should return an empty result list, than none of the above expressions should return anything. There is prior experience for matching nothing in all cases, e.g. in JMESPath, both [?!(@ > 'true')] and [?!(@ > 'true')], where the greater-than comparison isn't supported, match nothing. There is also prior experience in the JSONPath Comparisons where the presence of unsupported comparisons always results in an empty list. But the approach taken in the draft appears to be a minority of one. There appears to be no prior experience for it.

glyn commented 2 years ago

the spec's grammar implies

I feel it would be better to be explicit.

I don't think we can be any more explicit than the following productions:

comp-expr    = comparable S comp-op S comparable
comparable   = number / string-literal /
               true / false / null /
               singular-path  
comp-op      = "==" / "!=" /
               "<"  / ">"  /
               "<=" / ">="

What did you have in mind?

glyn commented 2 years ago

There is also prior experience in the JSONPath Comparisons where the presence of unsupported comparisons always results in an empty list.

I had a look and couldn't find these tests. Please could you provide a link or links.

gregsdennis commented 2 years ago

What did you have in mind?

It depends on what we want.

If we say @.foo < true is syntactically valid, then we need to explicitly define that it (and @danielaparker's variants, etc) evaluate to false (boolean, not JSON) because they are nonsensical (semantically invalid).

But if we say that these expressions are syntactically invalid, then we need to update the ABNF to reflect that by splitting comparable and comp-op so that we can properly restrict the syntax.

glyn commented 2 years ago

Even if we made @.foo < true syntactically invalid, the same expression could arise from, for example, @.foo < @.bar where @.bar is true.

So, since the spec needs to be clear about the semantics of such expressions, I don't really see the benefit of ruling some of them out syntactically if that would make the grammar messier.

gregsdennis commented 2 years ago

Then we need to be explicitly clear that such cases result in a false evaluation. We don't currently have that.

danielaparker commented 2 years ago

There is also prior experience in the JSONPath Comparisons where the presence of unsupported comparisons always results in an empty list.

I had a look and couldn't find these tests. Please could you provide a link or links.

See, for example, Filter expression with equals array or equals true, Filter expression with not equals array or equals true, and Filter expression with negation and equals array or equals true.

Bash (JSONPath.sh), Elixir (jaxon), and Ruby (jsonpath) all return empty result lists ([]) for

$[?(@.d==["v1","v2"] || (@.d == true))] 

$[?((@.d!=["v1","v2"]) || (@.d == true))]

and

$[?(!(@.d==["v1","v2"]) || (@.d == true))]

On the other hand, I don't think you could find any cases where @.d!=["v1","v2"] evaluated to false while !(@.d==["v1","v2"]) evaluated to true. Not in JSONPath, nor in any of the other good query languages like JMESPath, JSONAta, or XPath 3.1. That is unique to the draft.

glyn commented 2 years ago

One solution would be to introduce "not a boolean" (NaB) analogous to "not a number" (NaN).

Semantically invalid boolean expressions, such as @.foo < true would yield NaB.

NaB would obey the following laws, where b is any boolean or NaB: Law Expression Equivalent Expression
Commutativity of OR NaB \|\| b b \|\| NaB
Commutativity of AND NaB && b b && NaB
Annihilator for OR b \|\| NaB NaB
Annihilator for AND b && NaB NaB
Negation !NaB NaB

(The logical operator laws in the spec continue to hold, but only when none of P, Q, or R are NaB.)

The rule in the spec for filters would still apply:

The current item is selected if and only if the boolean expression yields true.

In other words, a semantically invalid boolean expression anywhere inside a filter's boolean expression causes the filter to not to select the current node. (Edited based on feedback from @cabo.)

What do you make of this solution? /cc @cabo @timbray

cabo commented 2 years ago

On 2022-07-17, at 18:33, Glyn Normington @.***> wrote:

Commutativity of AND NaB && b b && NaB Annihilator for OR b || NaB NaB

That would mean an implementation can no longer short-circuit on b && c where b is false.

I don’t think there is much point in getting an “exception” semantics for unsupported comparisons. If we do want to do them, they should be actual exception semantics. But the loss of short-circuiting is more important to me than the weirdness of returning false (true) in certain cases.

Grüße, Carsten

glyn commented 2 years ago

Yea, good point. I wonder if there is a consistent set of laws which allow both true and NaB to be annihilators for OR and allow both false and NaB to be annihilators for AND? That would at least preserve short-circuiting, but the overall effect would be more difficult to describe as NaB wouldn't then trump everything else.

timbray commented 2 years ago

I'm attracted by the "consistency with JMESPath" argument. I.e. the strings @.foo < true and false can replace each other with no change in effect. I thought that's what the draft said.

I'd be sympathetic with making the literal string @.foo < true non-well-formed but as Glyn points out, that wouldn't make the need for clarity on the issue go away.

cabo commented 2 years ago

On 17. Jul 2022, at 19:53, Tim Bray @.***> wrote:

I'd be sympathetic with making the literal string @.foo < true non-well-formed but as Glyn points out, that wouldn't make the need for clarity on the issue go away.

Right. We might even point out that constructs like this could lead to warnings where the API provides a way to do this.

Grüße, Carsten

glyn commented 2 years ago

I'm attracted by the "consistency with JMESPath" argument. I.e. the strings @.foo < true and false can replace each other with no change in effect. I thought that's what the draft said.

Yes, the behaviour of the current draft for semantically invalid expressions seems to be the same as that of JMESPath after all.

The JMESPath specification says invalid comparisons (such as ordering operators applied to non-numbers) yield a null value which is then treated equivalently to false.

I thought I'd verify this to be on the safe side.

A JMESPath evaluator with the expression [?a<b] evaluated against the JSON:

[{"a": "char", "b": "char"},
 {"a": 2, "b": 1},
 {"a": 1, "b": 2}]

gives the result:

[{"a": 1, "b": 2}]

whereas [?!(a<b)] gives the result:

[{"a": "char", "b": "char"},
 {"a": 2, "b": 1}]

The evaluator with the expression [?a<b || ("x" == "x")] applied to the same JSON gives the result:

[{"a": "char", "b": "char"},
 {"a": 2, "b": 1},
 {"a": 1, "b": 2}]

whereas the expression [?!(a<b || ("x" == "x"))] gives the result [].

danielaparker commented 2 years ago

EDIT: I think you may be right about this after all. Just not with your example, I believe you are testing with an implementation that does string comparisons differently from the spec (if you're using the online evaluator on the JMESPath site.)

the behaviour of the current draft for semantically invalid expressions seems to be the same as that of JMESPath after all.

I guess.

The JMESPath specification says invalid comparisons (such as ordering operators applied to non-numbers) yield a null value

Yes

which is then treated equivalently to false.

See Comparison Operators, which says "Ordering operators >, >=, <, <= are only valid for numbers. Evaluating any other type with a comparison operator will yield a null value, which will result in the element being excluded from the result list." Yes, I believe null is by JMESPath truthiness rules evaluates to false.

I thought I'd verify this to be on the safe side.

Actually, the JMESPath implementation you're using looks like the Python version accessed through the JMESPath site. As the JMESPath author, James Saryerwinnie noted in 0.9.1 date comparisons not working, "What's going on here is that the spec only defines comparisons as valid on numbers (http://jmespath.org/specification.html#ordering-operators). In 0.9.1, validation was added to enforce this requirement, so things that implicitly worked before now no longer work ... Given this is affecting multiple people I'll add back the support for strings now." James Saryerwinnie went on the suggest that he was going to update the spec as well to allow more general comparisons, but that didn't happen. James actually took a break from JMESPath at around that time to quite recently.

A JMESPath evaluator with the expression [?a<b] evaluated against the JSON:

[{"a": "char", "b": "char"},
 {"a": 2, "b": 1},
 {"a": 1, "b": 2}]

gives the result:

[{"a": 1, "b": 2}]

Yes. Here it's actually comparing "char" < "char", and returning false, not null.

whereas [?!(a<b)] gives the result:

[{"a": "char", "b": "char"},
 {"a": 2, "b": 1}]

But you can easily see that it's actually comparing strings here. Change the document to

[{"a": "char", "b": "char1"},
{"a": 2, "b": 1},
{"a": 1, "b": 2}]

and [?!(a<b)] gives

[
{
"a": 2,
"b": 1
}
]

However, I did the following test with JMESPath.Net with

[{"a": null, "b": []},
 {"a": [], "b": null},
 {"a": 2, "b": 1},
  {"a": 1, "b": 2}]

[?a<b] gives

[{"a":1,"b":2}]

[?!(a<b)] gives

[{"a":null,"b":[]},{"a":[],"b":null},{"a":2,"b":1}]

which I think is consistent with the current draft.

gregsdennis commented 2 years ago

Let's bring this back home.

I think we've decided that @ < true is not a syntax error. ✔️

So now we are deciding how it evaluates. The leading proposal is that all of these evaluate to false:

More holistically, any expression which contains an invalid operation evaluates to false.

This would imply that an invalid comparison itself does not result in a boolean directly. Rather (as suggested by @glyn) an implementation would need an internal placeholder (e.g. NaB or undef) that is persistent through all subsequent operations and ultimately is coerced to false so that the node is not selected.

This would mean that the processing of each of the above expressions is as follows:

At best, the use of something like NaB is guidance for an implementation. The specification need only declare that expressions which contain invalid operations (at all) evaluate to false. How a developer accomplishes that is up to them.

danielaparker commented 2 years ago

I think we've decided that @ < true is not a syntax error. ✔️

Well, since @ isn't generally known until evaluation against a document, it can't be detected as a syntax error. Invalid comparisons can be detected as a dynamic error (like in JSONAta or XPATH 3.1). If detected as a dynamic error, it's up to the draft to say what the implementation should do. Sensible options are

So now we are deciding how it evaluates. The leading proposal is ...

All of that sounds complicated and unnecessary. Personally, I don't think any of these constructions are necessary. It is enough to say that in these situations a dynamic error has been detected, and the implementation must react in this way (whatever is decided upon.)

As you say "How a developer accomplishes that is up to them."

timbray commented 2 years ago
  • @ < true
  • !(@ < true)
  • @ < true || true

More holistically, any expression which contains an invalid operation evaluates to false.

Ummmm… I was under the impression that @<true was identical to false and thus your third item would be true. This seems violently counter-intuitive. We're not going to treat @<true as a syntax error, but we are going to cause its presence to have a nonlocal effect on the entire containing expression. OK, it's back to a careful read of the draft for me.

cabo commented 2 years ago

have a nonlocal effect on the entire containing expression.

If you want a nonlocal effect, it is probably best to think in terms of a throw/catch model. (And, yes, I think one needs catch as an explicit mechanism, not just implicitly at the outside of a filter expression.)

gregsdennis commented 2 years ago

I was under the impression that @<true was identical to false

If that were the case, then !(@<true) would be true, which isn't right. The expression is still unevaluateable. Similarly with the || expression.

since @ isn't generally known until evaluation against a document, it can't be detected as a syntax error

True in general, but for this specific case, it's the > combined with true that gives it away, and this can be detected at parse time. Regardless, we need to also consider the case where @ is true.

glyn commented 2 years ago

I think the crucial thing is that these expressions have well defined semantics so that we can test the behaviour and ensure interoperation. I really don't think we should be encouraging the use of these expressions.

My proposal suffers from the lack of short-cutting. If we preserved short-cutting with a similar approach, we'd have to give up commutativity of && and || and probably some of the other boolean algebra laws. Overall, I think such approaches would make the spec larger and consequently increase its cognitive load for users and implementers alike.

Producing a non-local effect is another solution, but I think that too is overly complex for the benefits. If we made the non-local effect that the filter selected no nodes, I think that would be semantically equivalent to my proposal. Also, I don't think we should introduce a try/catch mechanism as the spec complexity would far outweigh the benefits.

The current spec:

So I propose that we stick with the current spec.

gregsdennis commented 2 years ago

But what's written in the spec

Using these operators to compare other types of values produces a "false" comparison result.

doesn't resolve the @ < true vs. !(@ < true) disparity.

Both of these expressions should return an empty node list because both are semantically invalid.

As proof of this, !(@ < true) is logically equivalent to @ >= true, which also returns the empty nodelist.

If @ < true simply and immediately returns false, then this equivalent no longer holds.

cabo commented 2 years ago

I wish we could stop using the term "semantically invalid", because that doesn't say what the construct means.

A lightweight variant of the catch/throw approach I would prefer would be to say that an expression fails. Constructs that have failing subexpressions also fail. A failing filter expression does not select the item for which the filter expression failed.

Disadvantage: Either no short circuiting, or ambiguity with short circuiting. Without short circuiting, one cannot test whether a subsequent subexpression will fail and still get a deterministic result.

(I'm not sure we have all we need to make such tests.)

glyn commented 2 years ago

But what's written in the spec

Using these operators to compare other types of values produces a "false" comparison result.

doesn't resolve the @ < true vs. !(@ < true) disparity.

https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/pull/224 makes the current spec clearer:

Where this is a "disparity" or not depends on what semantics we are aiming at.

Both of these expressions should return an empty node list because both are semantically invalid.

"should" only if we adopt the NaB (or equivalent) or non-local effect semantics.

(I agree with @cabo that the term "semantically invalid expression" isn't very useful when we are trying to define the semantics of such expressions.)

As proof of this, !(@ < true) is logically equivalent to @ >= true, which also returns the empty nodelist.

If @ < true simply and immediately returns false, then this equivalent no longer holds.

https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/pull/224 defines >= in terms of < to preserve this very equivalence.

glyn commented 2 years ago

A lightweight variant of the catch/throw approach I would prefer would be to say that an expression fails. Constructs that have failing subexpressions also fail. A failing filter expression does not select the item for which the filter expression failed.

Disadvantage: Either no short circuiting, or ambiguity with short circuiting. Without short circuiting, one cannot test whether a subsequent subexpression will fail and still get a deterministic result.

(I'm not sure we have all we need to make such tests.)

I think these disadvantages should steer us away from catch/throw, or equivalent, approaches.

gregsdennis commented 2 years ago

224 makes the current spec clearer:

I didn't follow any of this comment. I can't tell what you're saying the spec says vs what you think it should say.

Certainly, you're not saying that @ < true should be evaluated to true, which would return ALL of the nodes, right? Because that's what it sounds like you're arguing for.

Comparisons with booleans don't make sense. Ever. My proof above shows the absurdity of defining one expression to evaluate to false because it, by necessity, means that the converse evaluates to true when neither make sense at all.


Call it what you want. I call it "semantically invalid" because that phrase accurately describes that there are no (common) operative semantics that make this expression evaluatable. Sure, if we define it to mean something, then that technically gives it semantic meaning, but in all ordinary cases of how < works, this operation doesn't make sense and therefore is, by definition, semantically invalid.


There should be one of two outcomes from this issue:

  1. It's a parse error.
  2. It's a runtime evaluation error.

In both cases it's an error; that's not in question here. What kind of error and how it's handled are in question.

We've already seen that it doesn't make sense to be a parse error because we need to account for the boolean value coming from the data (i.e. from @). That leaves us with a runtime evaluation error.

Whenever a runtime evaluation error occurs, because we've decided that a syntactically valid path always returns, it MUST return an empty nodelist.

Therefore, ANY expression that triggers an evaluation error MUST return an empty nodelist. This includes both @ < true and @ >= true.

I don't know how to stress this any more. There are no other logical options to resolve this issue.

cabo commented 2 years ago

This reasoning technique is also called proof by lack of imagination (related to the technique "I don't get out much").

$ erl
Erlang/OTP 24 [erts-12.2] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [jit] [dtrace]
Eshell V12.2  (abort with ^G)
1> 1 < 2.
true
2> 1 < true.
true
3> 1 == true.
false
4> true < 1.
false
5> true < false.
false
6> false < true.
true
7>

I hear that you are arguing for some exception semantics (similar to the catch/throw I brought up). I can sympathize with that, because I also argued for that. However, we are really free to define the expression language as we want, and what we have now is not that much worse than what I was (incompletely) proposing.

cabo commented 2 years ago

(Found the rationale why Erlang does things this way:

http://erlang.org/pipermail/erlang-questions/2009-May/043851.html

Unfortunately, we can't ask Joe for additional details any more. But apparently these titans in language design for highly reliable systems saw a benefit in a complete total ordering of the type system. JSON does not have a type system, so anything we do will be invention.)

glyn commented 2 years ago

224 makes the current spec clearer:

I didn't follow any of this comment. I can't tell what you're saying the spec says vs what you think it should say.

My comment was about the current spec, especially now that PR #224 had landed.

Certainly, you're not saying that @ < true should be evaluated to true, which would return ALL of the nodes, right? Because that's what it sounds like you're arguing for.

The current spec says that @ < true yields true. Yes (unless that comparison was part of a large boolean expression which sometimes yielded false) that would return all the nodes. That's precisely what I am arguing for as I think it's the best option.

Comparisons with booleans don't make sense. Ever. My proof above shows the absurdity of defining one expression to evaluate to false because it, by necessity, means that the converse evaluates to true when neither make sense at all.

I suspect absurdity, like beauty, is in the eye of the beholder.

Call it what you want. I call it "semantically invalid" because that phrase accurately describes that there are no (common) operative semantics that make this expression evaluatable. Sure, if we define it to mean something, then that technically gives it semantic meaning, but in all ordinary cases of how < works, this operation doesn't make sense and therefore is, by definition, semantically invalid.

I can see where you're coming from.

There should be one of two outcomes from this issue:

1. It's a parse error.

2. It's a runtime evaluation error.

In both cases it's an error; that's not in question here. What kind of error and how it's handled are in question.

We've already seen that it doesn't make sense to be a parse error because we need to account for the boolean value coming from the data (i.e. from @). That leaves us with a runtime evaluation error.

Whenever a runtime evaluation error occurs, because we've decided that a syntactically valid path always returns, it MUST return an empty nodelist.

I don't follow the chain of reasoning there. The current spec returns, but doesn't always return an empty nodelist.

Therefore, ANY expression that triggers an evaluation error MUST return an empty nodelist. This includes both @ < true and @ >= true.

I don't know how to stress this any more. There are no other logical options to resolve this issue.

At the risk of repeating myself, the current spec is a logical option in the sense that it is well defined and consistent.

I think we are all agreed that none of the options is perfect. All the options under consideration are logical, otherwise we'd rule them out. Some would take more words in the spec to explain and that puts a cognitive load on the user for very little benefit.

glyn commented 2 years ago

@greg It seems your preferred approach is:

[...] ANY expression that triggers an evaluation error MUST return an empty nodelist.

Let's try to flesh this out a bit.

What about shortcutting? For example, what should the behaviour of [?1 == 1 || @ < true] be?

There would appear to be three options:

  1. Prescribe an order of evaluation such as "left to right" and allow shortcuts. Result: all nodes are selected.
  2. Do not prescribe an order of evaluation and allow shortcuts. Result: non-determinism - all or no nodes are selected, depending on the implementation. (That's bad for testing and interop.)
  3. Disallow shortcuts (in which case the order of evaluation doesn't matter). Result: no nodes are selected.
graza commented 2 years ago
  1. Prescribe an order of evaluation such as "left to right" and allow shortcuts. Result: all nodes are selected.

This seems useful to me. The spec is free to specify that inequalities (<, <=, >, >=) are only for numbers. Short-circuit evaluation can be used to avoid errors, e.g.:

[ { "type": "number", "value": 10 }, { "type": "string", "value": "ten" } ]

$[? @.type == "number" && @.value < 50] will return the first array element.

gregsdennis commented 2 years ago

what should the behaviour of [?1 == 1 || @ < true] be?

You still have to parse this. And you have to know the value of @ before you evaluate it. That means you're able to know that this expression is nonsensical (to avoid "invalid") before you evaluate it. Thus shortcutting is moot.

cabo commented 2 years ago

Let's use "undesirable" for expressions we don't like.

The example that Glyn provided is just a stand-in for a more complex example that gets true from the JSON value. So the fact that you can parse this and find the undesirable part immediately is not so relevant. With shortcut semantics, the fact that the ignored part of the disjunction is undesirable also is irrelevant.

So I do think that Glyn's question is valid.

danielaparker commented 2 years ago

I was under the impression that @<true was identical to false

If that were the case, then !(@<true) would be true, which isn't right. The expression is still unevaluatable. Similarly with the || expression.

I earlier would have agreed with that. But consider the JMESPath analogy.

In JMESPath, @<true evaluates to, not false, but "absent", represented in JMESPath by null. When "absent" (null) is evaluated as a boolean according to the JMESPath rules of truth, it gives false. It seems consistent that "not absent" is true.

What would you have if you resolved @<true to the draft's notion of "absent", an empty node list?

glyn commented 2 years ago

Ok, let's try a modified example to avoid the syntactic issue. What should the behaviour of [?1 == 1 || @ < $.t] be (where $.t results in a nodelist with a single node with value true)?

With @gregsdennis's preferred approach, there would appear to be three options:

  1. Prescribe an order of evaluation such as "left to right" and allow short-circuits. Result: all nodes are selected.
  2. Do not prescribe an order of evaluation and allow short-circuits. Result: non-determinism - all or no nodes are selected, depending on the implementation. (That's bad for testing and interop.)
  3. Disallow short-circuits (in which case the order of evaluation doesn't matter). Result: no nodes are selected.
timbray commented 2 years ago

I'd like to go with the simplest possible thing unless are strong reasons not to. I think the simplest possible thing is: Type-incompatible comparisons yield false and there are no "exceptions" or nonlocal effects. So in this example you unambiguously get true || false and select all the nodes and if you want to optimize by short-circuiting you can do that without fear of confusion or breakage.

Type-incompatibility yielding false has the advantage that it's easy to explain in English and, I would argue, usually produces the desired effect. If I am filtering something on < $.t and $.t turns out not to be a number - almost certainly an unexpected-data-shape problem - I do not want to match. But if I || that with something else then I'm explicitly OK with not getting a match here.

Yes, there are still corner-case surprises, for example: ! @ < $.t is true if $.t is non-numeric, but @ >= $.t is not true. I claim that this is at least easy to explain and understand in brief clear language.

gregsdennis commented 2 years ago

erlang

A better choice for your argument would have been JS as that was one of the original implementations. Similiar behavior works there (as I discovered testing it in my browser's console).

image

1<true and 1>true result in false 1<=true and 1>=true result in true

Interestingly 0<true and 100>true both result in true for JS. This makes it very apparent that JS is just casting the true to 1 and performing the comparison. This kind of implicit casting is something that we have already explicitly decided against. This means that even the JS evaluation isn't a valid argument.

I could just as easily use a .Net language or any other strongly typed language to show that 1 < true doesn't even compile and is therefore nonsensical.

JMESPath

Why do we continually compare with this? It's a deviated derivative of JSON Path. Are we trying to align JSON Path with it? To what end? I thought we were trying to define JSON Path, not redefine JMESPath to look like JSON Path.

"... develop a standards-track JSONPath specification that is technically sound and complete, based on the common semantics and other aspects of existing implementations. Where there are differences, the working group will analyze those differences and make choices that rough consensus considers technically best, with an aim toward minimizing disruption among the different JSONPath implementations."

Yet no one has thought to see what the implementations do in these cases.

What should the behaviour of [?1 == 1 || @ < $.t] be?

I would say that the short-cutting is likely preferred, but I don't think my implementation would do it that way. My implementation would evaluate all operands then apply the ||. This would result in the strange comparison with true and so it would select no nodes.

C# would shortcut this. @ and $.t would be considered "side-effect" evaluations and would be skipped. That said, strong typing would guarantee that these would evaluate to types that are valid for <, so the resolved expression is guaranteed to make sense in the event shortcutting didn't happen.

(In reality, the C# compiler would optimize 1==1 to a constant true, and ultimately the remaining expression wouldn't even make it into the compiled code, so really there's no short-cutting either.)

cabo commented 2 years ago

On 19. Jul 2022, at 23:48, Greg Dennis @.***> wrote:

A better choice for your argument would have been JS as that was one of the original implementations. Similiar behavior works there (as I discovered testing it in my browser's console)

Well, I chose Erlang because their total ordering is the result of a deliberate design process.

Any ordering that you can derive in JavaScript is the result of a haphazard, now universally despised system of implicit conversions — not quite as good an example for my argument as Erlang’s careful design.

(Some people appear to have a tendency to derive processing rules for JSON from JavaScript. JSON is an interchange format, which derived its syntax from JavaScript. JSON has no processing rules, and so we have to add some. Any inspiration for the ones we decide to use in JSONPath is as good as any other, as long as it meets the needs of its users.)

Grüße, Carsten

glyn commented 2 years ago

For the record, unlike Erland, comparisons in our current draft do not form a total order (a.k.a. a linear order). A total order satisfies this law:

For all a, b:
   b ≤ a or b ≤ a (strongly connected or total)

But, according to our current draft, both true <= false and false <= true are false.

Our current draft does, however, provide a partial order since <= satisfies these laws:

For all a, b, c:
  a ≤ a (reflexivity)
  If a ≤ b and b ≤ a then a = b (antisymmetry)
  If a ≤ b and b ≤ c then a ≤ c (transitivity)

That said, < in our draft is not a strict partial order which would have to satisfy these laws:

For all a, b, c:
  Not a < a (irreflexivity)
  If a < b then not b < a (asymmetry)
  If a < b and b < c then a < c (transitivity)

In our draft:

Our current draft does however preserves the laws of boolean algebra (some of which would be broken by the alternatives being discussed in this issue).