Support for math operators in expressions

gregsdennis commented 1 year ago

I think mathematical operators would be a beneficial addition (😏) to the expression syntax. It would allow things like

$[?@.a+@.b!=@.c]

to check for model consistency. (Arguably, you just wouldn't serialize c as it should be a calculated field, but people do stranger things.)

There are doubtless other use cases.

I have support for this currently in my library. It's really easy to implement, and I don't think it would be too hard to specify.

I think this is within our charter as @goessner's original implementations supported "underlying scripting language" for expressions, which undoubtedly supported these operators.

gregsdennis commented 1 year ago

(I'm happy to defer this until after we've sorted out our function typing issues.)

goessner commented 1 year ago

Well ... this might indeed be useful. But implementing arithmetic and specifying it in a clean way are two very different things.

We might deal then with:

- The 0.1+0.2 == 0.3 problem.
- Division by zero.
- Defining EPSILON.
- Should we allow the + operator to also concatenate strings?
- Explicit number type.
- Rounding.
- sqrt ... where to stop?

Alternatively, I can imagine that a function similar to CSS calc would be easy to implement and easier to specify.
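To make the first few concerns in the list above concrete, here is a minimal Python sketch. The names `EPSILON`, `approx_equal` and `safe_divide`, the tolerance value, and the "return None on division by zero" policy are all assumptions chosen for illustration; nothing here is proposed spec text.

```python
import math

# Assumed tolerance; choosing this value is exactly what a spec would have to settle.
EPSILON = 1e-9


def approx_equal(a, b, epsilon=EPSILON):
    """Compare two numbers with a relative and absolute tolerance."""
    return math.isclose(a, b, rel_tol=epsilon, abs_tol=epsilon)


def safe_divide(a, b):
    """One possible policy: yield None instead of raising on a zero divisor."""
    return None if b == 0 else a / b


# Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly.
exact = (0.1 + 0.2 == 0.3)               # False
tolerant = approx_equal(0.1 + 0.2, 0.3)  # True
```

With exact comparison, a filter such as `@.a + @.b == @.c` would silently reject the node with a=0.1, b=0.2, c=0.3, which is the kind of surprise the list is warning about.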

glyn commented 1 year ago

Yes, first class support for mathematical operators will entail a lot of spec work. Function extensions could be used instead.

I suggest we defer this issue and tag it "revisit-after-base-done".

gregsdennis commented 1 year ago

The comparison indicates that many implementations support a path like $[?(@.key+50==100)], but it's split about 50/50 between reading that as

a math operation: @.key + 50
a "key+50" key

I wonder how adding in a couple spaces would do: $[?(@.key + 50==100)]. This should differentiate whether math operations are supported.
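The two readings can be illustrated with a small Python sketch. The sample document and both predicate functions are hypothetical; they only model how an implementation might interpret `$[?(@.key+50==100)]`, not how any particular library behaves.

```python
# Hypothetical input: one element matches under each reading.
doc = [{"key": 50}, {"key+50": 100}]


def math_reading(node):
    """Interpret '@.key+50' as the member 'key' plus the number 50."""
    value = node.get("key")
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return is_number and value + 50 == 100


def key_name_reading(node):
    """Interpret '@.key+50' as a member literally named 'key+50'."""
    return node.get("key+50") == 100


as_math = [n for n in doc if math_reading(n)]
as_key_name = [n for n in doc if key_name_reading(n)]
```

Here `as_math` is `[{"key": 50}]` and `as_key_name` is `[{"key+50": 100}]`, so the same query text selects different nodes depending on the reading.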

cabo commented 1 year ago

member-name-shorthand cannot contain a +, so recognizing @.key is not a problem. (The problem is that adding math adds a ton of additional considerations.
E.g., what if @.key is "50" and not 50, etc.)

gregsdennis commented 1 year ago

member-name-shorthand cannot contain a +

Yeah, it's understood that those implementations aren't spec-compliant.

The problem is that adding math adds a ton of additional considerations. E.g., what if @.key is "50" and not 50, etc.

Yeah, it's understood that we'd have to do that stuff. I don't think we should shy away from it, though.

I still think this is within our charter.

ohler55 commented 1 year ago

Personal bias here, but I've found simple math operators (-, +, *, /) very useful in practice. With a decision on what to return for a divide by zero, I think most end users would like the extra flexibility that math operators provide.

The one limitation I've had users question is why a - character can not be in a token since it can be confused with a minus sign when the token is used in an expression.

gregsdennis commented 1 year ago

We currently forbid - in the shorthand name syntax (requiring the brackets syntax instead), so that's not a problem.

glyn commented 1 year ago

Deferring until after base done.

goessner commented 1 year ago

Follow up of https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/issues/449:

Take the following arithmetic example: (a + b + c)*d/e <= 42, where a,b,c,d,e are members of the current node.

Using a set of small (binary) functions results in the query

$.arr[?div(prod(sum(sum(@.a,@.b),@.c),@.d),@.e) <= 42]

whereas using a calc function looks like

$.arr[?calc('(@.a+@.b+@.c)*@.d/@.e') <= 42]

I predict most users will prefer the latter syntax.

We need here a function calc

- expecting a single argument of type string.
- returning the resulting number value or false (or Nothing) in case of an invalid argument.
- having access to its environment via closure concept.

The string argument must contain a pure arithmetic expression, that means

- only a limited set of (binary?) arithmetic operators is allowed (+,-,*,/,%,**).
- operands need to be
  - number literals
  - singular nodelists containing number values
  - functions returning number values or singular nodelists containing number values

When Greg says regarding inline arithmetic:

I have support for this currently in my library. It's really easy to implement, and I don't think it would be too hard to specify.

Then implementing the calc function would be even easier due to encapsulation. An implementation that is able to parse JSONPath queries shouldn't find parsing isolated arithmetic expressions extremely challenging. Specifying that function should be a lot easier than specifying inline arithmetic with all its side effects.
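As a rough illustration of what such an encapsulated function could look like, here is a minimal Python sketch. It is not an implementation from this thread: the `NOTHING` sentinel, the restriction of operands to number literals and simple `@.name` members of the current node, and the reuse of Python's own operator precedence and `**` semantics are all assumptions made for the example. Function operands are left out, and no limit is placed on the cost of `**`.

```python
import ast
import operator
import re

# Hypothetical sentinel standing in for the draft's "Nothing".
NOTHING = object()

_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Mod: operator.mod, ast.Pow: operator.pow,
}
_MEMBER = re.compile(r"@\.([A-Za-z_][A-Za-z0-9_]*)")
# Apart from member references, only digits, dots, whitespace, operators and parentheses.
_ALLOWED = re.compile(r"[0-9\s.+\-*/%()]*")


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def calc(expression, node):
    """Evaluate a pure arithmetic expression against the current node (a dict)."""
    if not _ALLOWED.fullmatch(_MEMBER.sub("0", expression)):
        return NOTHING
    # Rewrite '@.a' to the private name '_m_a' so Python's parser can be reused.
    source = _MEMBER.sub(lambda m: "_m_" + m.group(1), expression).strip()
    try:
        tree = ast.parse(source, mode="eval")
    except SyntaxError:
        return NOTHING

    def walk(n):
        if isinstance(n, ast.Constant) and _is_number(n.value):
            return n.value
        if isinstance(n, ast.Name) and n.id.startswith("_m_"):
            value = node.get(n.id[3:], NOTHING)
            if _is_number(value):
                return value
            raise ValueError("member is missing or not a number")
        if isinstance(n, ast.BinOp) and type(n.op) in _OPS:
            return _OPS[type(n.op)](walk(n.left), walk(n.right))
        if isinstance(n, ast.UnaryOp) and isinstance(n.op, ast.USub):
            return -walk(n.operand)
        raise ValueError("not a pure arithmetic expression")

    try:
        result = walk(tree.body)
    except (ValueError, TypeError, ZeroDivisionError, OverflowError):
        return NOTHING
    return result if _is_number(result) else NOTHING
```

For a node `{"a": 2, "b": 3, "c": 5, "d": 8, "e": 2}`, `calc('(@.a+@.b+@.c)*@.d/@.e', node)` evaluates to `40.0`, so a filter comparing it with `<= 42` would select that node. Invalid input, a non-number member such as `"50"`, or a division by zero all yield `NOTHING` in this sketch.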

Then there is another charming aspect of this approach.

Imagine the following scenario: A user is supplying a set of parts of simple geometry, holding the part-descriptions in a JSON array.

Each part description is redundancy-free and holds geometric and material properties. The part mass might be a measure of the selling price. So if we want to find all cuboids with a mass less than 20 (kg), we can start the query

$.parts[?@.type=='cuboid' && calc('@.a*@.b*@.c*@.rho') < 20]

where a,b,c in [m] are the cuboid dimensions and rho its density in [kg/m^3].

In case we know - as the JSON author - that the part mass is frequently requested, we can even put into the header section of the JSON data

{  "mass": {
     "cuboid": "@.a*@.b*@.c*@.rho",
     "sphere": "4/3*3.14*@.r**3*@.rho",
     "cylinder": "3.14*@.r**2*@.h*@.rho"
   },
   "parts": [...]
}

also the arithmetic expressions for other part masses. This way we can reformulate the query above to

$.parts[?calc($.mass[?index(@)!='']==@.type) < 20]

which of course then requires the useful index function most recently discussed in https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/issues/156.

Apart from that, having simple strings hold arithmetic expressions allows us to store them in JSON for reuse, in the same way as we can with JSONPath queries or, preferably, with normalized paths as strings.

That you cannot do conceptually with the barely readable mult/div/sum approach.

@gregsdennis:

I fail to see how a calc() function would be any different than just including math operators in expressions. You'd still have to specify what is valid as a parameter to calc() and how that works. It seems easier to just define math operators and be done with it.

... no, due to strong encapsulation and the sharply restricted syntax explained above.

@cabo:

Of course, this would break any attempt to have an extensible function interface, ...

I don't see this, please elaborate.

... because calc would need to include half of JSONPath’s syntax and would need access to all the related functionality as well.

... again no; due to strong encapsulation and the sharply restricted syntax of pure arithmetic expressions, implementation should be easy, as Greg already mentioned above.

Stefan

ohler55 commented 1 year ago

If we are considering ease of use for the end user, I would think $.parts[?(@.x == @.y + 3)] or $.parts[?(@.x == (@.y + 3))] would be the most natural.

It shouldn't really matter how hard it is to implement if it is better for the end users. Anyone undertaking the task of implementing the spec will have to be competent anyway, so a little more work shouldn't be that large a hurdle. (IMHO)

goessner commented 1 year ago

@ohler55 ... I do understand this very well from a user's point of view. But on the way there, a lot of spec work will have to be done. So what we are discussing here is how functions, in whatever form, can help to add arithmetic expressions to queries while still finding sufficient user acceptance.

I would applaud it if some implementers gained experience in the meantime by implementing side by side

- inline arithmetic in queries.
- arithmetic encapsulated in a calc function.

Then they can help to identify edge cases, type collisions and handling of numeric anomalies.

danielaparker commented 1 year ago

Follow up of #449:

Take the following arithmetic example: (a + b + c)*d/e <= 42, where a,b,c,d,e are members of the current node.

Using a set of small (binary) functions results in the query

$.arr[?div(prod(sum(sum(@.a,@.b),@.c),@.d),@.e) <= 42]

whereas using a calc function looks like

$.arr[?calc('(@.a+@.b+@.c)*@.d/@.e') <= 42]

But you don't need a calc function to support that notation; it's very straightforward to incorporate numeric operators into the script expression language, with the usual precedence and associativity. For example, for two C++ and .NET implementations described here, given the following document,

{"arr":[{"a":2,"b":3,"c":5,"d":8,"e":2},{"a":2,"b":3,"c":5,"d":10,"e":2}]}

and query

$.arr[?(@.a+@.b+@.c)*@.d/@.e <= 42]

the result is

[{"a":2,"b":3,"c":5,"d":8,"e":2}]

That is, it's very straightforward if @.a, @.b, etc. evaluate to values; I'm not sure what it would mean if they were to evaluate to nodelists.

Daniel

ohler55 commented 1 year ago

I took the approach described by @danielaparker in OjG, but there is no reason all three of the proposed approaches could not be implemented. Having said that, picking one approach as the minimum and offering the others as extensions might be a way to resolve this.

gregsdennis commented 1 year ago

I agree with @danielaparker and @ohler55: these operators need to be supported in general expressions, not merely inside some function.

Then implementing the calc function would be even easier due to encapsulation... Specifying that function should be a lot easier than specifying inline arithmetic with all its side effects. - @goessner

I don't see how the level of effort for supporting them in a function is any less than to support them in general expressions. If anything I think it's more effort because you have to explain why this syntax is valid only inside of this function.

$.parts[?@.type=='cuboid' && calc('@.a*@.b*@.c*@.rho') < 20]

From a parsing perspective, this is much more complicated than

$.parts[?@.type=='cuboid' && @.a*@.b*@.c*@.rho < 20]

From a user perspective, calc() is unnecessary.

Regarding the "expressions in data" concept, we don't currently support data specifying a path anywhere, and doing so opens a whole new can of worms that we'd need to consider. It's paving the way for an exec() function that executes code.

That you cannot do conceptually with the barely readable mult/div/sum approach.

No one is advocating for this approach. Sure, calc() is better than these, but calc() is measurably worse than just supporting math in expressions.

goessner commented 1 year ago

Hmm ... as an outcome of this discussion, the realisation is maturing that inline arithmetic is developing into a de facto standard in current implementations, which is also what users naturally expect.

It seems best to defer activities in that direction until after base is done, which was in fact the reason why Glyn closed this issue.

gregsdennis commented 1 year ago

I came up with this for basic math support:

math-expr = binary-math-expr / unary-math-expr
binary-math-expr = math-operand binary-math-operator math-operand
unary-math-expr = unary-math-operator (number / singular-query / value-function-expr / math-group)
math-operand = number / singular-query / value-function-expr / math-expr / math-group
math-group = "(" math-expr ")"
binary-math-operator = "+" / "-" / "*" / "/"
unary-math-operator = "-"

We'd then add math-expr as an option on comparable

comparable = literal / singular-query / value-function-expr / math-expr

I believe this gives support for addition, subtraction, multiplication, division, and grouping, though it doesn't give operator precedence as yet (I'm working on that).

It does allow multiple negations (e.g. ----4), which is weird. There's also an ambiguity in -4 now between

- negative 4 as a number
- positive 4 that has been negated

In the end, I'm not sure it makes much of a difference; maybe it saves an operation to have it as "negative 4." Given the outcome is the same, maybe we just let implementations decide how they want to handle it.

It also doesn't prevent division by zero, but we'd have to contend with a path or a function returning zero anyway. I think the math-expr evaluating to Nothing is fine. That would result in a "false" comparison which just wouldn't select the node.

Similarly any path or ValueType function which returns a non-number could result in a Nothing evaluation as well.

This doesn't support string concatenation (yet).
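The "evaluate to Nothing" behaviour described above can be sketched in a few lines of Python. The `NOTHING` sentinel and the helper names `arith` and `less_than` are hypothetical, and only the `<` comparison is modelled; how other comparison operators should treat Nothing is outside this sketch.

```python
# Hypothetical sentinel standing in for "Nothing".
NOTHING = object()


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def arith(op, left, right):
    """Apply a binary math operator; non-numbers and x/0 yield NOTHING."""
    if not (is_number(left) and is_number(right)):
        return NOTHING  # also propagates an earlier NOTHING
    if op == "+":
        return left + right
    if op == "-":
        return left - right
    if op == "*":
        return left * right
    if op == "/":
        return NOTHING if right == 0 else left / right
    raise ValueError(f"unknown operator {op!r}")


def less_than(left, right):
    """Assumed rule: an ordering comparison involving NOTHING is false."""
    if left is NOTHING or right is NOTHING:
        return False
    return left < right


nodes = [{"a": 30, "b": 2}, {"a": 30, "b": 0}, {"a": "30", "b": 2}]
# Models a filter like ?@.a / @.b < 20
selected = [n for n in nodes if less_than(arith("/", n["a"], n["b"]), 20)]
```

Only the first node is selected: the second divides by zero and the third has a string operand, so both evaluate to `NOTHING` and the comparison is false.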

gregsdennis commented 1 year ago

Does the ABNF need to give operator precedence?

4+5*6 is syntactically valid whether or not the syntax understands that * should be performed before +.

cabo commented 6 months ago

Does the ABNF need to give operator precedence?

The principle of least surprise says yes: Implementers will expect the AST they derive from the ABNF to be directly useful for a tree interpreter.
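To illustrate what "directly useful for a tree interpreter" could mean, here is a minimal Python sketch of a recursive-descent parser that follows a layered grammar, with one rule per precedence level. The layering (sum, product, unary, atom) and all function names are assumptions for this example, and operands are limited to number literals; singular queries and function expressions are left out.

```python
import re

# Assumed layering, one rule per precedence level:
#   sum     = product *( ("+" / "-") product )
#   product = unary   *( ("*" / "/") unary )
#   unary   = "-" unary / atom
#   atom    = number / "(" sum ")"
_TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/()])")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(text):
    """Return a nested-tuple AST such as ('+', 4, ('*', 5, 6))."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def parse_sum():
        left = parse_product()
        while peek() in ("+", "-"):
            op = take()
            left = (op, left, parse_product())  # left-associative
        return left

    def parse_product():
        left = parse_unary()
        while peek() in ("*", "/"):
            op = take()
            left = (op, left, parse_unary())
        return left

    def parse_unary():
        if peek() == "-":
            take()
            return ("neg", parse_unary())
        return parse_atom()

    def parse_atom():
        token = take()
        if token == "(":
            inner = parse_sum()
            if take() != ")":
                raise ValueError("expected ')'")
            return inner
        if token is None or token in ("+", "-", "*", "/", ")"):
            raise ValueError("expected a number or '('")
        return float(token) if "." in token else int(token)

    tree = parse_sum()
    if peek() is not None:
        raise ValueError("unexpected trailing input")
    return tree
```

With this layering, `parse("4+5*6")` yields `('+', 4, ('*', 5, 6))`, so the multiplication is already nested below the addition and a tree interpreter can evaluate the result bottom-up without a separate precedence pass. In this sketch `-4` always parses as `('neg', 4)`, which is one way of settling the literal-versus-negation question raised earlier.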

ietf-wg-jsonpath / draft-ietf-jsonpath-base

Support for math operators in expressions #419