ietf-wg-jsonpath / draft-ietf-jsonpath-base

Development of a JSONPath internet draft
https://ietf-wg-jsonpath.github.io/draft-ietf-jsonpath-base/
Other
58 stars 20 forks source link

Function extensions need complete definitions for their arguments #359

Closed gregsdennis closed 1 year ago

gregsdennis commented 1 year ago

A lot of work went into building the type system for functions. Additionally, the spec says that if, during parsing, a provided argument is determined to not match the type specified by the function, the implementation is to error.

I have a couple issues here.

Argument specifications don't use the type system

The arguments for the functions defined in the document don't use the type system laid out in the previous section. There are "informal" references, like this for length():

Its only argument is a value (possibly taken from a singular path as in the example above). The result also is a value, an unsigned integer.

  • If the argument value is a string, the result is the number of Unicode scalar values in the string.
  • If the argument value is an array, the result is the number of elements in the array.
  • If the argument value is an object, the result is the number of members in the object.
  • For any other argument value, the result is one.

This initial sentence should reference one or more of the types defined in table 13. In this case, I assume the argument should be OptionalNodeOrValue.

(Edit: It does appear that match() and search() use the values from this table, but length() and count() do not.)

Values determined at runtime

If a path (whether singular or otherwise) is provided, then it's the value produced by that path that is evaluated against the argument type. This can't be determined until runtime, so no parsing error can be produced, and errors shouldn't occur afterward.

Happily, it appears that the functions defined in the document do cover all of the input cases, but I feel this puts a large burden on others who define new functions in that they have to ensure this coverage.

Would it be easier/better to simply state globally that functions which receive arguments of types other than they expect should simply return Nothing? By doing this, we could also remove the parsing error requirement (or reduce it to SHOULD), which would relieve some burden from implementors as well.

glyn commented 1 year ago

(Edit: It does appear that match() and search() use the values from this table, but length() and count() do not.)

Actually, length does have typed arguments:

Screenshot 2023-01-18 at 21 03 00

as does count:

Screenshot 2023-01-18 at 21 04 44

Which version of the spec are you reading? I wonder if something in the formatting is misleading?

glyn commented 1 year ago

Would it be easier/better to simply state globally that functions which receive arguments of types other than they expect should simply return Nothing? By doing this, we could also remove the parsing error requirement (or reduce it to SHOULD), which would relieve some burden from implementors as well.

That wouldn't work for results of type Value or Boolean, neither of which have Nothing as an instance.

gregsdennis commented 1 year ago

Actually, length does have typed arguments

Ah, yes. I didn't see that bit. That seems resolved.

gregsdennis commented 1 year ago

That wouldn't work for results of type Value or Boolean, neither of which have Nothing as an instance.

Following my suggestion would imply that all functions could return Optional* variants of their values.

glyn commented 1 year ago

@gregsdennis In order for your suggestion to be useful, I think it would be necessary to weaken the following statement in the spec:

Any function expressions in a query must be well-formed (by conforming to the above ABNF) and correctly typed, otherwise the JSONPath implementation MUST raise an error (see Section 2.1).

If so, how would you propose to weaken that statement?

gregsdennis commented 1 year ago

I don't think that statement needs to be weakened. I think we just need to add a requirement that functions MUST be able to return Nothing if their inputs aren't what they expect. This would mean that Value and Boolean (or any other non-Nothing types) would not be valid return types.

Edit

I do think this need only apply to cases where the input value type cannot be determined at parse time, e.g. the input is a path or another function. If the value is a literal, the input value type can be determined and so an error should be raised.

gregsdennis commented 1 year ago

☝️ that mention was incorrect. Please ignore.

cabo commented 1 year ago

I think we just need to add a requirement that functions MUST be able to return Nothing if their inputs aren't what they expect.

I'm not sure that is true for all function extensions. The ones we have looked at, yes. But we should not limit our imagination to those.

gregsdennis commented 1 year ago

Not adding this requirement implicitly adds a requirement on proposed functions to specify return values for every interpolation of input types for every argument.

Adding this requirement allows proposed functions to only specify what they expect; anything outside of that returns Nothing.

glyn commented 1 year ago

I don't think we should weaken the current type correctness rules for function expressions (section 2.6.2) as this provides early diagnosis of invalid queries. I would like all implementations to behave consistently in this respect.

If we keep those rules, then we need to decide whether a function spec needs to explicitly define the return value for all possible argument values (this is effectively implied by the current spec) or whether some values can be specified and the rest left to default (to Nothing). I think it's better for function specs to be complete and explicit because forcing authors of functions to go into that level of detail will make sure they think through all the cases.

gregsdennis commented 1 year ago

I think it's undue burden on function creators and those accepting submissions into the function registry, but if that's the consensus here, okay.