Allow for more precisely defined argument types for functions

gregsdennis commented 1 year ago

Related to #359 but slightly different, so I figured a new issue would be better. Also tangentially related to #361.

Functions typically expect a JSON data type to operate properly. For example, length() only works with objects, arrays, and strings. However, length() currently only specifies that it requires a Value.

If we allow functions to specify more precisely what kinds of JSON values they expect, parsers would be able to raise errors at parse time when literal arguments don't match expectations.

For example, length(1) is currently valid because the input type for length() is Value. However, if we let length() specify that it expects objects, arrays, and strings, then things like this can be caught at parse-time rather than just returning Nothing.

I should note that this only applies to literals as arguments. If the argument value is passed in as the result of a subquery, like length(@.a) and @.a contains a 1, then this doesn't apply as nothing can be determined at parse time.

glyn commented 1 year ago

I quite like the extra parse-time checks this would enable.

I'm less keen on the asymmetry this introduces between checks on literals vs non-literals.

I don't think it would add much value for length(), since I doubt that length() will be applied to a literal very much in the wild (the length of a literal could be replaced by an integer). It would add some value for match() and search() (and similar future function extensions) which are likely to be used with literal values.

glyn commented 1 year ago

I wonder if tightening up the parse-time checks in this way would mean we should tighten up the parsing of other syntax? For example, in a filter expression the comparison 13 == '13' is currently specified to yield false, but the current issue may make us want it to fail to parse. That puts me off introducing extra parse-time checks for function arguments.

gregsdennis commented 1 year ago

Maybe it could be sufficient to allow but not require implementations to perform additional type checking during parsing and raise errors if issues are found.

I'm not sure how that could be tested. Maybe an optionalFailure property in the CTS?

gregsdennis commented 1 year ago

On the other hand, I think this could be something that I have a configuration option to enable. That way, people that want it can enable it and thus they expect the (slightly) different behavior.

JSON Schema actually includes requirements for such options in its spec, saying, "an implementation MAY provide feature X, and if it does it needs to behave this way."

cabo commented 1 year ago

I think we are in linter territory here. Clearly implementations can issue warnings if there is a way to funnel them back to the author. But I don't think we want to add normative semantics for this.

gregsdennis commented 1 year ago

It sounds like we are happy to make this an implementation behavior, but I'd want to require/encourage implementations to make tolerance of explicit type mismatches like length(1) the default behavior.

gregsdennis commented 1 year ago

I've updated my implementation to include this as a configuration options. It seems to work well.

Option off (default) - parsing is tolerant and evaluation returns an empty list.
Option on - parsing raises an error for explicit type checks.

I've updated the test cases to reflect this stance as well.

cabo commented 1 year ago

length(1) is a valid way to say Nothing (we don't seem to have a way to say that directly), so treating that as an error is a deviation. (It is still a potentially useful warning, and "treat warnings as an error" options abound, so I don't think your implementation option is wrong -- it just takes the implementation out of compliance when set.)

gregsdennis commented 1 year ago

it just takes the implementation out of compliance when set

Yeah, I think that's it generally understood that compliance deviation happens when an option is set.

I still think that identifying and erroring on an explicitly invalid expression (based on the function definition) should be the default behavior, but I'm fine with it being tolerated by default and providing an option.

fiestajetsam commented 1 year ago

Agreed on February interim to close this issue.

ietf-wg-jsonpath / draft-ietf-jsonpath-base

Allow for more precisely defined argument types for functions #368