Add extension point

cabo commented 2 years ago

At the June interim, we discussed extension points, and decided we should have one. This issue is intended to hold discussion of the extension point until we have a complete proposal.

cabo commented 2 years ago

This proposal will be informed by the discussion in #160.

The general idea is to have a function-calling-style syntax with a comma-separated argument list in parentheses following a name:

function-call = id S "(" S [ arg *( S "," S arg)] ")"

id is the typical [-a-z][-a-z0-9]*. We can discuss whether it should indeed be pure kebab-case, and indeed be lower-case only (both of which would be my strong preference).
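To make the proposed production concrete, here is a minimal recognizer sketch for function-call as written above, using the kebab id pattern. Two assumptions are not in the proposal: S is taken to be optional blank space (space, tab, LF, CR), and since arg is still undefined, the sketch uses a hypothetical stand-in that matches any run of characters other than commas, parentheses and blank space.

```python
import re

# id as proposed: lower-case kebab, [-a-z][-a-z0-9]*
ID = r"[-a-z][-a-z0-9]*"
# S: assumed here to be optional blank space (space, tab, LF, CR)
S = r"[ \t\n\r]*"
# Hypothetical stand-in for `arg`, which is not defined yet
ARG = r"[^,() \t\n\r]+"

# function-call = id S "(" S [ arg *( S "," S arg)] ")"
FUNCTION_CALL = re.compile(
    rf"(?P<name>{ID}){S}\({S}(?P<args>(?:{ARG}(?:{S},{S}{ARG})*)?)\)"
)

def parse_function_call(text: str):
    """Return (name, [args]) if text matches function-call, else None."""
    m = FUNCTION_CALL.fullmatch(text)
    if m is None:
        return None
    args_text = m.group("args")
    # Split on commas and drop the optional blank space around each arg.
    args = [a.strip() for a in args_text.split(",")] if args_text else []
    return m.group("name"), args
```

For example, `parse_function_call("min(@, 10)")` yields `("min", ["@", "10"])`, while `Length(@)` is rejected because the id is lower-case only.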

cabo commented 2 years ago

The ids (function names) go into a registry, whose registration policy should probably fall somewhere on the spectrum between Specification Required and IETF Review; we'll need to discuss this.

The base spec comes with one such preregistered function, length; its exact semantics still need to be defined (separate issue, e.g., what is its domain, what does it count, ...). The idea is to make sure people actually implement the extension point mechanism and don't just ignore it (see RFC 9170).
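A minimal sketch of how an implementation might model the registry, assuming a plain in-memory mapping from function name to implementation (the names REGISTRY, register and call are hypothetical). Since the semantics of length are still open, the version below is a placeholder that only counts strings, arrays and objects. The point it illustrates is the RFC 9170 one: dispatch goes through the registry, and an unknown name is an error rather than something silently ignored.

```python
from typing import Any, Callable, Dict, List

# Hypothetical in-memory stand-in for the function registry:
# function name -> implementation.
REGISTRY: Dict[str, Callable[..., Any]] = {}

def register(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that adds an implementation to the registry under `name`."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        if name in REGISTRY:
            raise ValueError(f"function {name!r} is already registered")
        REGISTRY[name] = fn
        return fn
    return decorator

@register("length")
def length(value: Any) -> Any:
    # Placeholder semantics only: domain and what is counted are still open.
    if isinstance(value, (str, list, dict)):
        return len(value)
    return None  # placeholder; the result outside the domain is undecided

def call(name: str, args: List[Any]) -> Any:
    """Dispatch a parsed function call; unknown names are an error."""
    try:
        fn = REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown function {name!r}") from None
    return fn(*args)
```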

cabo commented 2 years ago

Two items are less clear:

What is arg?
Where does the function-call production fit into the grammar?

We have split the expression language into boolean expressions and comparables; these two don't mix except through a comparison. Comparables have a subset of the JSON types available to them; boolean expressions obviously do not. What a path (currently always a singular-path) evaluates to depends on where in that structure it is used: in a comparable, it extracts the JSON value the path points to (which is why it needs to be singular there); in a boolean expression, it checks existence (at least one node matches the path ➔ true, otherwise false).
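The two readings of a path can be sketched as two interpretations of the same node list. This is an illustration under assumptions, not spec text: the path's result is modeled as a Python list of JSON values, and NOTHING is a hypothetical marker for "no value" that is kept distinct from JSON null (Python None).

```python
from typing import Any, List

class _Nothing:
    """Hypothetical marker for 'no value'; distinct from JSON null (None)."""
    def __repr__(self) -> str:
        return "Nothing"

NOTHING = _Nothing()

def as_existence(nodes: List[Any]) -> bool:
    """Boolean-expression context: true iff at least one node matched."""
    return len(nodes) > 0

def as_comparable(nodes: List[Any]) -> Any:
    """Comparable context: the single value the singular path points to."""
    if len(nodes) > 1:
        raise ValueError("singular path matched more than one node")
    return nodes[0] if nodes else NOTHING
```

For `@.foo` applied to `{"foo": null}`, the node list is `[None]`: the existence reading gives true and the comparable reading gives null. Applied to `{}`, the node list is empty: false and NOTHING respectively.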

So what does flobble(@.foo) mean? (Of course, we could leave that interpretation to the specific function, essentially giving each formal parameter a type of either json-value or boolean. People could then force boolean evaluation using !!@.foo.)
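One way to read the "leave it to specific functions" option is for each function to declare a kind per formal parameter, with the caller converting the path's node list accordingly. The sketch below assumes that model; ParamKind, convert_arg, invoke and the ABSENT marker are all hypothetical names.

```python
from enum import Enum
from typing import Any, Callable, List, Sequence

# Hypothetical marker for "the path matched nothing"; kept distinct from
# JSON null, which Python represents as None.
ABSENT = object()

class ParamKind(Enum):
    VALUE = "json-value"  # convert the node list as a comparable would
    LOGICAL = "boolean"   # convert the node list as a boolean expression would

def convert_arg(kind: ParamKind, nodes: List[Any]) -> Any:
    """Turn the node list produced by a path into the declared parameter type."""
    if kind is ParamKind.LOGICAL:
        return len(nodes) > 0  # existence test, same result as !!@.foo
    if len(nodes) > 1:
        raise ValueError("a json-value parameter needs a singular path")
    return nodes[0] if nodes else ABSENT

def invoke(fn: Callable[..., Any], kinds: Sequence[ParamKind],
           node_lists: Sequence[List[Any]]) -> Any:
    """Call fn after converting each argument according to its declared kind."""
    if len(kinds) != len(node_lists):
        raise TypeError("wrong number of arguments")
    return fn(*(convert_arg(k, n) for k, n in zip(kinds, node_lists)))
```

With `{"foo": 0}`, a flobble declared with a boolean parameter receives true (the node exists), while one declared with a json-value parameter receives 0.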

Where does the function call go? It could go into the two places where a singular-path can go right now. We'd have a similar type issue there.

(I'm not sure we need more fine-grained type checking here, but we do need to define what a path means.)

gregsdennis commented 2 years ago

id is the typical [-a-z][-a-z0-9]*

Am I reading this right that it includes the hyphen (-)? Was that meant to be underscore (_)? We've already decided that hyphens aren't allowed in property names (in dot syntax). The two should probably be consistent.

cabo commented 2 years ago

Am I reading this right that it includes hyphen -?

That would be my standard proposal. But maybe we should keep the path open to eventually embracing arithmetic, which would use U+002D HYPHEN-MINUS for subtraction. So this should be [_a-z][_a-z0-9]*.
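The effect of the change can be seen by matching both id patterns against the same input. This is a small sketch, not normative; `foo-bar(@)` stands for a hypothetical future expression in which `-` might mean subtraction. With the kebab pattern the hyphen is swallowed into the id; with the underscore pattern the id stops at `foo`, leaving `-bar(@)` free to be read as arithmetic.

```python
import re

KEBAB_ID = re.compile(r"[-a-z][-a-z0-9]*")  # first proposal
SNAKE_ID = re.compile(r"[_a-z][_a-z0-9]*")  # revised proposal

def leading_id(pattern: "re.Pattern[str]", text: str) -> str:
    """Return the longest prefix of text that the id pattern matches."""
    m = pattern.match(text)  # regex quantifiers are greedy: longest prefix
    return m.group(0) if m else ""
```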

gregsdennis commented 2 years ago

Related to #194 (maybe a duplicate). Also related to #154.

glyn commented 2 years ago

As pointed out in https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/issues/209#issuecomment-1167664526:

[if functions are supported, the spec will need to specify] what it means for an expression that evaluates to "absent" to be passed to a function

danielaparker commented 2 years ago

If "absent" is taken to mean an empty "node set", it may be worth comparing with how XPath and XQuery Functions and Operators 3.1 treat an "empty sequence" passed as a function argument. In the XPath/XQuery model, functions have metadata describing which parameter types they accept; anything else is an error. Translated to JSON, the parameter types would be the usual null, true, false, number, string, object, array. These types could be annotated to indicate that either a value of that type or the "empty sequence" is allowed. This corresponds roughly to "nullable types" in C# or std::optional in C++.
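A rough sketch of the XPath-style approach translated to JSON, assuming each parameter declares a set of accepted JSON types plus an "optional" flag playing the role of XPath's empty-sequence annotation. ParamType and json_type are hypothetical names; values are assumed to be decoded the way Python's json.loads produces them.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet, List

def json_type(value: Any) -> str:
    """Name of the JSON type of a decoded value."""
    if value is None:
        return "null"
    if value is True:  # checked before numbers: bool is a subclass of int
        return "true"
    if value is False:
        return "false"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")

@dataclass(frozen=True)
class ParamType:
    types: FrozenSet[str]   # accepted JSON types
    optional: bool = False  # if true, the empty sequence is also accepted

    def check(self, nodes: List[Any]) -> None:
        """Raise TypeError unless nodes fits this parameter type."""
        if not nodes:
            if not self.optional:
                raise TypeError("empty sequence not allowed here")
            return
        if len(nodes) > 1:
            raise TypeError("expected at most one item")
        if json_type(nodes[0]) not in self.types:
            raise TypeError(f"{json_type(nodes[0])} not accepted")
```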

gregsdennis commented 2 years ago

Or function parameters could all be node lists, and a JSON value is coerced to a node list containing only that value (like an implicit type cast).

danielaparker commented 2 years ago

@gregsdennis, think about what your function interfaces would look like if you took that approach; it would be awkward compared to the XPath approach. The only information an empty node list (empty sequence) conveys is non-existence, which actual implementations can represent as undefined in JavaScript, an Undefined JsonValueKind in a C# JsonElement, an empty Optional in Java, and so on.

gregsdennis commented 2 years ago

But doing it by passing a node list means we can use the empty list to represent undefined (or whatever you want to call it) without having to explicitly define undefined, which is something we're already trying to avoid.

length(@) is literally passing a node list containing the current value because @ represents a path which returns a node list containing just the current node. Similarly for length(@.foo).

It follows, then, that something like length('string') would implicitly wrap 'string' in a node list. (Passing explicit values may be useful for other potential functions, e.g., min(@, 10).)
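The implicit cast described here can be sketched in a few lines, assuming node lists are represented by a dedicated list subclass so that a literal JSON array is not mistaken for a node list. NodeList, coerce and min_fn are hypothetical names, and min_fn's behavior (ignore non-numbers, return an empty node list if nothing qualifies) is just one possible choice.

```python
from typing import Any

class NodeList(list):
    """Hypothetical representation of the values selected by a path."""

def coerce(arg: Any) -> NodeList:
    """Implicit cast: a literal JSON value becomes a one-element node list."""
    if isinstance(arg, NodeList):
        return arg
    return NodeList([arg])

def min_fn(*args: Any) -> NodeList:
    """Hypothetical min(): smallest number across all argument node lists."""
    numbers = [v for a in args for v in coerce(a)
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    return NodeList([min(numbers)]) if numbers else NodeList()
```

So min(@, 10) with @ selecting 42 becomes `min_fn(NodeList([42]), 10)`, which returns a node list holding 10. A literal array such as `[1, 2]` is wrapped as a single JSON value rather than spread into two nodes.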

Introducing undefined as a value is a suggestion that has been repeatedly turned down by the WG. Let's please close the lid on that one. This approach shows that we don't need it.

One thing we do need to define, however, is the return value. For strings, it makes sense to return the number of characters (or maybe Unicode code points?). For arrays or objects, it could return the number of members.

What would it do for numbers or the JSON constants null, true, and false?

In C#, methods that look up the index of an item in an array return -1 to indicate that the item does not exist in the array. This works because a valid index is always non-negative, so -1 is an expected error state (an error that doesn't warrant throwing an exception). We could do something similar, but I'm not sure how useful that would be.
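Applied to length, the sentinel idea might look like the sketch below. It assumes strings are counted in Unicode code points (which is what Python's len does on str) and that -1 marks "no length"; NO_LENGTH and length_or_sentinel are hypothetical names. Python's own str.find follows the same convention, returning -1 when the substring is absent.

```python
from typing import Any

NO_LENGTH = -1  # sentinel: a real length is never negative

def length_or_sentinel(value: Any) -> int:
    """Code points for strings, member count for arrays/objects, else -1."""
    if isinstance(value, (str, list, dict)):
        return len(value)
    return NO_LENGTH
```

One consequence worth weighing: because -1 is an ordinary number, a filter such as `length(@) < 3` would also select numbers, booleans and null.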

danielaparker commented 2 years ago

@gregsdennis Nothing I've written suggested in any way to introduce undefined into the draft :-)

Best regards, Daniel

cabo commented 2 years ago

So at the extension point level, we would probably support both nodelists and JSON values, as arguments/parameters and as return values. If length is called with an argument that does not have a length, its return value should probably fit the general mold that comparisons return false outside the domain of the comparison operator. This seems to argue for returning false.
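A sketch of how returning false interacts with comparisons, assuming "<" is only defined for two numbers or two strings and is false otherwise (the "outside the domain" rule). Both function names are hypothetical, and string ordering here is Python's code point order.

```python
from typing import Any

def length(value: Any) -> Any:
    """Hypothetical length(): a count inside its domain, False outside it."""
    if isinstance(value, (str, list, dict)):
        return len(value)
    return False

def _is_number(v: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(v, (int, float)) and not isinstance(v, bool)

def less_than(a: Any, b: Any) -> bool:
    """'<' is defined for two numbers or two strings; otherwise false."""
    if _is_number(a) and _is_number(b):
        return a < b
    if isinstance(a, str) and isinstance(b, str):
        return a < b
    return False
```

Unlike a -1 sentinel, false lies outside the domain of `<`, so `length(@) < 3` does not select values that have no length.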

danielaparker commented 2 years ago

Is the committee thinking about supporting functions as an argument to functions, like jsonata and XPath 3.1? JMESPath doesn't, but does support "expression types", which are similar to lambdas or anonymous functions.

gregsdennis commented 2 years ago

I think that's a bit of an advanced use case and can be added after we have initial function support in.

glyn commented 2 years ago

Is the committee thinking about supporting functions as an argument to functions, like jsonata and XPath 3.1? JMESPath doesn't, but does support "expression types", which are similar to lambdas or anonymous functions.

I was just thinking about this the other day and my position is that the extra cost in the spec (and therefore on implementers and, in terms of cognitive load, on users) is considerably more than the likely benefit. So I think we should at least defer this beyond v1 and possibly indefinitely.

cabo commented 2 years ago

Function-valued parameters will become useful if functions can be defined by the query author. We are quite far from that level of generality, so I don't think we need to address this.

glyn commented 1 year ago

A function extension point was added in base 09.

ietf-wg-jsonpath / draft-ietf-jsonpath-base

Add extension point #203