ietf-wg-jsonpath / draft-ietf-jsonpath-base

Development of a JSONPath internet draft
https://ietf-wg-jsonpath.github.io/draft-ietf-jsonpath-base/

Types for Implementators vs. Consumers of JSON Path #449

Closed hiltontj closed 1 year ago

hiltontj commented 1 year ago

The latest version of the spec simplifies the type system for functions down to NodesType, ValueType, and LogicalType. I like it in this state, as it is digestible and should provide a good foundation for implementing function extensions. I see these three types as natural, given that each has an analog in the rest of the spec.

However, while implementing it, I have found that it is not quite complete. In order to satisfy the implicit conversions defined in the spec, I found that it helped to break ValueType into three separate types, representing:

  1. A JSON Value (from a node)
  2. A JSON Value (from a literal)
  3. Nothing

It is important in the implementation to distinguish between (1.) and (2.). Although a singular query like @.a results in either a JSON value or Nothing, its result still requires special treatment when handling implicit conversion. In other words, it makes sense to implicitly convert (1.) to NodesType, but not (2.).

I don't think that this necessitates a SingleNode type at the top level in the type system, because I think that it is more of an implementation detail that would confuse consumers of the spec. However, I think that this deserves mention for implementors of the spec.
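To make the three-way split concrete, here is a minimal Rust sketch (the type and field names are my own, not from the spec). The point is that only the node-derived variant retains the location needed to rebuild a nodelist; a literal does not:

```rust
use std::collections::HashMap;

/// A JSON value, reduced to the cases needed for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Json>),
    Object(HashMap<String, Json>),
}

/// The three-way split of ValueType described above. Only `FromNode`
/// carries a location (a normalized path), so only it can be turned
/// back into a (singular) nodelist.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    FromNode { value: Json, location: String },
    Literal(Json),
    Nothing,
}

impl Value {
    /// Node-derived values yield a singular nodelist; literals and
    /// Nothing cannot, because they are not anchored in the query argument.
    fn to_nodes(&self) -> Option<Vec<Json>> {
        match self {
            Value::FromNode { value, .. } => Some(vec![value.clone()]),
            Value::Literal(_) | Value::Nothing => None,
        }
    }
}

fn main() {
    let from_node = Value::FromNode {
        value: Json::Number(1.0),
        location: "$['a']".to_string(),
    };
    assert!(from_node.to_nodes().is_some());
    assert!(Value::Literal(Json::Number(1.0)).to_nodes().is_none());
    println!("ok");
}
```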

cabo commented 1 year ago

I don't think you can convert a ValueType to a NodesType, for exactly the reason you give -- a value needs to be anchored in the query argument (beyond literals, e.g., length()).

hiltontj commented 1 year ago

@cabo - I did misinterpret the spec. The spec does not say that ValueType is implicitly converted to NodesType. Clearing that up definitely helps me, so, thank you.

Nonetheless, I am grappling with the Function Extension point in my implementation now, so perhaps discussion here may be helpful.

I think my confusion arises from the fact that, e.g., @.a (a singular query) needs to be interpreted as a ValueType or a NodesType, depending upon the context within which it is used.

So, in my implementation, I divide into two steps: 1) parsing the query, and 2) evaluating the query against some JSON object.

The parsed representation of the query needs to distinguish between the ValueType that is a literal, and the ValueType that originates from a singular query. This is because when it comes time to evaluate the query, the actual value will be derived differently.

I guess the key here is enforcing the given context while parsing, and if that is possible, then the distinction within the ValueType that I am focused on here is much more strictly an implementation detail.

gregsdennis commented 1 year ago

I think my confusion arises from the fact that, e.g., @.a (a singular query) needs to be interpreted as a ValueType or a NodesType, depending upon the context within which it is used.

Singular queries always produce nodelists; specifically, they produce singular nodelists. For comparisons, this case is explicitly called out (current working doc)

When any query or function expression on either side of a comparison results in a nodelist consisting of a single node, that side is replaced by the value of its node and then...

There's not really a "conversion" as such; it just says that you extract the value. But you always get a nodelist from the query.

(Note that we list functions in that text, but NodesType functions are specifically disallowed in comparisons. Probably an oversight that needs to be addressed.)

I think the confusion you've expressed illustrates the need to re-introduce SingleNodeType and extend the type system to the rest of the expression.

cabo commented 1 year ago

(Note that we list functions in that text, but NodesType functions are specifically disallowed in comparisons. Probably an oversight that needs to be addressed.)

Use value() or another function (such as count() or a future function like maybe sum()) that can be used to extract JSON data from nodelists.

hiltontj commented 1 year ago

Singular queries always produce nodelists; specifically, they produce singular nodelists. For comparisons, this case is explicitly called out

The ABNF in that case provides clear rules for parsing a singular-query that can only result in a single node though. And it does state that only functions with ValueType output are accepted in comparisons. So, I didn't run into issues there.

Where it gets tricky is when parsing the function arguments, because the parser needs to parse arguments within the context for each argument, and that context is established by the signature of the function being parsed.

I think that it is becoming more clear to me. I am trying to write the implementation in a way that does not require having to hand-roll the parsing/validation logic for each new function added. It presents a challenge, but I think it is manageable.

cabo commented 1 year ago

On 21. Mar 2023, at 23:54, Trevor Hilton @.***> wrote:

I think that it is becoming more clear to me. I am trying to write the implementation in a way that does not require having to hand-roll the parsing/validation logic for each new function added. It presents a challenge, but I think it is manageable.

Most implementations will probably have a little table with the result and parameter types for each function. Plus probably a way to register additional functions to the implementation.
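As a sketch of what that might look like in Rust (the registry API is hypothetical; the built-in signatures are those of the draft's defined functions):

```rust
use std::collections::HashMap;

/// The declared types of the function-extension type system.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FnType {
    Value,
    Logical,
    Nodes,
}

/// One row of the "little table": parameter types and result type.
#[derive(Debug, Clone)]
struct Signature {
    params: Vec<FnType>,
    result: FnType,
}

/// A registry seeded with the spec-defined functions that also lets
/// callers register additional ones.
struct Registry {
    table: HashMap<String, Signature>,
}

impl Registry {
    fn new() -> Self {
        let mut table = HashMap::new();
        // Signatures of the functions defined in the draft.
        table.insert("length".into(), Signature { params: vec![FnType::Value], result: FnType::Value });
        table.insert("count".into(), Signature { params: vec![FnType::Nodes], result: FnType::Value });
        table.insert("match".into(), Signature { params: vec![FnType::Value, FnType::Value], result: FnType::Logical });
        table.insert("search".into(), Signature { params: vec![FnType::Value, FnType::Value], result: FnType::Logical });
        table.insert("value".into(), Signature { params: vec![FnType::Nodes], result: FnType::Value });
        Registry { table }
    }

    /// Register an additional (user-defined) function.
    fn register(&mut self, name: &str, sig: Signature) {
        self.table.insert(name.to_string(), sig);
    }

    fn lookup(&self, name: &str) -> Option<&Signature> {
        self.table.get(name)
    }
}

fn main() {
    let mut reg = Registry::new();
    // A hypothetical extension function.
    reg.register("first", Signature { params: vec![FnType::Nodes], result: FnType::Value });
    assert_eq!(reg.lookup("count").unwrap().result, FnType::Value);
    assert!(reg.lookup("first").is_some());
    println!("ok");
}
```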

Grüße, Carsten

hiltontj commented 1 year ago

Plus probably a way to register additional functions to the implementation.

Indeed, that is exactly what I am going for 🙂.

goessner commented 1 year ago

Use value() or another function (such as count() or a future function like maybe sum()) that can be used to extract JSON data from nodelists.

Rather than a set of small arithmetic functions, I can see a general function calc. The question remains whether and how its scope is determined by the working group.

gregsdennis commented 1 year ago

Rather than a set of small arithmetic functions, I can see a general function calc. The question remains whether and how its scope is determined by the working group

I fail to see how a calc() function would be any different than just including math operators in expressions. You'd still have to specify what is valid as a parameter to calc() and how that works. It seems easier to just define math operators and be done with it.

Regardless, that seems off-topic for this issue.

the parser needs to parse arguments within the context for each argument, and that context is established by the signature of the function being parsed @hiltontj

Can you clarify this? I haven't had to do this. When parsing the function, my implementation:

  1. parses the function name and a (
  2. parses the comma-delimited arguments, context-free
  3. parses the )
  4. performs a lookup of the function to get its definition (parameter counts and types, etc)
  5. validates that the arguments I received are compatible with the definition

The thing to notice is that step 2 is done context-free. It just parses the arguments, however they appear, and does the function-definition validation later.
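A toy Rust sketch of these five steps, with parsing reduced to naive splitting on `(`, `,`, and `)` (so nested function calls and commas inside string literals are not handled) and the function definition reduced to an arity count:

```rust
use std::collections::HashMap;

/// Steps 1-5: parse the name and parenthesized argument list
/// context-free, then look up the definition and validate against it.
fn parse_function(
    input: &str,
    registry: &HashMap<&str, usize>, // name -> arity (a stand-in for a full definition)
) -> Result<(String, Vec<String>), String> {
    // 1. parse the function name and a '('
    let open = input.find('(').ok_or("expected '('")?;
    let name = input[..open].trim().to_string();
    // 3. locate the ')' (found up front so step 2 knows its extent)
    let close = input.rfind(')').ok_or("expected ')'")?;
    // 2. parse the comma-delimited arguments, context-free
    let inner = &input[open + 1..close];
    let args: Vec<String> = if inner.trim().is_empty() {
        Vec::new()
    } else {
        inner.split(',').map(|a| a.trim().to_string()).collect()
    };
    // 4. look up the function's definition
    let arity = *registry
        .get(name.as_str())
        .ok_or_else(|| format!("unknown function {}", name))?;
    // 5. validate the arguments against the definition
    if args.len() != arity {
        return Err(format!("{} expects {} argument(s), got {}", name, arity, args.len()));
    }
    Ok((name, args))
}

fn main() {
    let mut registry = HashMap::new();
    registry.insert("count", 1);
    registry.insert("match", 2);
    let (name, args) = parse_function("match(@.a, \"b\")", &registry).unwrap();
    assert_eq!(name, "match");
    assert_eq!(args, vec!["@.a", "\"b\""]);
    assert!(parse_function("count(@.a, @.b)", &registry).is_err());
    println!("ok");
}
```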

hiltontj commented 1 year ago

@gregsdennis - my implementation is performing the same steps. I think my wording there was a bit confusing, but what I was getting at was essentially what you say in (5.)

  5. validates that the arguments I received are compatible with the definition

Once the function expression and its arguments are parsed, context-free, then the arguments are validated using the function definition, i.e., each parsed argument is validated against the type provided by the stored function definition (that is the context I was referring to).

Sorry if the way I worded that implied that I was trying to validate while parsing on the fly.


After spending some time with the latest version of the draft (i.e., on the main branch), I think a lot of the points that caught me up have actually been resolved.

For example, enumerating the acceptable function arguments in the ABNF led to a much cleaner implementation:

function-argument   = literal /
                      singular-query /
                      filter-query / ; (includes singular-query)
                      logical-expr /
                      function-expr

By having the parser distinguish between singular-query and filter-query arguments, I found it easier to validate the parsed arguments using the function definition.
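One way to express that validation in Rust (the names are mine; function expressions as arguments are omitted, since their compatibility depends on a registry lookup of their declared result type):

```rust
/// The kinds of function argument the ABNF above distinguishes.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Arg {
    Literal,
    SingularQuery,
    FilterQuery, // non-singular; singular-query is listed separately
    LogicalExpr,
}

/// Declared parameter types from the function-extension type system.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ParamType {
    Value,
    Logical,
    Nodes,
}

/// Can a context-free parsed argument supply this parameter type?
fn compatible(arg: Arg, param: ParamType) -> bool {
    match param {
        // Literals and singular queries may supply ValueType.
        ParamType::Value => matches!(arg, Arg::Literal | Arg::SingularQuery),
        // Queries (via existence test) and logical expressions supply LogicalType.
        ParamType::Logical => matches!(arg, Arg::SingularQuery | Arg::FilterQuery | Arg::LogicalExpr),
        // Only queries produce nodelists.
        ParamType::Nodes => matches!(arg, Arg::SingularQuery | Arg::FilterQuery),
    }
}

fn validate(args: &[Arg], params: &[ParamType]) -> bool {
    args.len() == params.len() && args.iter().zip(params).all(|(a, p)| compatible(*a, *p))
}

fn main() {
    // count(@.*): a filter query where NodesType is expected -- fine.
    assert!(validate(&[Arg::FilterQuery], &[ParamType::Nodes]));
    // length(@.a): a singular query where ValueType is expected -- fine.
    assert!(validate(&[Arg::SingularQuery], &[ParamType::Value]));
    // count("foo"): a literal cannot supply NodesType.
    assert!(!validate(&[Arg::Literal], &[ParamType::Nodes]));
    println!("ok");
}
```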

I've been able to get the defined functions (length, count, search, match, value) as well as some custom functions to work just fine; however, I still need to test out what happens when using functions as arguments to other functions.

cabo commented 1 year ago

On 23. Mar 2023, at 05:45, Greg Dennis @.***> wrote:

I fail to see how a calc() function would be any different than just including math operators in expressions. You'd still have to specify what is valid as a parameter to calc() and how that works. It seems easier to just define math operators and be done with it.

A calc function could do something like

calc("(@.a + @.b) * 0.5")

Of course, this would break any attempt to have an extensible function interface, because calc would need to include half of JSONPath’s syntax and would need access to all the related functionality as well.

So I hope that is not what we will turn up with.

Grüße, Carsten

gregsdennis commented 1 year ago

Of course, this would break any attempt to have an extensible function interface, because calc would need to include half of JSONPath’s syntax and would need access to all the related functionality as well.

This is exactly my point.

cabo commented 1 year ago

That said, a way to "escape" into traditional Math syntax is not a bad idea. We would just need a better way to get queries done inside that syntax.

gregsdennis commented 1 year ago

This conversation needs to be happening in #419, not here.

glyn commented 1 year ago

Closing. @hiltontj if you still have an issue, please re-open.

hiltontj commented 1 year ago

I have spent more time with this. Although I have been able to get a working implementation that is fairly clean, I would like to present further argument for an additional SingleNodeType in the type system.

Consider the following scenario, with a function called first():

fn first(nodes: NodesType) -> ValueType {
    /* returns the first node in the nodelist, or nothing if empty */
}

Although the output of first() represents a node in the nodelist, this has now been obscured by it being a ValueType.

As a comparable, first() works fine, e.g.,

$[? first(@.*) == "foo" ]

But first() is not valid as an argument to a function that expects a NodesType (or LogicalType), or in an existence expression, despite the fact that, in my opinion, it makes sense to use it in those places.


Consider adding the SingleNodeType with the following stipulations,

Now, first() can be redefined as:

fn first(nodes: NodesType) -> SingleNodeType {
    /* ... */
}

Making it more versatile when writing JSONPath queries. Many other such functions may exist, e.g., get(index), last(), etc.


As an example, first() may not be overly convincing, because in the scenarios where it would be used as a NodesType argument to another function, or in an existence expression, it is sort of redundant. However, I think that preserving the distinction between JSON literals and nodes is important to:

  1. remove this limitation for users who might run into it with more complex functions, and
  2. be more explicit for implementors of the standard
hiltontj commented 1 year ago

@glyn - you closed it as I was typing 🙂

hiltontj commented 1 year ago

...and I don't seem to have the ability to re-open.

glyn commented 1 year ago

Last time we tried adding a SingleNodeType to the spec, the benefits didn't seem to outweigh the additional cost (see my reasoning in https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/issues/404#issuecomment-1465175176). PR https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/pull/452 repeats the exercise with the current, cleaner spec as a starting point. Again, I am not convinced that the extra complexity in the spec is warranted.

Note that it is possible to add SingleNodeType to the current spec without breaking any valid queries, so if we stay with the current spec and a compelling reason to introduce SingleNodeType is discovered, it could be added at that time as a backward compatible enhancement.

hiltontj commented 1 year ago

@glyn - thank you for drawing that up in #452

[...] Again, I am not convinced that the extra complexity in the spec is warranted.

I have found that the extra complexity is unavoidable in the implementation, and therefore feel it is better for it to be defined in the spec. Singular queries as function arguments already require special handling, because they may be used as either ValueType or NodesType arguments.

The inclusion of a SingleNodeType makes the path towards implementing that special handling less ambiguous. And, it provides further benefit to the end-user by enabling the definition of more versatile/interoperable functions.


Continuing the first() example from my previous comment. If, say I want to be able to use first() in more than just ValueType-compatible contexts, then I could modify it to

fn first(nodes: NodesType) -> NodesType {
    /* returns a nodelist containing at most the first node of `nodes` */
}

This allows me to use first() directly in existence expressions or NodesType/LogicalType function arguments, but if I want to use it in a comparable, or as a ValueType argument, I need to wrap it in value(). This seems like a higher cognitive burden on the user than just having a first() that works in both contexts without the need for value().

There is no way to go from ValueType to NodesType, despite the fact that in the current spec, ValueType can be used to represent a single node, which should be able to be converted to a NodesType.


Note that it is possible to add SingleNodeType to the current spec without breaking any valid queries, so if we stay with the current spec and a compelling reason to introduce SingleNodeType is discovered, it could be added at that time as a backward compatible enhancement.

That is reassuring, but I think you've shown in #452 that a SingleNodeType fits in nicely in the latest version of the spec; to the point that the benefit does outweigh the cost.

The spec already requires that we understand the distinction between a singular query and a non-singular query; I don't think having an analog within the type system introduces much cognitive overhead.

glyn commented 1 year ago

@glyn - thank you for drawing that up in #452

[...] Again, I am not convinced that the extra complexity in the spec is warranted.

I have found that the extra complexity is unavoidable in the implementation, and therefore feel it is better for it to be defined in the spec. Singular queries as function arguments already require special handling, because they may be used as either ValueType or NodesType arguments.

I don't find the need for extra complexity in an implementation a compelling argument for making the spec more complex. With my implementer's hat on, I would prefer a simpler spec as it's easier to reason about and to write tests for. JSONPath users (and other readers of the spec apart from implementers, such as compliance test suite developers) benefit from a simpler spec. Also, as a spec writer, I found the addition of SingleNodeType quite fiddly and I'm not sure I've got all the necessary changes in #452.

The inclusion of a SingleNodeType makes the path towards implementing that special handling less ambiguous.

Is there some ambiguous behaviour in the current spec that the addition of SingleNodeType removes? (If not, I'm not sure what additional point this statement is making.)

And, it provides further benefit to the end-user by enabling the definition of more versatile/interoperable functions.

Benefit to the end user would be valuable, so I'd like to explore this further. But why would adding SingleNodeType make any difference to interoperability?

Continuing the first() example from my previous comment. If, say I want to be able to use first() in more than just ValueType-compatible contexts, then I could modify it to

fn first(nodes: NodesType) -> NodesType {
    /* returns a nodelist containing at most the first node of `nodes` */
}

This allows me to use first() directly in existence expressions

If nl is an expression returning a nodelist, first(nl) and nl are equivalent in existence expressions. What would be the benefit of using first(nl) instead of nl in an existence expression?

or NodesType

Is there an example of a scenario where it's necessary to pass first(nl) rather than just nl? If we can't find such a scenario, we can defer the change until such time as a compelling use case may be found.

/LogicalType function arguments,

first(nl) and nl are equivalent in this case, as above.

but if I want to use it in a comparable, or as a ValueType argument, I need to wrap it in value(). This seems like a higher cognitive burden on the user than just having a first() that works in both contexts without the need for value().

I can see some truth in this statement. However, first() has extra cognitive burden compared to value() in that first() is non-deterministic when the input nodelist has a non-deterministic order whereas value() returns Nothing for such nodelists and thereby removes non-determinism.

There is no way to go from ValueType to NodesType, despite the fact that in the current spec, ValueType can be used to represent a single node, which should be able to be converted to a NodesType.

There can be no general conversion from ValueType to NodesType. ValueType cannot represent a single node because it is lacking location information. There could be zero, one, or multiple nodes with a given (non-Nothing) value.

Note that it is possible to add SingleNodeType to the current spec without breaking any valid queries, so if we stay with the current spec and a compelling reason to introduce SingleNodeType is discovered, it could be added at that time as a backward compatible enhancement.

That is reassuring, but I think you've shown in #452 that a SingleNodeType fits in nicely in the latest version of the spec; to the point that the benefit does outweigh the cost.

The spec already requires that we understand the distinction between a singular query and a non-singular query; I don't think having an analog within the type system introduces much cognitive overhead.

#452 certainly demonstrates that there is an extra cost and it's not trivial. I haven't personally found the benefit to be compelling yet. Last time around I hoped that introducing SingleNodeType would bring an overall simplification to the spec because of the closer alignment of the type system and the non-function side of the spec, but try as we might, we couldn't get the hoped-for simplification to materialise.

gregsdennis commented 1 year ago

JSONPath users (and other readers of the spec apart from implementers, such as compliance test suite developers) benefit from a simpler spec.

I agree with this. We as implementors necessarily have a harder job of making the specification accessible via a usable API.

hiltontj commented 1 year ago

I don't find the need for extra complexity in an implementation a compelling argument for making the spec more complex. With my implementer's hat on, I would prefer a simpler spec as it's easier to reason about and to write tests for. JSONPath users (and other readers of the spec apart from implementers, such as compliance test suite developers) benefit from a simpler spec. Also, as a spec writer, I found the addition of SingleNodeType quite fiddly and I'm not sure I've got all the necessary changes in https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/pull/452.

That's fair, I respect that.

Is there some ambiguous behaviour in the current spec that the addition of SingleNodeType removes?

No, I am not aware of any.

(If not, I'm not sure what additional point this statement is making.)

The inclusion of SingleNodeType, for me, made the function extension easier to implement. That may not be the case for others, so perhaps is a moot point.

But why would adding SingleNodeType make any difference to interoperability?

"Interoperability" may have been a poor choice of word? What I mean to say is that allowing functions that can return a SingleNodeType allows for the definition of functions that can be used in more places. With #452, first() -> SingleNodeType can be used as a comparable, as an existence expression, and in the place of any function argument. In the absence of SingleNodeType, such a function is limited to a subset of those, depending on how it is defined, and in the best case, needs to be wrapped in value() to work in all scenarios.

If nl is an expression returning a nodelist, first(nl) and nl are equivalent in existence expressions. What would be the benefit of using first(nl) instead of nl in an existence expression? [...]

I don't want to get too hung up on what first() does, and to be clear, I'm not advocating it as an addition to the spec. It is just an example of a function that returns the location of a single node in the JSON object being queried. There could be many such functions.

I'll admit, first() does not make a very compelling argument. And I don't know of another example right now that would be more compelling. So I suppose that this can be closed until such an example is encountered.


Perhaps I am missing the point of Function Extensions in the spec. My interpretation is that you are giving implementors the ability to define their own functions. There is a register of functions required to make the implementation compliant, but the type system provides a framework to construct new functions. The addition of SingleNodeType allows me to define functions with a more explicit and self-describing API.

glyn commented 1 year ago

Thanks @hiltontj. It will be interesting to see if/when a compelling use case for SingleNodeType is found.

cabo commented 1 year ago

On 2023-03-27, at 22:14, Trevor Hilton @.***> wrote:

Perhaps I am missing the point of Function Extensions in the spec. My interpretation is that you are giving implementors the ability to define their own functions. There is a register of functions required to make the implementation compliant, but the type system provides a framework to construct new functions.

The main objective was to have an extension point where future evolution can happen without completely losing forward compatibility. Once we had this, it also provided a convenient way to introduce features such as regexps without a need for special syntax for them.

BTW, thank you for testing the hypothesis again that we have reached a local optimum now; without pushback like yours we would get lazy.

Grüße, Carsten