Filter selector cleanup

cabo commented 1 year ago

General simplification and cleanup of the type system used in filter selectors.

Intended to supersede #402, which this is based on. Intended to supersede #395. Closes #387. Closes #389. Closes #401.

cabo commented 1 year ago

I like the general approach, although, after further consideration, I think we need to have more than three types. I've made various comments and suggestions, at various levels. Some of the editorial changes can easily be adopted in this PR. The removal of type conversions is a larger change and could be deferred to a follow-on PR.

So I think what we want to do is get this PR consistent, and then open the floodgates on 3 vs. 4 types. At least I now know that you can implement this spec in a day :-)

gregsdennis commented 1 year ago

At least I now know that you can implement this spec in a day

I think that likely depends on the language.

cabo commented 1 year ago

On 2023-02-21, at 23:13, Greg Dennis @.***> wrote:

I'd also prefer to leave "value" out of the logical context, to be honest.

Indeed. 708def9

gregsdennis commented 1 year ago

Whether or not functions (and/or paths) returning non-singular nodelists should be allowed in comparisons.

From the ABNF, paths that return multiple nodes are not allowed in comparisons. I agree with this, and this PR is not changing it.

From the type system, as described here, functions that return multiple nodelists are allowed in comparisons. I think this is fine with the defined conversions (also see below). To change this, we would need to (one of):

disallow nodelist functions in comparisons at all
define a new type that represents a single-node nodelist that is suitable for comparisons
(there's also a third option)

Disallow nodelist function in comparisons

I'm okay with this. I'm not sure if this would limit any functionality, though. It would limit functions that return NodesType to logicals because the nodelist is still convertible to (or "has meaning in," or however you want to say it) a logical.

The examples I put up for the hypothetical distinct() function were contrived to endure @cabo's tests. I certainly can't think of a practical use case to put a nodelist in a comparison.

This does raise the question of whether to even have NodesType, though. If a function that returns NodesType is only valid as a logical, then it might as well return BooleanType.

Define a new type for single-node nodelists

A function that returns a single-node nodelist would be valid in a comparison, but a function that returns NodesType would not.

This is really an extension of disallowing nodelist functions in comparisons, so all of that logic still applies (i.e. is NodesType still useful?).

Conversely, a function that returns the new single-node nodelist type would only be useful in a comparison (otherwise, we run into the same inconsistencies that I've been highlighting this whole time). Because it's only useful in a comparison, it might as well just return the value of the single node, which is ValueType. So why have it?

The third option

Only allow nodelist functions to be parameters for other functions, i.e. disallow them from both logicals and comparisons.

This would only allow something like $[?count(distinct(@.*))==42], which is arguably useful, but both $[?distinct(@.*)] and $[?distinct(@.*)==42] would be invalid.

Whether the spec would be clearer if it avoided type conversions.

We cannot avoid type conversions. They are implicit in the grammar.

To prove this, I'll exclude functions.

All paths return nodelists. They don't return logicals, and they don't return values. Singular paths, specifically, return a nodelist with a single value.

Paths are always allowed to act as logicals. That is, there is a conversion from a nodelist to a logical. Namely, if a nodelist has one or more nodes, it is treated as (converted to) LogicalTrue; if it has no nodes, it is treated as (converted to) LogicalFalse. If I were to write $[?@.a] as a test expression explicitly (in a C-ish syntax), it would be:

$[?(LogicalType)@.a]

where I'm "casting" the nodelist returned by @.a to LogicalType.

Paths are allowed to act as values if they're singular. That is, there is a conversion from a nodelist to a value if the nodelist contains at most a single node. If the nodelist were to contain more than one node, the conversion would be undefined, and a runtime error would occur. To avoid the potential error, the ABNF applies an additional restriction to identify paths which are guaranteed to return at most a single node, and we can catch it at parse time. Writing $[?@.a==42] explicitly would be:

$[?((ValueType)@.a)==42]  // extra parentheses not necessary, but can help readability

where I'm "casting" the nodelist returned by @.a to a ValueType.

We have specified these conversions by instead describing their behavior, but that doesn't detract from the fact that they're there.

Type conversions are unavoidable. We might as well state them. I think it helps us in the end.

glyn commented 1 year ago

Whether or not functions (and/or paths) returning non-singular nodelists should be allowed in comparisons.

From the ABNF, paths that return multiple nodes are not allowed in comparisons. I agree with this, and this PR is not changing it.

From the type system, as described here, functions that return multiple nodelists are allowed in comparisons. I think this is fine with the defined conversions (also see below).

I don't think it's fine as it introduces (more) comparisons with surprising results, such as:

<function returning two nodes> == <function returning three nodes>

<path producing an empty nodelist> == <function returning two nodes>

which both yield true.

To change this, we would need to (one of):
* disallow nodelist functions in comparisons at all

* define a new type that represents a single-node nodelist that _is_ suitable for comparisons

* (there's also a third option)
Disallow nodelist function in comparisons

I'm okay with this. I'm not sure if this would limit any functionality, though. It would limit functions that return NodesType to logicals because the nodelist is still convertible to (or "has meaning in," or however you want to say it) a logical.

The examples I put up for the hypothetical distinct() function were contrived to endure @cabo's tests. I certainly can't think of a practical use case to put a nodelist in a comparison.

This does raise the question of whether to even have NodesType, though. If a function that returns NodesType is only valid as a logical, then it might as well return BooleanType.

Not so. Such functions can be passed as arguments to other functions, e.g. count(distinct($.a.*)) > 1.

Define a new type for single-node nodelists

A function that returns a single-node nodelist would be valid in a comparison, but a function that returns NodesType would not.

This is currently my preferred option.

This is really an extension of disallowing nodelist functions in comparisons, so all of that logic still applies (i.e. is NodesType still useful?).

Conversely, a function that returns the new single-node nodelist type would only be useful in a comparison (otherwise, we run into the same inconsistencies that I've been highlighting this whole time). Because it's only useful in a comparison, it might as well just return the value of the single node, which is ValueType. So why have it?

Again, for passing as an argument to another function, e.g. count(distinct(append(head($.a.*), $.b.*))) > 1.

The third option

Only allow nodelist functions to be parameters for other functions, i.e. disallow them from both logicals and comparisons.

This would only allow something like $[?count(distinct(@.*))==42], which is arguably useful, but both $[?distinct(@.*)] and $[?distinct(@.*)==42] would be invalid.

Whether the spec would be clearer if it avoided type conversions.

We cannot avoid type conversions. They are implicit in the grammar.

To prove this, I'll exclude functions.

All paths return nodelists. They don't return logicals, and they don't return values. Singular paths, specifically, return a nodelist with a single value.

Paths are always allowed to act as logicals. That is, there is a conversion from a nodelist to a logical. Namely, if a nodelist has one or more nodes, it is treated as (converted to) LogicalTrue; if it has no nodes, it is treated as (converted to) LogicalFalse. If I were to write $[?@.a] as a test expression explicitly (in a C-ish syntax), it would be:
$[?(LogicalType)@.a]
where I'm "casting" the nodelist returned by @.a to LogicalType.

Paths are allowed to act as values if they're singular. That is, there is a conversion from a nodelist to a value if the nodelist contains at most a single node. If the nodelist were to contain more than one node, the conversion would be undefined, and a runtime error would occur. To avoid the potential error, the ABNF applies an additional restriction to identify paths which are guaranteed to return at most a single node, and we can catch it at parse time. Writing $[?@.a==42] explicitly would be:
$[?((ValueType)@.a)==42]  // extra parentheses not necessary, but can help readability
where I'm "casting" the nodelist returned by @.a to a ValueType.

We have specified these conversions by instead describing their behavior, but that doesn't detract from the fact that they're there.

Type conversions are unavoidable. We might as well state them. I think it helps us in the end.

Please note that I'm not proposing any change of semantics, merely an editorial change to improve readability. I suggest we defer discussion of the (de)merits of such a change until/unless we have the proposed text before us. If I attempt such a change, I'll take into account the way some readers may think of it in terms of type conversion.

cabo commented 1 year ago

To change this, we would need to (one of):
* disallow nodelist functions in comparisons at all

* define a new type that represents a single-node nodelist that _is_ suitable for comparisons

* (there's also a third option)
Clearly, this needs to be decided, but we should first complete this PR so we have a stable basis for that.

cabo commented 1 year ago

I added issue #404 so we can continue on this after completing #403.

glyn commented 1 year ago

I added issue #404 so we can continue on this after completing #403.

@cabo I'm not sure what you are proposing. From my perspective, #404 represents the only blocker to merging #403.

cabo commented 1 year ago

I added issue #404 so we can continue on this after completing #403.

@cabo I'm not sure what you are proposing. From my perspective, #404 represents the only blocker to merging #403.

In the spirit of continuous improvement, we can merge #403 and work on #404 from that as a stable basis. This is not a "blocker", just another work item.

ietf-wg-jsonpath / draft-ietf-jsonpath-base