ietf-wg-jsonpath / draft-ietf-jsonpath-base

Development of a JSONPath internet draft
https://ietf-wg-jsonpath.github.io/draft-ietf-jsonpath-base/

add typed `function-expr`s to abnf #466

Closed gregsdennis closed 1 year ago

gregsdennis commented 1 year ago

I was thinking the type system for functions could be reinforced somewhat using the ABNF.

This PR splits function-expr into three additional names (value-function-expr, logical-function-expr, and nodes-function-expr) that all just reference back to function-expr.
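In ABNF terms, the split presumably reads as plain aliases (the rule names here are taken from the funnel diagram later in the thread, since the diff itself isn't shown in this excerpt):

```abnf
value-function-expr   = function-expr
logical-function-expr = function-expr
nodes-function-expr   = function-expr
```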

I recognize that it's basically a no-op in the ABNF itself, but it gives us names that we can use in the text. We do this elsewhere with rules like start, end, and step to define a slice selector; each of these just references int, but having them separate is handy.

Now that these have been added to the ABNF and have specific usages, we can reference them in the text to define them. This prevents the need to use comments to specify what kind of function is valid.

We could go further and explicitly state which ABNF rule each function matches, but I don't think that's necessary.

There may be other places in the text where it says something like "a function with a result of XType," but I couldn't find any.

cabo commented 1 year ago

We had this discussion already (I need to find it).

It is a total non-starter to break the ABNF by requiring implementations to intersperse semantic processing into the parser.

gregsdennis commented 1 year ago

@cabo please explain why this isn't a good idea. It seems perfectly beneficial to me.

cabo commented 1 year ago

On 2023-04-20, at 07:19, Greg Dennis @.***> wrote:

@cabo please explain why this isn't a good idea. It seems perfectly beneficial to me.

https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/pull/416#discussion_r1125739805

The main argument was that these changes "need a painful hand-written parser with a lexer that assigns different syntactical categories based on a symbol table lookup, like in a C compiler. I'd like to stick with ABNF."

Programming languages can be divided into those that can simply be parsed by a parser generator and those that require elaborate hand-crafting to intermingle semantic processing (symbol tables etc.) with parsing. C and C++ are the latter. JSONPath is the former. I’d like it to stay in that class; absolutely no benefit is large enough to justify the damage that requiring this hand-crafting does.

Regards, Carsten

glyn commented 1 year ago

@cabo makes a good point - it is highly desirable to be able to feed the spec ABNF directly into a parser generator and get a parser that can decide how to parse any given query. So it's probably best to stick with the current ABNF.

gregsdennis commented 1 year ago

The main argument was that these changes "need a painful hand-written parser with a lexer that assigns different syntactical categories based on a symbol table lookup, like in a C compiler. I'd like to stick with ABNF."

I don't understand why this needs more than the ABNF while what's currently in the document doesn't. All I did was add some interim names so they could be referenced in the text.

As I mentioned in the opening comment, we already do exactly the same thing with the slice definition's start/end/step only referring to int.
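For reference, the slice rules Greg is citing have roughly this shape in the draft (quoted from memory, so treat as illustrative):

```abnf
slice-selector = [start S] ":" S [end S] [":" [S step]]

start = int
end   = int
step  = int
```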

Programming languages can be divided into those that can simply be parsed by a parser generator and those that require elaborate hand-crafting to intermingle semantic processing (symbol tables etc.) with parsing. C and C++ are the latter. JSONPath is the former.

First, parsing is one thing, but to get any meaning or functionality out of what you parsed, you need that "elaborate hand-crafting."

Second, why do you think we need symbol tables, etc., for this change but not without it?

gregsdennis commented 1 year ago

@cabo, also note that my approach this time is inverted. Instead of having the syntax all point to function-name where it splits out afterward (a definite problem), I'm now performing the split before function-expr. The funnel is inverted.

Last time:

                                            / --> value-function-name
expr --> function-expr --> function-name -------> logical-function-name
                                            \ --> nodes-function-name

There's no syntactical validation actually occurring here because regardless of context, you can still have any of the function names.

This proposal does actually have syntactical validation:

         / --> value-function-expr -----\
expr  -------> logical-function-expr ------> function-expr
         \ --> nodes-function-expr -----/

where each of the typed functions may only appear in their respective contexts.

glyn commented 1 year ago

@gregsdennis The syntax being proposed here is undecidable. Can we close the PR?

gregsdennis commented 1 year ago

The syntax being proposed here is undecidable

What do you mean by "undecidable"?

glyn commented 1 year ago

The syntax being proposed here is undecidable

What do you mean by "undecidable"?

I mean there is no way for a parser to decide how to parse certain pieces of syntax. For example, suppose the parser is parsing a test-expr consisting of a function-expr. How can the parser decide between logical-function-expr and nodes-function-expr?

test-expr           = [logical-not-op S]
                     (filter-query          / ; existence/non-existence
                      logical-function-expr /
                      nodes-function-expr)

cabo commented 1 year ago

https://en.wikipedia.org/wiki/Ambiguous_grammar

gregsdennis commented 1 year ago

The parser has to know which type the function represents, based on its name. It already has to know the function's return type to determine well-typedness of the function's parameters anyway, so this information is already available.

For instance, when the parser encounters search, it should know that the search() function is a logical-function-expr, just as it knows the function requires two parameters.
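Greg's argument can be sketched as a symbol-table lookup (a hypothetical illustration, not anything prescribed by the draft; the name-to-type mapping follows the draft's built-in functions):

```python
# Hypothetical sketch: the "symbol table" a parser would need in order
# to decide which typed rule a function-expr matches. The mapping
# follows the draft's built-in functions: length, count, and value
# return ValueType; match and search return LogicalType.
FUNCTION_RESULT_TYPE = {
    "length": "ValueType",
    "count":  "ValueType",
    "value":  "ValueType",
    "match":  "LogicalType",
    "search": "LogicalType",
}

def classify(function_name: str) -> str:
    """Decide which typed function-expr rule a call would match."""
    try:
        return FUNCTION_RESULT_TYPE[function_name]
    except KeyError:
        raise ValueError(f"unknown function: {function_name}") from None
```

This lookup is exactly the semantic step cabo objects to folding into the grammar: the decision depends on a table, not on the token stream alone.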

cabo commented 1 year ago

It has to know

https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/pull/466#issuecomment-1515742262

gregsdennis commented 1 year ago

Okay, but without that knowledge, a parser will tell me that something like search(1,2,3,4,5) is valid. That is clearly invalid and not useful to anyone.

cabo commented 1 year ago

Okay, but without that knowledge, a parser will tell me that something like search(1,2,3,4,5) is valid. That is clearly invalid and not useful to anyone.

Semantic analysis will tell you it isn't well-typed (and therefore not valid).

You gain nothing from trying to push this into the syntactic analysis (the part that is described by the ABNF).

(If you want, you can use the term "parser" to include semantic analysis, and then we both win.)
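The separation cabo describes can be sketched in two phases (a minimal illustration with an assumed arity table; none of this is prescribed by the draft):

```python
import re

# Assumed arity table for the draft's built-in functions.
ARITY = {"length": 1, "count": 1, "value": 1, "match": 2, "search": 2}

def parse_call(text):
    """Syntactic phase: accept any well-formed name(arguments) call."""
    m = re.fullmatch(r"([a-z][a-z0-9_]*)\((.*)\)", text)
    if m is None:
        raise SyntaxError(f"not a function call: {text}")
    name, argtext = m.group(1), m.group(2)
    args = [a.strip() for a in argtext.split(",")] if argtext else []
    return name, args

def check_call(name, args):
    """Semantic phase: reject unknown names and wrong arity."""
    if name not in ARITY:
        raise ValueError(f"unknown function: {name}")
    if len(args) != ARITY[name]:
        raise ValueError(f"{name} expects {ARITY[name]} argument(s), got {len(args)}")

name, args = parse_call("search(1,2,3,4,5)")  # syntactically fine
# check_call(name, args) would raise: search expects 2 argument(s), got 5
```

(Splitting arguments on commas is a simplification; real arguments can themselves contain commas.)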

gregsdennis commented 1 year ago

What use is knowing a path (candidate) is syntactically valid but not semantically valid? You always want to know both.

You gain nothing from trying to push this into the syntactic analysis (the part that is described by the ABNF).

You gain earlier detection. I do both syntactic and semantic validation simultaneously (because you always need both), but if you do them separately, syntactic validation can fail first, and then you never have to start the semantic validation at all.

gregsdennis commented 1 year ago

https://en.m.wikipedia.org/wiki/Fail-fast

cabo commented 1 year ago

I don't think these "gains" are worth anything. As you point out, an implementation can still mush up syntactic with semantic processing. Having a clear separation between them enables use of tools for syntactic processing. Giving up that separation does nothing to help implementation, it only makes it worse for environments that do have tools.

gregsdennis commented 1 year ago

Having a clear separation between them enables use of tools for syntactic processing.

Of what use is syntactic validation by itself? Syntactic validation will say search(1,2,3,4,5) is valid. This is not useful, so why enable it?

danielaparker commented 1 year ago

Of what use is syntactic validation by itself? Syntactic validation will say search(1,2,3,4,5) is valid. This is not useful, so why enable it?

By itself? Well, you detect that there is a function, and you obtain its name, its individual arguments, or report a syntax error. That's useful.

Semantic validation augments that by verifying that the function name maps to a supported function, the number of arguments matches the expected arity, the return type is valid output, and the types of the arguments are a valid combination of inputs for the function.

In a hand-written parser (my understanding is that most JSONPath parsers that implement their own expression language are hand-written, certainly all the important ones), semantic validation can be mixed with syntactic validation, allowing early detection of unsupported function names, invalid arity, and type errors. But it need not be; it's an implementation choice.

Similar considerations apply to parsing and validating operators.

gregsdennis commented 1 year ago

Well, you detect that there is a function, and you obtain its name, its individual arguments, or report a syntax error. That's useful.

But how is it useful? You saying that it is doesn't make it so.

A syntactically valid but semantically invalid path is no more useful than a syntactically invalid path. Neither can be evaluated. Especially from a user's point of view, if you have an IDE that can provide hints (e.g. IntelliSense and/or completion), it doesn't make a difference whether the ABNF is violated or the path just doesn't make sense.

glyn commented 1 year ago

Well, you detect that there is a function, and you obtain its name, its individual arguments, or report a syntax error. That's useful.

But how is it useful? You saying that it is doesn't make it so.

A syntactically valid but semantically invalid path is no more useful than a syntactically invalid path. Neither can be evaluated. Especially from a user's point of view, if you have an IDE that can provide hints (e.g. IntelliSense and/or completion), it doesn't make a difference whether the ABNF is violated or the path just doesn't make sense.

I think the crucial point is that the ambiguity in the ABNF means that it is not possible to automatically generate a valid/useful parser from the ABNF. We want to keep open the implementation option of automatically generating a parser from the ABNF.

gregsdennis commented 1 year ago

I don't see the need for the distinction.

danielaparker commented 1 year ago

Well, you detect that there is a function, and you obtain its name, its individual arguments, or report a syntax error. That's useful.

But how is it useful?

It produces the tokens to which semantic validation is then applied. That can be later, after an entire parse tree of tokens has been produced, or incrementally in a hand-written parser.

You saying that it is doesn't make it so

Indeed, but you can also read about it in the Dragon Book, or in Crafting Interpreters.

A spec writer has some leeway in where to put validation: in the ABNF or in the semantic phase. For example, the precedence and associativity rules for operators can be defined in the ABNF, or the spec can simply provide a mapping from operators to precedence and associativity rules to be applied in semantic validation. But generally, it's about the separation of concerns: on the one hand, scanning for tokens, and on the other, applying meaning to them.
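To illustrate the leeway Daniel mentions for operators: a hand-written parser can keep precedence in a table (precedence climbing) instead of encoding it in the grammar rules. A generic sketch, not tied to the draft's ABNF:

```python
# Precedence table for the filter-expression operators; by the usual
# convention && binds tighter than ||.
PRECEDENCE = {"||": 1, "&&": 2}

def parse_expr(tokens, min_prec=1):
    """Precedence climbing over a token list like ["a", "&&", "b", "||", "c"]."""
    left = tokens.pop(0)  # an operand
    while tokens and PRECEDENCE.get(tokens[0], 0) >= min_prec:
        op = tokens.pop(0)
        right = parse_expr(tokens, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left
```

Either placement yields the same trees; the table version just moves the rules out of the ABNF and into the semantic machinery.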

Daniel

glyn commented 1 year ago

Thanks for closing this PR @gregsdennis.

I don't see the need for the distinction.

For the record, which distinction are you referring to?