jmespath-community / jmespath.spec

JMESPath Specification
6 stars 3 forks source link

Reconcile JEP-12 and raw-string grammar rule #133

Closed springcomp closed 1 year ago

springcomp commented 1 year ago

While JEP-12 Raw String Literals has been accepted, it has subsequently fallen behind and no longer matches the official grammar.

For instance, this test/literal.json test is not compliant with JEP-12 that does not allow backslash characters explicitely. On the other hand, it is not compliant with the official grammar either because \\ is expected to be a valid escape sequence for a single \ character.

https://github.com/jmespath-community/jmespath.test/blob/aa6fb5f942e852af1781d1f208c4069a15794a80/tests/literal.json#L194-L196

Please, consider updating JEP-12 to align with the grammar and the updated tests.

gibson042 commented 1 year ago

https://github.com/jmespath-community/jmespath.test/pull/11#discussion_r1018184034

jmespath.spec should be updated to document the mode switch with an optional (or required but associated with runtime syntax errors) json-value =/ *( json-unescaped / json-escaped ) grammar production [possibly abstracting *( json-unescaped / json-escaped ) as something like json-string-contents] and a section explaining the consequences of an implementation decision regarding if or when to support it. But note also that implementation support for the legacy behavior isn't even internally consistent (e.g., jmespath/jmespath.js#90 ).

springcomp commented 1 year ago

I do not understand you point regarding a mode-switch.

In my mind, the current JMESPath Community specification aligns with the current official specification. None of them consider legacy string literals expressed without double quotes as standards-compliant.,

Please, can you elaborate?

gibson042 commented 1 year ago

None of them consider legacy string literals expressed without double quotes as standards-compliant

I'm not sure what it means to be considered "standards-compliant", but at least jmespath.py and jmespath.js and jmespath.rb parse `foo` into the string "foo", which is not allowed by the current contents of jmespath.spec (the very problem described by this issue). Meanwhile, other implementations such as go-jmespath and jmespath.rs appear to reject such syntax as invalid. And JEP-12 Impact supports both behaviors ("Implementations MAY choose to support the old syntax of allowing elided quotes in JSON literal expressions... In order to support this type of variance in JMESPath implementations...").

So resolution of this issue should not prohibit support for the fallback behavior, nor should it require support. And I see two reasonable options for how to specify that variability:

  1. Add a grammar rule that is itself optional, along with associated behavior requirements such as whitespace trimming.
  2. Add a required grammar rule, along with associated behavior requirements that are theirselves implementation-dependent.
springcomp commented 1 year ago

I agree that:

(...) resolution of this issue should not prohibit support for the fallback behavior, nor should it require support [for the fallback behavior].

But I fail to see why there should be any change to the spec at all.

With regards to literal expressions, both the official and community specifications are virtually identical. They both require a string to be expressed using double-quotes and never mention the syntax with elided quotes.

I refer to syntax that follow those specifications as standards compliant.

I think we can both agree that JEP-12 is the proposal that ultimately changed the specification from whatever it was before to what it now states.

Of course, my view is that the implementations you refer to predate JEP-12 and therefore already supported legacy literal strings. I think it was reasonable for James to design a migration path.

The JMESPath.Net implementation, on the other hand, never supported the old syntax as I was not even aware of this old syntax from reading the spec when I started implementing it.

In my view, JEP-12 supports both syntaxes in the sense that it allows for a migration path. However I'm absolutely convinced that, in order to stay standards compliant, an implementation absolutely has to support the new - as currently specified - syntax.

I think it's perfectly valid for an implementation to choose not to support syntax that is not specified.

Basically, I think about the deprecation process that James started. At the end of the deprecation process, my view is that:

springcomp commented 1 year ago

To be clear this issue is not about updating the spec. Instead it is about updating JEP-12 to align with the spec.

Although I'm not comfortable changing a JEP once accepted, we can both agree that the current JEP-12 does not reflect the current official spec. You already proved that and pointed to subsequent commits to the spec.

This suggests a missing JEP-12.5 or, an addendum to JEP-12.

gibson042 commented 1 year ago

I agree that:

(...) resolution of this issue should not prohibit support for the fallback behavior, nor should it require support [for the fallback behavior].

But I fail to see why there should be any change to the spec at all.

Right now, the spec does prohibit support for the fallback behavior, because e.g. `foo` is not covered by its grammar. And I think it is very unwise to have specification text that characterizes the overwhelming majority of purported implementations as noncompliant or allows them to arbitrarily accept invalid input.

springcomp commented 1 year ago

Right now, the spec does prohibit support for the fallback behavior, because e.g. `foo` is not covered by its grammar.

In a way you are right, but I’d rather think about it differently:

And I think it is very unwise to have specification text that characterizes the overwhelming majority of purported implementations as noncompliant or allows them to arbitrarily accept invalid input.

Likewise, I would rather qualify legacy syntax with elided double quotes as being undefined input rather than invalid input.

For a minute let’s forget about this community project and focus on the current situation.

The official specification at jmespath.org does not specify legacy syntax. As you’ve outlined earlier in this issuer, go-jmespath, jmespath.rs and jmespath.net are fully compliant and do not attempt to support legacy syntax.

However, jmespath.py and jmespath.js and jmespath.rb implementations do parse the legacy syntax. We now understand that this is as per JEP-12.

JEP-12 must be understood as initiating a deprecation process that may eventually lead to legacy syntax being abandoned. You can view those implementations as violated standards compliance with respect to the official specification – which I sort of agree with – but as there is not compliance tests that mandate such syntax to raise a syntax error, this is not conclusive.

Basically, the current compliance test suite does not contain support for the legacy syntax. The Python implementation does contain support for legacy syntax in a legacy folder so not officially part of the main compliance test suite. This is in line with JEP-12’s recommendations to remove legacy syntax from the test suite.

However we appreciate the situation, if we think about parsing legacy syntax as being non-compliant, then jmespath.py, jmespath.js, and jmespath.rb have been non-compliant with respect to the official specification for more than five years now.

The objective of this community effort is to draw a line and fully deprecate legacy syntax. We do not want to bring back those implementations as being standards compliant. As part of the deprecation process I’m taking the following steps:

That is why I’m rather opposed to updating the grammar specifically related to legacy syntax.

gibson042 commented 1 year ago

Right now, the spec does prohibit support for the fallback behavior, because e.g. `foo` is not covered by its grammar.

In a way you are right, but I’d rather think about it differently:

  • Unless explicitely mentioned, the specification does not prohibit anything. It merely states what MUST be supported. The specification should be studied concurrently with the compliance test suite.

Ah, I would disagree very strongly with that. If implementations are not required to reject use of syntax not covered by the formal grammar, then multiple problems emerge:

Likewise, I would rather qualify legacy syntax with elided double quotes as being undefined input rather than invalid input.

If it is undefined, then how is anyone to know whether e.g. ` foo ` == 'foo' should be true or false in implementations where it is not rejected?

JEP-12 must be understood as initiating a deprecation process that may eventually lead to legacy syntax being abandoned.

I support this, but "deprecated" does not mean "completely removed"—it is instead a transitional step towards that. The process should maintain a definition for correct behavior while making the behavior itself optional and discouraged.

The objective of this community effort is to draw a line and fully deprecate legacy syntax. We do not want to bring back those implementations as being standards compliant.

Doesn't this contradict your claim above? If you want those implementations to be considered noncompliant right now, then it seems like you're claiming the deprecation process to already be complete.

As part of the deprecation process I’m taking the following steps:

  • The compliance test suite will contain tests that mandate syntax error to be raised when using legacy syntax. As a courtesy, and following JEP-12’s recommandations, you have convinced me to isolate those tests in a JEP-12 specific folder, in hope to be a good citizen with respect to implementations that still want to support the legacy syntax going forward.
  • The Python implementation will no longer parse legacy syntax automatically. Instead, the support is there by opting-in using a flag. This is again to be a good citizen and promote use of this new implementation as much as possible.

That is why I’m rather opposed to updating the grammar specifically related to legacy syntax.

Thank you, this perspective is very helpful. I would say that documenting the discouraged syntax and its corresponding semantics in the specification would be at minimum a similar level of good citizenship, but my opinion is actually a bit stronger than that. I suspect that use of the fallback is still so prevalent that any implementation is effectively required to be aware of it, even if the choice is ultimately made to reject it, and therefore absence from the specification is detrimental. What was your experience in developing the JMESPath.Net implementation—how did you become aware of the legacy syntax, and what did you decide to do upon discovering it? If I'm reading LiteralStringToken.cs correctly, it does reject attempts to use it.

springcomp commented 1 year ago

What was your experience in developing the JMESPath.Net implementation—how did you become aware of the legacy syntax, and what did you decide to do upon discovering it?

I became aware of the legacy syntax by users trying to replicate some expressions using it with jmespath.net. When first reading about JEP-12, I understood that syntax was being deprecated and chose never to support it.

I support [that JEP-12 must be understood as initiating a deprecation process], but "deprecated" does not mean "completely removed"—it is instead a transitional step towards that. The process should maintain a definition for correct behavior while making the behavior itself optional and discouraged.

I’m completely aligned with that. In fact, the Python implementation was raising a PendingDeprecationWarning all along. Our first iteration of the community spec will be the point at which the syntax is fully deprecated, thereby completing the process.

I would say that documenting the discouraged syntax and its corresponding semantics in the specification would be at minimum a similar level of good citizenship, (…) and therefore absence from the specification is detrimental.

I would not be opposed to documenting – as in, adding some notes to the specification text – the very existence of a legacy syntax. Perhaps outlining the deprecation process and such. Basically, an explicit reference to JEP-12 were users would find the original portion of the specification grammar for the legacy syntax there.

Ah, I would disagree very strongly with [the specification not prohibiting anything unless exlicitely mentioned]. If implementations are not required to reject use of syntax not covered by the formal grammar, then multiple problems emerge.

Well, I think that you are technically right, but that is an impossible ideal to attain. Before JEP-12, the grammar was ambiguous to begin with. Even after JEP-12, implementations vary in subtle details as you have outlined. So one has to study the compliance test suite as a secondary viewpoint for interpreting the specification.

Take this PR for instance. With expressions like `null`.@ and `null` | @. The specification is crystal clear about how to evaluate a sub-expression in pseudo-code:

left-evaluation = search(left-expression, original-json-document)
result = search(right-expression, left-evaluation)

Then later in the text, pipe-expressions are specified as being similar to sub-expression. It’s only by looking at the compliance tests that one has to understand the true meaning as being:

left-evaluation = search(left-expression, original-json-document)
if left-evaluation is `null` then result = `null`
else result = search(right-expression, left-evaluation)

However, there is no compliance tests that contradicts the original description about evaluating a pipe-expression. So this is a loophole that can be exploited in a sense that, I hope, is more intuitive and helpful.

To summarize, I think we broadly agree although maybe not in details. I will open a separate issue to mention / document the existence of legacy syntax and refer to JEP-12.

Now focusing on this issue, however, what would be your position for best reconciling JEP-12 and the specification? We have a few options:

Do you see others?

gibson042 commented 1 year ago

I would prefer a new JEP that obsoletes JEP-12 and is linked to from JEP-12, which is basically how the IETF RFC process handles normative changes (the errata system is for fixing typos and other editorial issues, but this seems more substantial than that).