opengeospatial / ogcapi-features

An open standard for querying geospatial information on the web.
https://ogcapi.ogc.org/features

Simplify the cql2-text grammar (future version improvements?) #705

Open jerstlouis opened 2 years ago

jerstlouis commented 2 years ago

This is feedback from trying to implement cql2-text. Implementers (or at least we) struggle with the current grammar.

I think it comes down mainly to these two things:

  1. Some of the capabilities from extension conformance classes are defined as separate rules. I think it would be much easier to simply define new allowed values for operators or pre-defined function identifiers (using the same grammar rule as function calls) for operators that use a function call syntax (i.e., array/spatial/temporal operators and predicates). This would cut down the number of rules dramatically, and I think it would also make the requirements in each conformance class clearer.
  2. Some rules seem to exist only to restrict the data types (e.g., numericExpression, characterExpression, temporalExpression...). However, this is purely a runtime concept, since the data type that a given expression (e.g., a property) evaluates to depends on the queryables. Therefore I would not have used grammar rules (which are about the syntax) to make this distinction. Instead, I think what is needed is requirements and/or permissions that specify the interpretation when an unexpected data type is used in such a context.

I think simplifying these two aspects of the grammar would directly result in simpler parser implementations, greater ease of implementation and greater interoperability.
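A minimal Python sketch of the two points above, under stated assumptions: the operator names are a subset of the CQL2 function-style operators, but the `CALLABLE_NAMES` table, the `QUERYABLES` dictionary, both helper functions, and the case-insensitive name matching are hypothetical and only illustrate moving name and data type checks out of the grammar.

```python
# Hypothetical lookup table: the operator names are from CQL2, but this
# layout (and listing only a subset of the names) is an assumption.
CALLABLE_NAMES = {
    "Basic Spatial Operators": {"S_INTERSECTS"},
    "Spatial Operators": {"S_CONTAINS", "S_WITHIN", "S_TOUCHES"},
    "Temporal Operators": {"T_BEFORE", "T_AFTER", "T_DURING"},
    "Array Operators": {"A_CONTAINS", "A_OVERLAPS"},
}

# Hypothetical queryables of one collection, with their data types.
QUERYABLES = {"height": "number", "name": "string", "updated": "timestamp"}


def is_supported_call(name, enabled_classes):
    """Check a function-style operator name against the enabled classes.

    Names are compared case-insensitively here (an assumption of this sketch).
    """
    return any(name.upper() in CALLABLE_NAMES.get(cls, set())
               for cls in enabled_classes)


def check_numeric_operand(prop, queryables):
    """Runtime data type check that a grammar rule cannot perform by itself.

    Returns None when the queryable is numeric, otherwise an error message.
    """
    data_type = queryables.get(prop)
    if data_type is None:
        return f"unknown queryable '{prop}'"
    if data_type != "number":
        return f"'{prop}' is {data_type}, expected number"
    return None
```

With this split, supporting another conformance class means adding names to a table rather than adding grammar rules, and what happens on a data type mismatch is decided in one place after parsing.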

cportele commented 2 years ago

Meeting 2022-06-20: It would be good to understand why this would result in an easier implementation. We need to discuss this in a meeting when @jerstlouis is present.

jerstlouis commented 2 years ago

Thanks @cportele. I should be attending the next meeting in a couple of weeks.

As a summary, from a syntactic point of view, I think the two things I suggested above would result in fewer grammar rules (a simpler grammar), and parser node classes would be a more direct / natural match to the rules. We would implement the function/operator name validation and data type checking separately from the parsing, since some of it is only known at runtime (e.g., available functions, queryable data types). For example, in our implementation we have a CQL2CallExp node class which we plan to use to handle the array / spatial / temporal operators which syntactically look like function calls. We are hand-writing a recursive descent parser, borrowing heavily from our ECCSS/CMSS parser.
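As an illustration only, here is a small recursive descent parser in Python with a single call rule and a separate validation pass. It is not the implementation described above: the `CallExp`, `Identifier` and `Literal` classes, the token set, and `unknown_calls` are hypothetical, and the grammar covers only identifiers, literals and nested calls.

```python
import re
from dataclasses import dataclass

@dataclass
class Literal:
    value: object

@dataclass
class Identifier:
    name: str

@dataclass
class CallExp:  # hypothetical stand-in for a call expression node class
    name: str
    args: list

TOKEN_RE = re.compile(r"""
    \s*(?:
        (?P<number>\d+(?:\.\d+)?)
      | (?P<string>'(?:[^']|'')*')
      | (?P<ident>[A-Za-z_][A-Za-z0-9_]*)
      | (?P<punct>[(),])
    )""", re.VERBOSE)

def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_primary(self):
        kind, text = self.next()
        if kind == "number":
            return Literal(float(text))
        if kind == "string":
            return Literal(text[1:-1].replace("''", "'"))
        if kind == "ident":
            # One rule covers every function-style operator: IDENT '(' args ')'
            if self.peek() == ("punct", "("):
                self.next()
                args = []
                if self.peek() != ("punct", ")"):
                    args.append(self.parse_primary())
                    while self.peek() == ("punct", ","):
                        self.next()
                        args.append(self.parse_primary())
                if self.next() != ("punct", ")"):
                    raise SyntaxError("expected ')'")
                return CallExp(text, args)
            return Identifier(text)
        raise SyntaxError(f"unexpected token {text!r}")

def parse(text):
    parser = Parser(tokenize(text))
    node = parser.parse_primary()
    if parser.peek() != (None, None):
        raise SyntaxError("trailing input")
    return node

def unknown_calls(node, known_names):
    """Validation pass, run after parsing, against names known at runtime."""
    if not isinstance(node, CallExp):
        return []
    found = [] if node.name.upper() in known_names else [node.name]
    for arg in node.args:
        found += unknown_calls(arg, known_names)
    return found
```

For instance, `parse("T_BEFORE(updated, DATE('2022-04-16'))")` yields a `CallExp` whose second argument is itself a `CallExp` named `DATE`. Whether those names are acceptable is decided afterwards by `unknown_calls`, using whatever set of names the server supports.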

jerstlouis commented 2 years ago

The following excerpt from our internal CQL2 design document mapping CQL2 conformance classes and providing a concise summary of the CQL2 syntax might be insightful. A simpler grammar could potentially closely match those CQL2* AST node classes to rules. We could eventually prototype such a simpler grammar together with railroad diagrams demonstrating the idea.

Basic CQL2

Property-Property

Arithmetic Expressions

Advanced Comparison Operators

Functions

Case-insensitive Comparison

Accent-insensitive Comparison

Basic Spatial Operators

Spatial Operators

Temporal Operators

Array Operators

@pvretano

jerstlouis commented 2 years ago

See first draft of proposed simpler grammar rules in https://github.com/opengeospatial/ogcapi-features/issues/723#issuecomment-1172603159.

jerstlouis commented 2 years ago

Note that in the approach I suggest for defining the grammar production rules, operators / functions are not really keywords, but regular identifiers used in function call expressions (or in spatial/temporal/array literal definitions using the same syntax as function calls). For example, this means that a date or s_intersects queryable would not need to be double-quoted (as in the current abstract tests), since date would only take on its meaning of a temporal literal when it is followed by an opening parenthesis (, and therefore there really is no ambiguity in date<>DATE('2022-04-16').
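The disambiguation described here can be sketched in a few lines of Python. This is an assumption-laden illustration, not part of any proposed grammar: the token pattern, `tokenize` and `classify` are hypothetical, and only strings, identifiers and a handful of operators are recognized.

```python
import re

# Every name is lexed as a plain identifier; there is no keyword table here.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<string>'(?:[^']|'')*')"
    r"|(?P<ident>[A-Za-z_]\w*)"
    r"|(?P<op><>|<=|>=|[=<>(),]))"
)

def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

def classify(tokens):
    """Give each identifier a role based on one token of lookahead."""
    roles = []
    for i, (kind, text) in enumerate(tokens):
        if kind != "ident":
            continue
        following = tokens[i + 1] if i + 1 < len(tokens) else None
        # An identifier is a call name only when '(' comes right after it.
        role = "call" if following == ("op", "(") else "property"
        roles.append((text, role))
    return roles
```

For the expression `date<>DATE('2022-04-16')`, `classify(tokenize(...))` reports `date` as a property and `DATE` as a call, with no double-quoting needed for the queryable.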

In my opinion this makes it much easier to extend the language with additional functions / operators, since those additions would not introduce additional keywords that break implementations not previously requiring queryables with the same name to be double-quoted. The list of keywords in 8.2 (which would need to be double-quoted, if allowed at all) would be reduced to:

All of the other ones would get tokenized by the lexer as identifiers, which can be used as operators/function calls or to define literals, and would only get resolved in the contexts where they apply. This is the approach taken in C-like languages, where standard functions and data types/structs (or classes in C++) are not classified as keywords.

Also note that SQL keywords (or "reserved" words) do not seem to include any function-like keywords either. Things like UPPER() changing case are described as functions instead.

jerstlouis commented 5 months ago

See the CartoSym-CSS BNF lexer / grammar for ANTLR4 which should (in theory) be a true superset of CQL2:

https://github.com/opengeospatial/styles-and-symbology/blob/main/core/schemas/CartoSym-CSS-Lexer.g4

https://github.com/opengeospatial/styles-and-symbology/blob/main/core/schemas/CartoSym-CSS-Grammar.g4

The starting rule for CQL2 is expression (e.g., you can paste the Lexer and Grammar at http://lab.antlr.org/ and test any CQL2 expression with expression as the start rule).

When I have a chance I will extract only the CQL2 relevant part.