w3c / shacl

SHACL Community Group (Post-REC activitities)
31 stars 5 forks source link

SHACL-C: analyze EBNF grammar complexity #67

Open VladimirAlexiev opened 1 month ago

VladimirAlexiev commented 1 month ago

Followup to #52:

SPARQL is LL(1), which is simple in tech terms. It is compatible with LALR(1) there are parser generators for every programming ecosystem freely available.

@parrt @svenefftinge @zaach @lolo101 @tunnelvisionlabs Can you help us here? Your names and github nicks were recommended by chatGPT.

afs commented 1 month ago

Apache Jena's SHACL-C grammar is LL(1) using JavaCC. There is a tokenizer (longest match wins) and an LL(1) grammar parser.SPARQL, No special features of JavaCC are used (e.g. no local lookahead of 2 for example).

SHACL-C, SPARQL and Turtle call out the terminals separately from the grammar rules.

LL(1) has one big advantage - better error messages by default!

The SPARQL grammar in the SPARQL spec is LL(1) because it is produced from JavaCC. The SPARQL is large enough that a tool to manage and check it is essential. It is too easy to include things that weren't intended. In practice, there are points where strict LL(1) is potentially inefficient. Jena has alternative definitions that are rewritten (these are not in the input to the spec HTML production).

So I suggest for a spec, write for LL(1) to get widest coverage and identify any points where it deviates from that and justify them.

The RDF 1.2 Turtle recently had a proposal that requires a lookahead of 2 and left factoring the graph to remove the requirement made the grammar ugly. in teh end it wasn't necessary.

BNF in specs has to communicate as well as being a formal definition. Requiting implementers to modify the spec structure for their local tools risks weaker compliance.