qt4cg / qtspecs

QT4 specifications
https://qt4cg.org/

Potential (low-risk) Ambiguities in XPath EBNF #1050

Closed johnlumley closed 7 months ago

johnlumley commented 8 months ago

After demonstrating iXML XPath grammar production at the meeting of 27th February, it seemed worth recording some of the ambiguity issues encountered, if only so others might be aware of possible pitfalls.

Please note that the Lexical Structure notes in the spec do resolve these ambiguities through extra-grammatical interpretation, most notably the choice of the longest conforming match. For grammars and parsers that don't specify or support this, such as InvisibleXML, ambiguities might arise, though changes to the resulting grammar may resolve them. I am not advocating changes to the specification EBNF, merely noting from my implementation experience where such problems might occur and suggesting some possible workarounds.

Here are a couple of cases:

TypeName / AtomicOrUnionType

The rule for ItemType is approximately:

ItemType ::= ... | TypeName | ... | AtomicOrUnionType | ...

where both TypeName and AtomicOrUnionType resolve solely to the EQName production. The grammar interpretation notes suggest (I think) that it binds to TypeName if such a name exists in the current static context, which is an extra-grammatical concept, but I may be mistaken.

StringTemplate

The productions for StringTemplate are:

[106]       StringTemplate               ::=    "`" (StringTemplateFixedPart | StringTemplateVariablePart)* "`" 
[107]       StringTemplateFixedPart      ::=    ((Char - ('{' | '}' | '`')) | "{{" | "}}" | "``")*
[108]       StringTemplateVariablePart   ::=    EnclosedExpr 

which rely on longest-match semantics to avoid ambiguity. (If this were not the case, a potentially infinite number of empty StringTemplateFixedPart productions could be matched, as could any sequential partition of a run of characters.)
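To give a feel for the size of the problem, here is a minimal Python sketch (not part of the spec or of my iXML grammar) that counts how many ways a run of ordinary characters can be split into consecutive non-empty fixed parts when longest match is not applied. The function name `count_fixed_part_splits` is hypothetical, and the sketch assumes the text contains no `{`, `}` or backtick characters. Empty fixed parts are deliberately excluded, since allowing them makes the count unbounded.

```python
from functools import lru_cache


def count_fixed_part_splits(text: str) -> int:
    """Count the ways to split `text` into consecutive non-empty chunks.

    Each chunk stands for one StringTemplateFixedPart match. Assumption:
    `text` holds only ordinary characters (no '{', '}' or backtick).
    """

    @lru_cache(maxsize=None)
    def ways(start: int) -> int:
        # Nothing left to split: exactly one way (stop here).
        if start == len(text):
            return 1
        # Choose where the next chunk ends, then split the remainder.
        return sum(ways(end) for end in range(start + 1, len(text) + 1))

    return ways(0)


if __name__ == "__main__":
    for sample in ["a", "abc", "abcdefgh"]:
        print(repr(sample), count_fixed_part_splits(sample))
```

For a run of n characters this gives 2^(n-1) readings (4 for `abc`), so without longest match an ambiguity-reporting parser sees the number of parses grow exponentially with the length of the literal text.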

An alternative (recursive and more cumbersome) formulation that avoids the ambiguity is (in an iXML grammar for compactness):

               StringTemplate: -"`", StringTemplateContent?, -"`".
       -StringTemplateContent: StringTemplateFixedPart |
                               StringTemplateVariablePart |
                               StringTemplateVariablePart, StringTemplateContent |
                               StringTemplateFixedPart, StringTemplateVariablePart, StringTemplateContent?.
      StringTemplateFixedPart: ("{{"; "}}"; "``"; ~["`{}"])+.

StringTemplateVariablePart remains unchanged. (iXML doesn't support character set subtraction, so ~["`{}"] (any character except...) is used for the Char - ... term.) Because a fixed part may only be followed by a variable part, the content is either empty or a sequence of parts in which StringTemplateVariablePart terms can be consecutive but StringTemplateFixedPart terms cannot. It seems to work effectively, at least in my iXML parser.
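As a rough check of that claim, the following Python sketch counts derivations of the recursive StringTemplateContent rule over an abstract token string. It is an illustration under stated assumptions, not my iXML implementation: each `F` stands for one fixed-part item (an ordinary character or one of the escaped pairs), each `V` stands for a whole StringTemplateVariablePart treated as opaque, and the function name `count_content_parses` is hypothetical.

```python
from functools import lru_cache


def count_content_parses(tokens: str) -> int:
    """Count derivations of StringTemplateContent for `tokens`.

    Assumption: `tokens` is an abstraction of the template body, with
    'F' for one fixed-part item and 'V' for one opaque variable part.
    """
    n = len(tokens)

    @lru_cache(maxsize=None)
    def content(i: int) -> int:
        # StringTemplateContent must match something; emptiness is
        # handled by the '?' in the StringTemplate rule itself.
        if i >= n:
            return 0
        total = 0
        if tokens[i] == "V":
            if i + 1 == n:
                total += 1              # VariablePart
            total += content(i + 1)     # VariablePart, Content
        else:
            # Try every possible length for the fixed part (the '+').
            j = i
            while j < n and tokens[j] == "F":
                j += 1
                if j == n:
                    total += 1          # FixedPart
                elif tokens[j] == "V":
                    if j + 1 == n:
                        total += 1      # FixedPart, VariablePart
                    total += content(j + 1)  # ..., Content
        return total

    return content(0)


if __name__ == "__main__":
    for sample in ["FFF", "VV", "FV", "FFVF", "FVVFF"]:
        print(sample, count_content_parses(sample))
```

Every non-empty sample yields exactly one derivation, because a fixed part can only end at a variable part or at the end of the content, which forces it to cover the whole run of fixed items.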

Reactions, corrections, remarks, praise and brickbats welcome. I'll document any more as I find them. John

johnlumley commented 4 months ago

2024jul01 - An additional ambiguity occurs in one of the deep lookup examples:

$tree ??$from ??type(record(to, distance))[?to=$to] ?distance

which can be simplified to

$tree ??type(foo)

where there is ambiguity between a LookupExpr with a TypeQualifier and a DynamicFunctionCall on a function named type. That is, to avoid this ambiguity, type should be included among the names that are restricted from use as function names.