owlcollab / oboformat

Automatically exported from code.google.com/p/oboformat
Qualifier syntax is ambiguous for IRI identifiers and inconsistent #148

Open althonos opened 5 months ago

Ambiguous qualifiers

The current syntax is ambiguous inside qualifier lists.

Qualifiers are produced by the following rule:

Qualifier ::= Rel-ID '=' QuotedString 

but the Rel-ID may itself end with an = sign (when it is produced by the Unprefixed-ID rule), so a greedy parser fails on the following:

{minCardinality=1}

(fastobo has to special-case this).
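To make the failure concrete, here is a minimal Python sketch. The `ID_CHAR` class is a simplified, hypothetical stand-in for the identifier rule (it is not the actual OboChar definition); the only property that matters is that it accepts `=`. Both function names are made up for illustration, and the value is quoted as the Qualifier rule requires.

```python
import re

# Hypothetical stand-in for the identifier character rule: anything except
# whitespace, braces, commas and double quotes. Note that '=' is accepted.
ID_CHAR = r'[^\s{},"]'


def parse_greedy(qualifier):
    """Take the longest identifier first and never give characters back."""
    ident = re.match(ID_CHAR + "+", qualifier)
    if ident is None:
        return None
    # After the identifier, the rule expects '=' followed by a quoted string.
    value = re.fullmatch(r'="([^"]*)"', qualifier[ident.end():])
    if value is None:
        return None
    return ident.group(0), value.group(1)


def parse_backtracking(qualifier):
    """Same rule, but the regex engine may shrink the identifier to fit."""
    m = re.fullmatch("(" + ID_CHAR + '+)="([^"]*)"', qualifier)
    return m.groups() if m else None


print(parse_greedy('minCardinality="1"'))        # None: the identifier swallowed '='
print(parse_backtracking('minCardinality="1"'))  # ('minCardinality', '1')
```

The greedy version fails because the identifier consumes the trailing `=`, leaving no separator; the second version only succeeds because it backtracks by one character.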

I would suggest removing the OboChar rule and having two rules in the syntax instead: one for producing identifiers (using the syntax of the SPARQL PN_LOCAL and PNAME_LN terminals), and one for producing unquoted strings (allowing most characters except { or !, which would need to be escaped to avoid ambiguity with the EOL rule):

Abbreviated-ID := PN_LOCAL
Prefixed-ID := PNAME_LN

This makes the syntax for abbreviated and prefixed identifiers similar to that of the OWL Manchester Syntax, and is more restrictive about what an identifier can contain.

Even with that change, an IRI can still contain = as a sub-delimiter, so this does not fully solve the problem. For example, from ncit.obo:

def: "A practiced and regimented skill or series of actions." [] {http://purl.obolibrary.org/obo/NCIT_C16847="NCI"}

the URL part is still ambiguous.
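As a rough illustration of why the IRI case is harder, the sketch below lists every position at which a parser could split a qualifier on `=`. The IRI `http://example.org/rel?a=b` is a made-up example (it is not taken from ncit.obo), and `candidate_splits` is a hypothetical helper.

```python
def candidate_splits(qualifier):
    """Return every (identifier, rest) pair obtained by splitting at an '='."""
    return [
        (qualifier[:i], qualifier[i + 1:])
        for i, char in enumerate(qualifier)
        if char == "="
    ]


# Made-up qualifier whose IRI contains '=' in its query part.
splits = candidate_splits('http://example.org/rel?a=b="NCI"')
for identifier, rest in splits:
    print(repr(identifier), repr(rest))

# Only lookahead at what follows the '=' tells the candidates apart.
valid = [s for s in splits if s[1].startswith('"')]
```

A parser reading left to right cannot tell at the first `=` whether it has reached the separator; it has to look ahead for the opening quote.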

Inconsistency

Qualifier lists are the only place where the equal sign = is used as a separator; in xref lists and property values, the value is separated from the annotation property by whitespace only. Using whitespace in qualifiers as well would make the syntax consistent, and since whitespace characters are not IRI characters, it would also fix the problem above. This would change the syntax from:

Qualifier ::= Rel-ID '=' QuotedString 

to

Qualifier ::= Rel-ID {WhiteSpaceChar} QuotedString 

which gives, for instance:

{http://purl.obolibrary.org/obo/NCIT_C16847 "NCI"}

in the example above, and can be parsed without ambiguity.
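A minimal sketch of a parser for the proposed form, under two assumptions that are mine rather than the spec's: at least one whitespace character is required between the identifier and the value, and the quoted string contains no escape sequences. `parse_qualifier` is a hypothetical name.

```python
import re

# Proposed rule: Rel-ID {WhiteSpaceChar} QuotedString
# Assumptions: one or more whitespace characters as separator, and a quoted
# string without escapes. The identifier is any run of non-whitespace.
QUALIFIER = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_qualifier(text):
    """Split a single qualifier into (identifier, value), or return None."""
    m = QUALIFIER.fullmatch(text)
    return (m.group(1), m.group(2)) if m else None


print(parse_qualifier('http://purl.obolibrary.org/obo/NCIT_C16847 "NCI"'))
# An '=' inside the IRI no longer matters (made-up IRI):
print(parse_qualifier('http://example.org/rel?a=b "NCI"'))
```

Because the identifier can never contain whitespace, it always ends at the first whitespace character, and the value is read from there.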