owlcs / owlapi

OWL API main repository
821 stars 315 forks source link

Extension of Manchester syntax to cover the whole of OWL2 DL #1050

Closed b-gehrke closed 2 years ago

b-gehrke commented 2 years ago

Hello, My name is Björn, and I'm a B.Sc. student of @tillmo. Currently, there are some constructs like general concept inclusions that aren't expressible in Manchester Syntax. As part of my bachelor thesis, I worked out a proposal to extend Manchester Syntax so that all OWL2 DL constructs can be expressed. You can find the excerpt containing the proposed changes here. This pull request contains the proposed changes. What do you think of this proposal? Do you think this syntax extension has a chance to be appreciated by the OWL community?

Currently there are 2 unit tests failing (RelativeIRIsRountTripTestCase.testFormat, AnnotatedPunningTestCase.runTestForAnnotationsOnPunnedEntitiesForFormat). Both use the same IRI for a data property and object property. In my understanding, it is allowed for different entities to have the same IRI, but it is not allowed that two properties of a different kind have the same IRI (mentioned in the list of new features for OWL2). The latter is the case in both failing cases.

ignazio1977 commented 2 years ago

The code looks ok at first look.

The test failures seem down to the roundtrip picking the wrong format, there might be an issue in parsing that doesn't get picked up.

A separate issue was the test framework not expecting Manchester syntax to allow GCI, which caused it to ignore GCI axioms in input when comparing expected and actual axioms - which then caused it to complain that GCIs were showing up unexpectedly.

I'm wondering if the whole of OWL 2 DL is already included in this patch set - when I remove the restrictions in the tests that exclude Manchester syntax from some tests because it is known not to support those test cases, I get some failures, meaning that there are features supported in all other tested languages but not in this one. Haven't checked yet to determine if those features should be supported or are correctly excluded.

Re acceptance in the community, I don't see why not - I've met with a few bug reports raised by developers expecting full support in all syntaxes, so at least in some cases this patch set fills a gap that has tripped up a few developers.

ignazio1977 commented 2 years ago

This input fails:

Prefix: : <urn:test:>
Ontology: 
AnnotationProperty:      :propA
AnnotationProperty:      :A          Annotations: :propA _:genid2147483648
Datatype:                :A          Annotations: :propA _:genid2147483648
ObjectProperty:          :A          Annotations: :propA _:genid2147483648
DataProperty:            :A          Annotations: :propA _:genid2147483648
Class:                   :A          Annotations: :propA _:genid2147483648
Individual:              :A          Annotations: :propA _:genid2147483648
Individual:              _:genid2147483648

Parsing errors are thrown only if the object and data property pun is present and the class also shares the same IRI. Pun between object and class, object and data property, and data property and class work fine. All three puns are not allowed, especially between property types, however parsing should still be allowed (this is what the other parsers and the previous version of the manchester parser did). I'll debug and amend.

b-gehrke commented 2 years ago

Thank you for your response! Great to hear the changes appear acceptable to you.

Parsing errors are thrown only if the object and data property pun is present and the class also shares the same IRI.

This is explicitly checked in the parser and an exception is thrown if the same IRI is used for all three. ManchesterOWLSyntaxParserImpl.java:700. Removing this check lets the first test pass.

Still, the following fails to parse:

Prefix: : <http://www.semanticweb.org/owlapi/test/>

Ontology: <http://www.semanticweb.org/owlapi/test>
AnnotationProperty: :propA
AnnotationProperty: :propP
ObjectProperty: :p
DataProperty: :p
Class: :p only xsd:boolean
    SubClassOf: 
            Annotations: :propA "x", 
                         :propP: "y"
        :p some owl:Thing

I suspect this is because the parser first checks if :p is an object property (ManchesterOWLSyntaxParserImpl.java:723) and therefore proceeds to parse an object restriction where in this case it should parse a data restriction.

The trivial solution I see, is checking if the token is an object property and a data property at the same time. If so, try parsing an object restriction and catch a potential error. In the catch clause, parse a data restriction.

Another solution might be to lookahead tokens until the ambiguity can be resolved.

ignazio1977 commented 2 years ago

This is explicitly checked in the parser and an exception is thrown if the same IRI is used for all three.

Yes, it looks like the check is not enforced on all routes from parsing. If entities are not used inside a class expression, the check is lost - which is why the test worked beforehand. I'm wondering which of the two is the bug.

The trouble with adding the check everywhere is what impact this will have on existing ontologies - they violated language restrictions but were allowed to be parsed, something that a proper check would stop. Perhaps it will be necessary to add a parse option, e.g., tie this to strict/lax parsing.

b-gehrke commented 2 years ago

I think parse options for strict/lax parsing are a good idea. For now I would suggest we stick with lax parsing and allow punned entities and create an issue to introduce the option and strict parsing. This might allow ontologies, which currently cannot be parsed because of this check, to be parsed. But I think this is more acceptable than prohibit currently parsable ontologies.

ignazio1977 commented 2 years ago

Yeah I just tried this yesterday evening and the tests are successful. I'll need to give it a quick extra check then merge it.

ignazio1977 commented 2 years ago

Merged