protegeproject / protege

Protege Desktop
http://protege.stanford.edu

Some ontologies cannot be opened with Protege 5.6 #1096

Closed: matentzn closed this issue 1 year ago

matentzn commented 1 year ago

Opening https://github.com/EnvironmentOntology/envo/blob/master/envo.owl does not work for me:

  1. Start protege
  2. File>Open, select file, ok

@gouttegd has confirmed that this does not work.

My best guess: while cycling through the parsers, the OWL API believes the RDF/XML file is actually in some other format (because it no longer recognises it?)

@gouttegd believes it should be Protege related, because such issues should have been caught by the protege test suite.

Note: ENVO had malformed RDFXML in the past.

gouttegd commented 1 year ago

Well, it might be an OWL API issue in the end…

Rebuilding Protégé-5.6.0-beta-2 with OWL API 4.5.9 (the same version as the one used in Protégé 5.5.0) does allow envo.owl to be loaded.

Further observations:

ERROR Input ontology contains 1 triple(s) that could not be parsed:
 - <https://www.wikidata.org/wiki/Q2306597> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> _:genid-nodeid-node1gn2dm30qx15895.

This seems to be the offending bit in envo.owl:

<owl:NamedIndividual rdf:about="https://www.wikidata.org/wiki/Q2306597">
        <rdf:type>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000051"/>
                <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/ENVO_00000248"/>
            </owl:Restriction>
        </rdf:type>
</owl:NamedIndividual>

Removing that rdf:type statement does not allow Protégé 5.6.0 to load the file, but it does allow ROBOT to successfully convert it to Functional Syntax.

yields an empty ontology without throwing any exception:

Ontology(OntologyID(Anonymous-2)) [Axioms: 0 Logical Axioms: 0] First 20 axioms: {}
Prefix(owl:=<http://www.w3.org/2002/07/owl#>)
Prefix(rdf:=<http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
Prefix(xml:=<http://www.w3.org/XML/1998/namespace>)
Prefix(xsd:=<http://www.w3.org/2001/XMLSchema#>)
Prefix(rdfs:=<http://www.w3.org/2000/01/rdf-schema#>)

Ontology(
)

gouttegd commented 1 year ago

OK, the real offending bit in envo.owl is here (line 109546):

<oboInOwl:hasDbXref rdf:resource="pharmacy. (n.d.) American Heritage® Dictionary of the English Language, Fifth Edition. (2011). Retrieved August 11 2020 from https://www.thefreedictionary.com/pharmacy"/>

So I guess the RDF/XML parser in the newest OWL API is stricter than in the older version and (correctly) rejects the file. But then I would expect to get an error rather than an empty ontology.
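A quick way to hunt for this kind of statement before handing a file to Protégé is to scan the rdf:resource and rdf:about attributes for values that cannot be IRIs. The following is only a rough sketch using Python's standard library, not what the OWL API does internally: the character check is a heuristic, the function name is made up for illustration, and the sample document is a shortened, hypothetical stand-in for the envo.owl statement.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Attributes whose values are expected to be IRIs in RDF/XML.
IRI_ATTRS = (f"{{{RDF_NS}}}resource", f"{{{RDF_NS}}}about")

# Heuristic only: whitespace and these characters may not appear unescaped in an IRI.
FORBIDDEN = re.compile(r'[\s<>"{}|\\^`]')


def suspicious_iri_attributes(rdfxml_text):
    """Return (tag, attribute, value) triples whose value contains a forbidden character."""
    findings = []
    root = ET.fromstring(rdfxml_text)
    for element in root.iter():
        for attr in IRI_ATTRS:
            value = element.get(attr)
            if value is not None and FORBIDDEN.search(value):
                findings.append((element.tag, attr, value))
    return findings


# Shortened, hypothetical sample modelled on the statement quoted above.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
  <owl:Class rdf:about="http://example.org/SomeClass">
    <oboInOwl:hasDbXref rdf:resource="pharmacy. (n.d.) Retrieved August 11 2020"/>
  </owl:Class>
</rdf:RDF>"""

for tag, attr, value in suspicious_iri_attributes(SAMPLE):
    print(f"{tag}: {value!r}")
```

For a file the size of envo.owl, `ET.iterparse` would probably be a better fit than parsing the whole text at once.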

gouttegd commented 1 year ago

So here’s my understanding of the problem:

With OWL API 4.5.9 (Protégé 5.5.0), the RDF/XML parser does not barf on the incorrect rdf:resource statement above, so the ontology is successfully loaded.

With OWL API 4.5.24 (Protégé 5.6.0-beta-2), the RDF/XML parser is stricter (which is ultimately a good thing!): it does barf when it encounters the incorrect statement, and it reports that it failed to load the ontology. The OWL API then moves on to the other available parsers (standard behaviour of the OWL API: trying all parsers until one reports that it successfully loaded something). At some point it tries the TriX parser, which incorrectly reports that it loaded the ontology without any error, even though it yields an empty ontology.

I don’t think there is much that Protégé can do here, apart maybe from banning the TriX parser (I’d argue that we would be somewhat justified in doing so). Ultimately that parser needs fixing in OWL API – it should detect that it has not been able to parse anything rather than returning an empty ontology while reporting success.
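The fallback behaviour described above can be illustrated with a toy model. This is a sketch in Python with made-up parser functions and a made-up `load` helper, not OWL API code; it only shows why a parser that never fails hides the real error, and why banning it lets the error surface.

```python
class ParseError(Exception):
    """Raised when a toy parser rejects its input."""


def strict_rdfxml(text):
    # Toy stand-in for a strict RDF/XML parser; the "BAD-IRI" marker is hypothetical.
    if "BAD-IRI" in text:
        raise ParseError("invalid rdf:resource value")
    return ["axiom-1", "axiom-2"]


def lenient_trix(text):
    # Toy stand-in for the TriX behaviour described in this thread:
    # input that is not TriX yields zero axioms instead of an error.
    return ["axiom-1"] if text.lstrip().startswith("<TriX") else []


PARSERS = [("rdfxml", strict_rdfxml), ("trix", lenient_trix)]


def load(text, parsers=PARSERS, banned=()):
    """Try each non-banned parser in order; the first one that does not raise wins."""
    errors = {}
    for name, parser in parsers:
        if name in banned:
            continue
        try:
            return name, parser(text)
        except ParseError as exc:
            errors[name] = str(exc)
    raise ParseError(f"no parser could load the input: {errors}")


BROKEN = '<rdf:RDF><x rdf:resource="BAD-IRI with spaces"/></rdf:RDF>'

print(load(BROKEN))  # ('trix', [])

try:
    load(BROKEN, banned={"trix"})
except ParseError as exc:
    print(exc)  # the RDF/XML error is now visible to the caller
```

With the lenient parser in the list, the caller gets an empty result and no error; with it banned, the RDF/XML failure is reported.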

gouttegd commented 1 year ago

Of note, banning the TriX parser would not allow Protégé to load the bogus envo.owl file. But it would allow it to properly report the parsing error with the RDF/XML parser, which I believe is the right thing to do.

matentzn commented 1 year ago

@ignazio1977 what do you think?

matentzn commented 1 year ago

Excellent analysis @gouttegd thank you!

ignazio1977 commented 1 year ago

@matentzn TriX again :-( I think the parser is banned by default in owlapi 5. You can ban it manually in version 4 by setting an environment variable or a preference file. It should be explained in the ConfigurationOptions javadoc.

matentzn commented 1 year ago

OK, I assume you refer to http://owlcs.github.io/owlapi/apidocs_5/org/semanticweb/owlapi/model/parameters/ConfigurationOptions.html#BANNED_PARSERS; we are not too worried then about the slightly more draconian parsing, yeah? The fact that ontologies that previously opened due to permissive parsing are not opening any more?

@gouttegd I assume removing <oboInOwl:hasDbXref rdf:resource="pharmacy. (n.d.) American Heritage® Dictionary of the English Language, Fifth Edition. (2011). Retrieved August 11 2020 from https://www.thefreedictionary.com/pharmacy"/> fixes the ENVO issue?

ignazio1977 commented 1 year ago

Yes I'm referring to the BANNED_PARSERS property.

The RDF/XML parser is working as expected, I believe - the change is due to bug fixes.

The empty ontology is a side effect - when all else fails, TriX rarely refuses to parse anything; but, when the input is not in TriX format, it returns an empty ontology, giving the caller the impression the OWLAPI thought the ontology is correct, and empty.

I keep meaning to ban TriX from the list of parsers in version 4 as well but I keep forgetting.

matentzn commented 1 year ago

OK, thank you!

gouttegd commented 1 year ago

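I assume removing <oboInOwl:hasDbXref rdf:resource="pharmacy. (n.d.) American Heritage® Dictionary of the English Language, Fifth Edition. (2011). Retrieved August 11 2020 from https://www.thefreedictionary.com/pharmacy"/> fixes the ENVO issue?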

Yes.

we are not too worried then about the slightly more draconian parsing, yeah? The fact that ontologies that previously opened due to permissive parsing are not opening any more?

Stricter parsers are a good thing, I believe – as long as Protégé is able to report the error, which will be the case once the TriX parser is out of the way.

For Protégé 5.6.0, I will forcefully ban the TriX parser directly in the code. Shouldn’t be a great loss since it’s already banned in higher OWL API branches.

@matentzn In the meantime, you can add the following to the Info.plist file:

<key>JVMOptions</key>
<array>
     <string>-Dorg.semanticweb.owlapi.model.parameters.ConfigurationOptions.BANNED_PARSERS=org.semanticweb.owlapi.rio.RioTrixParserFactory</string>
</array>
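If editing the file by hand is inconvenient, the same change could be scripted. The sketch below uses Python's standard plistlib and assumes, without having checked Protégé's actual Info.plist, that JVMOptions is a top-level array in that file; the helper name `add_jvm_option` is made up for illustration.

```python
import plistlib

# The option string from the snippet above, split only for line length.
BAN_TRIX = (
    "-Dorg.semanticweb.owlapi.model.parameters.ConfigurationOptions."
    "BANNED_PARSERS=org.semanticweb.owlapi.rio.RioTrixParserFactory"
)


def add_jvm_option(plist_bytes, option=BAN_TRIX):
    """Return plist bytes with `option` present in the top-level JVMOptions array.

    Assumption: JVMOptions lives at the top level of the plist dictionary.
    """
    data = plistlib.loads(plist_bytes)
    options = data.setdefault("JVMOptions", [])
    if option not in options:  # avoid adding the option twice
        options.append(option)
    return plistlib.dumps(data)


# Hypothetical minimal plist, for demonstration only.
example = plistlib.dumps({"CFBundleName": "Example", "JVMOptions": ["-Xmx500M"]})
print(add_jvm_option(example).decode("utf-8"))
```

To apply it to a real file, read the Info.plist as bytes, pass them through `add_jvm_option`, and write the result back, keeping a backup copy of the original.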

gouttegd commented 1 year ago

Should be fixed now in the 5.6.0 branch. I’ll rebuild the beta after the ontology summit session.