ontodev / robot

ROBOT is an OBO Tool
http://robot.obolibrary.org
BSD 3-Clause "New" or "Revised" License
259 stars 73 forks source link

OWLAPI 4.5.24 adds a weird axiom annotation to OBO serialisation #1089

Closed matentzn closed 1 year ago

matentzn commented 1 year ago

OWLAPI 1.5.24 adds a weird axiom annotation to OBO serialisation. This was not previously the case and only happens if the ontology is not legal OWL.

Minimal example:

<?xml version="1.0"?>
<rdf:RDF xmlns="http://purl.obolibrary.org/obo/fbbt/fbbt-simple.owl#"
     xml:base="http://purl.obolibrary.org/obo/fbbt/fbbt-simple.owl"
     xmlns:obo="http://purl.obolibrary.org/obo/"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:xml="http://www.w3.org/XML/1998/namespace"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
    <owl:Ontology rdf:about="http://purl.obolibrary.org/obo/fbbt/fbbt-simple.owl">
    </owl:Ontology>

    <owl:AnnotationProperty rdf:about="http://purl.obolibrary.org/obo/IAO_0000115">
        <obo:IAO_0000114 rdf:resource="http://purl.obolibrary.org/obo/IAO_0000122"/>
        <rdfs:label>definition</rdfs:label>
    </owl:AnnotationProperty>

    <owl:AnnotationProperty rdf:about="http://www.geneontology.org/formats/oboInOwl#hasDbXref">
        <rdfs:label>database_cross_reference</rdfs:label>
    </owl:AnnotationProperty>

    <owl:AnnotationProperty rdf:about="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>

    <owl:Class rdf:about="http://purl.obolibrary.org/obo/FBbt_00000006">
        <obo:IAO_0000115>Any segment (FBbt:00000003) that is part of some head (FBbt:00000004).</obo:IAO_0000115>
        <oboInOwl:hasDbXref>UBERON:6000006</oboInOwl:hasDbXref>
    </owl:Class>
    <owl:Axiom>
        <owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/FBbt_00000006"/>
        <owl:annotatedProperty rdf:resource="http://purl.obolibrary.org/obo/IAO_0000115"/>
        <owl:annotatedTarget>Any segment (FBbt:00000003) that is part of some head (FBbt:00000004).</owl:annotatedTarget>
        <oboInOwl:hasDbXref>FlyBase:FBrf0075072</oboInOwl:hasDbXref>
    </owl:Axiom>

</rdf:RDF>

Running this command:

robot convert -i test.owl -f obo -o test.obo 

Will result in this OBO file:

format-version: 1.2
ontology: fbbt/fbbt-simple

[Term]
id: FBbt:00000006
def: "Any segment (FBbt:00000003) that is part of some head (FBbt:00000004)." [FlyBase:FBrf0075072] {http://www.w3.org/1999/02/22-rdf-syntax-ns#type="owl:Axiom"}
xref: UBERON:6000006

While it should be:

format-version: 1.2
ontology: fbbt/fbbt-simple

[Term]
id: FBbt:00000006
def: "Any segment (FBbt:00000003) that is part of some head (FBbt:00000004)." [FlyBase:FBrf0075072]
xref: UBERON:6000006

The culprit is this axiom:

    <owl:AnnotationProperty rdf:about="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>

Which constitutes a violation of the OWL specification, but unfortunately happens rather often.

matentzn commented 1 year ago

I think there are three paths forward here:

  1. We declare that the current behaviour is the correct behaviour, and it should be fixed in the ontology. We add this to obo parser: if(property.isRDFType()) { logger.warn("use of built in vocabulary in annotation assertion"); } (pseudocode)
  2. We declare that the current behaviour is the wrong behaviour.
    • Option 1 "Hard fail". We add this to obo parser: if(property.isRDFType()) { throw IllegalSyntaxException(); } (pseudocode)
    • Option 2 "Ignore". We add this to obo parser: if(property.isRDFType()) { continue; } (pseudocode, essentially just ignoring cases where the value of the AP is a built-in property).
balhoff commented 1 year ago

Which constitutes a violation of the OWL specification, but unfortunately happens rather often.

Why does this happen often? To me it looks like an error in the source ontology which should be corrected.

matentzn commented 1 year ago

https://api.triplydb.com/s/dPBG6EPYW

The problem is that this never happened before, so the serialiser behaves differently than what it used to. Else we agree.

gouttegd commented 1 year ago

From an ongoing discussion on the OBO Slack:

As noted in the issue, those “weird annotation axioms” are caused by a faulty AnnotationProperty that should not exist.

Should we try to deal gracefully with the issue (that is, not emitting the “weird annotation axioms” in response to the faulty AnnotationProperty)?

There are several places where it could be done:

1) In the OWL API, just after reading an ontology (from any of the supported formats). We search the ontology we just read for the offending AnnotationProperty, and if we find it, we forcibly remove it from the ontology (such a property is illegal anyway, so its removal should not cause any issue… right?).

2) In the OWL API, in the OBO serializer. Somehow the faulty AnnotationProperty only causes problems when writing to OBO format, so we update the OBO serializer to ignore that property.

3) In ROBOT, just after reading an ontology. Similar solution as 1), just that it’s outside of the OWL API.

4) Outside of both OWL API and ROBOT. We use a SPARQL query to detect and remove the faulty AnnotationProperty. In the ODK, this could be done automatically by updating the rules that generate the release files in OBO format, by inserting a query --update command before the conversion to OBO.

I am personally slightly in favour of option 4. Removing or ignoring faulty OWL statements such as the one in cause here is a workaround that in my opinion has no place in the OWL API, and barely in ROBOT. Hardcoding such a workaround in the OWL API gives no incentives to ontology editors to fix their ontologies.

cmungall commented 1 year ago

I think behavior of the obo writer in 4.5.24 is consistent with other writers, and is consistent with what I think is a bug in the RDF-based parsers.

The behavior of the obo writer in 4.5.6 is not consistent with other writers.

To see, have a look at any non-RDF serialization of the example ontology above (in either 4.5.6 or in 4.5.24):

Functional:

AnnotationAssertion(Annotation(oboInOwl:hasDbXref "FlyBase:FBrf0075072") Annotation(rdf:type owl:Axiom) obo:IAO_0000115 obo:FBbt_00000006 "Any segment (FBbt:00000003) that is part of some head (FBbt:00000004).")

Manchester:

Class: <http://purl.obolibrary.org/obo/FBbt_00000006>

    Annotations:

            Annotations: <http://www.geneontology.org/formats/oboInOwl#hasDbXref> "FlyBase:FBrf0075072",
                         rdf:type <http://www.w3.org/2002/07/owl#Axiom>
        <http://purl.obolibrary.org/obo/IAO_0000115> "Any segment (FBbt:00000003) that is part of some head (FBbt:00000004).",
        <http://www.geneontology.org/formats/oboInOwl#hasDbXref> "UBERON:6000006"

This is consistent with the internal OWLAPI representation which can be seen in Protege

image

I think the correct thing to do here is for the OWLAPI to never interpret ?a rdf:type owl:Axiom as an annotation, even if rdf:type is an annotation property. However, I am not confident in my reading of the spec here (other than to just say behavior is undefined because it's an error). This is effectively the same as Damien's (1) but rather than remove post-hoc, it just doesn't put it there in the first place. Note if the OWLAPI does implement this it will cause a one-time diff to any ontology that uses ofn or omn as a serialization format where the erroneous annotation property is declared.

However, I think a perfectly practical thing to do here is to add in a filter at write-time (Nico's 1 or 2b). This maintains the current "repair magic" in the obo writer and maintains the current status of having different behavior from the ofn and omn writers.

Another option is to bless 4.5.24, and just live with the fact that some ontologies will have a one-time diff if they don't repair the error (which might be annoying if it's upstream). However, I am against this, because of another issue, and this one is squarely the fault of the obo writer.

If we take the obof output:

[Term]
id: FBbt:00000006
def: "Any segment (FBbt:00000003) that is part of some head (FBbt:00000004)." [FlyBase:FBrf0075072] {http://www.w3.org/1999/02/22-rdf-syntax-ns#type="owl:Axiom"}
xref: UBERON:6000006

and convert it back to OWL we get this monstrosity:

  <!-- http://www.geneontology.org/formats/oboInOwl#http://www.w3.org/1999/02/22-rdf-syntax-ns#type -->

    <owl:AnnotationProperty rdf:about="http://www.geneontology.org/formats/oboInOwl#http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>

And note this is not parseable by Rust RDF parsers, and riot --validate will give a WARN.

This is a known issue that doesn't normally surface as the convention is to use oio for the axiom annotation, and I think fixing this is outside the scope of the next release.

So I think that preserving the status quo and implementing Nico's filter is the way to go, the same as Damien's (2).

We should still pursue Damien's (4) alongside a battery of other tests we should be running, but this is not so urgent if we can just maintain the status quo.

ignazio1977 commented 1 year ago

I'm curious as to the origin of that annotation property declaration - I wouldn't think someone is trying that on purpose, wondering if it's a bug - and, if so, if it's in the OWL API.

Re how to treat it, it's an invalid axiom and as such should cause an error to be thrown in strict parsing mode; in relaxed parsing mode, we've tended to not try anything on such invalid axioms unless they get in the way of rendering data properly - which is the case, here. So I'd go for ignoring this axiom during rendering, at the very least. But ignoring it during parsing requires less code (changes on one parser rather than changes in all renderers).

(I haven't tried parsing this example in strict mode - if anyone else has, please let me know, saves me a job)

matentzn commented 1 year ago

@gouttegd - after reading @cmungall comment I have a mild inclination to this practical path:

  1. The OWL API does not seem to complain (robot validate-profile) about the axiom as "redefining built-in property". I think this is an omission on the implementation of the spec (but maybe @ignazio1977 can weigh in and correct me if I am wrong). The spec says "IRIs from the reserved vocabulary other than the ones listed above must not be used to identify annotation properties in an OWL 2 DL ontology." (and rdf:type is not "listed above).
  2. OBO format does do some repair anyways (in that it drops certain axioms during the conversion process and shoves it into the owl-axioms header). Instead of "fixing" the issue at OWL API level, we can simply declare the axioms as "not supported by OBO format" and shove them into the owl-axioms header.

EDIT: moved list of action items to comment later in the thread.

I would prefer right now not to do a repair at OWL API level (dropping axioms) - there may be people that want to use illegal axiom combinations for testing things. ROBOT should only repair when you ask it to do so.

matentzn commented 1 year ago

I'm curious as to the origin of that annotation property declaration - I wouldn't think someone is trying that on purpose, wondering if it's a bug - and, if so, if it's in the OWL API.

This could be anything, and is quite prevalent: https://api.triplydb.com/s/dPBG6EPYW

Best guess - some RDF outside of OWL was imported at some point into an OWL ontology and that's when it festered.

Re how to treat it, it's an invalid axiom and as such should cause an error to be thrown in strict parsing mode; in relaxed parsing mode, we've tended to not try anything on such invalid axioms unless they get in the way of rendering data properly - which is the case, here. So I'd go for ignoring this axiom during rendering, at the very least. But ignoring it during parsing requires less code (changes on one parser rather than changes in all renderers).

Strict parsing may be the right option, but I think its too draconian. We cant change ROBOT to strict parsing without breaking too many things unfortunately.. I opt for fixing the renderers, which should have a standard mechanism to handle "unsupported axioms"..

Thank you @ignazio1977 for your comments!

gouttegd commented 1 year ago

OWLAPI: Squeeze the Declaration(AP(rdf:type)) axiom into the owl-axiom header at write-time

You mean when writing an OBO file? I don’t think that will do any good. When writing to OBO, the problem is not the AnnotationProperty declaration itself (which is already not translated into the OBO file, not even in the owl-axiom tag), it’s the fact that this axiom being present at all in the ontology leads to those {http://www.w3.org/1999/02/22-rdf-syntax-ns#type="owl:Axiom"} qualifiers. It’s those qualifiers that needs to be filtered out when writing (if we have not filtered out the AnnotationProperty declaration when reading, or have not removed it at any point before attempting to convert to OBO).

matentzn commented 1 year ago

Ok, after consulting one last time with @gouttegd and @hkir-dev offline, this is, I hope, what we agree on doing:

For now, no repairs anywhere. This is just recovering a state that was working as expected, albeit with a tiny hack on the OBO Format parser.

hkir-dev commented 1 year ago

OWLAPI: In the addQualifiers method of OWLAPIOwl2Obo, we can insert something like this (proof-of-concept only, would obviously need more testing):

Two alternatives are implemented at:

matentzn commented 1 year ago

Excellent @hkir-dev - I will take it from here! THANKS!

cmungall commented 1 year ago

I favor https://github.com/owlcs/owlapi/pull/1092 but you can proceed with either solution as you see best

(I edited 1092 to add a comment - I am a big fan of comments pointing back to issues and description of context - if you go with the other PR add a similar comment)

matentzn commented 1 year ago

Seems https://github.com/owlcs/owlapi/pull/1092 was not enough.. Sorry @cmungall

I almost merged this into the new ENVO release:

image

Was there anything deeply wrong with https://github.com/owlcs/owlapi/pull/1093?

matentzn commented 1 year ago

Made a call now. There is a tiny risk involved, but we will opt for:

https://github.com/owlcs/owlapi/pull/1093