protegeproject / protege

Protege Desktop
http://protege.stanford.edu

Problem with serialization in Protégé 5.6: offsets in "genid" #1164

Open jpi-seb opened 10 months ago

jpi-seb commented 10 months ago

We have just migrated from Protégé 5.5 to Protégé 5.6, and we detected a tricky bug in the serialization.

In our ontology, we have classes with rdfs:subClassOf axioms, which are annotated with rdfs:comment using OWL 2 axiom annotations.

With Protégé 5.5, no identifier was generated in this case, but now with Protégé 5.6 an identifier with the "genid" prefix is used in the serialization. See this example with the generated "genid1":

<owl:Class rdf:about="http://tst#ClassA">
    <rdfs:subClassOf rdf:nodeID="genid1"/>
</owl:Class>
<owl:Restriction rdf:nodeID="genid1">
    <owl:onProperty rdf:resource="http://tst#Prop"/>
    <owl:someValuesFrom rdf:resource="http://tst#ClassAB"/>
</owl:Restriction>
<owl:Axiom>
    <owl:annotatedSource rdf:resource="http://tst#ClassA"/>
    <owl:annotatedProperty rdf:resource="http://www.w3.org/2000/01/rdf-schema#subClassOf"/>
    <owl:annotatedTarget rdf:nodeID="genid1"/>
    <rdfs:comment xml:lang="fr">Test</rdfs:comment>
</owl:Axiom>

The problem is that when we add a new OWL 2 axiom annotation to our ontology, every axiom annotation that follows the new one (in the order of the serialized file) gets a new "genid": the number of each subsequent genid is shifted by an offset.

See this example, the result of the git diff command after introducing a new axiom annotation (the previously generated genid1 is replaced by genid3):

@@ -53,16 +65,16 @@
     <!-- http://tst#ClassB -->

     <owl:Class rdf:about="http://tst#ClassB">
-        <rdfs:subClassOf rdf:nodeID="genid1"/>
+        <rdfs:subClassOf rdf:nodeID="genid3"/>
     </owl:Class>
-    <owl:Restriction rdf:nodeID="genid1">
+    <owl:Restriction rdf:nodeID="genid3">
         <owl:onProperty rdf:resource="http://tst#Prop"/>
         <owl:someValuesFrom rdf:resource="http://tst#ClassBB"/>
     </owl:Restriction>
     <owl:Axiom>
         <owl:annotatedSource rdf:resource="http://tst#ClassB"/>
         <owl:annotatedProperty rdf:resource="http://www.w3.org/2000/01/rdf-schema#subClassOf"/>
-        <owl:annotatedTarget rdf:nodeID="genid1"/>
+        <owl:annotatedTarget rdf:nodeID="genid3"/>
         <rdfs:comment xml:lang="en">Test</rdfs:comment>
     </owl:Axiom>

Our ontology contains hundreds of these annotations, so each new annotation pollutes git with many irrelevant changes. Commit diffs in the git history become unreadable, and merges between branches are now much more complex.

Is there a way to prevent this behaviour in Protégé 5.6?

You will find a full example in the ZIP file below: tst0.rdf is a small example ontology with OWL 2 axiom annotations, and tst1.rdf is the same ontology with a new annotation added. Compare the two files to see the offsets in "genid". tst.zip
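To check that two serializations differ only by genid renumbering, a minimal Python sketch along these lines might help. It is not part of Protégé or the OWL API; the function names are made up for illustration, and it assumes the blank-node labels always have the form rdf:nodeID="genidN" as in the examples above. It masks the numbers before diffing, so it only helps when reviewing changes and does not alter the files themselves.

```python
import difflib
import re

# Assumption: blank-node labels look like rdf:nodeID="genid3", as in the examples above.
GENID = re.compile(r'rdf:nodeID="genid\d+"')


def mask_genids(text: str) -> list[str]:
    """Replace every numbered genid label with a fixed placeholder and split into lines."""
    return GENID.sub('rdf:nodeID="genid"', text).splitlines()


def diff_ignoring_genids(old_text: str, new_text: str) -> list[str]:
    """Return a unified diff of two RDF/XML texts with the genid numbers masked out."""
    return list(
        difflib.unified_diff(
            mask_genids(old_text),
            mask_genids(new_text),
            "old",
            "new",
            lineterm="",
        )
    )

# Hypothetical usage with the attached files:
#   old = open("tst0.rdf", encoding="utf-8").read()
#   new = open("tst1.rdf", encoding="utf-8").read()
#   print("\n".join(diff_ignoring_genids(old, new)))
```

Because every label is collapsed to the same placeholder, the output no longer shows which blank node is linked to which axiom; it only filters out the renumbering noise so that the real additions stand out.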

Thank you for your help!

gouttegd commented 10 months ago

This seems to be due to a change in the behaviour of the OWL API between 4.5.6 (used in Protégé 5.5) and 4.5.25 (used in Protégé 5.6).

I’ll have a closer look later, but my understanding (from OWL API tickets such as this one: https://github.com/owlcs/owlapi/issues/881) is that the behavioural change was intended and that the former behaviour was actually incorrect. I’ll check with the OWL API folks, but in the meantime I don’t think there is any possible workaround in Protégé itself.