ontodev / robot

ROBOT is an OBO Tool
http://robot.obolibrary.org
BSD 3-Clause "New" or "Revised" License

simply saving an OWL/XML (*.owl) ontology with ROBOT 1.9.2 is insufficient to canonicalize it #1090

Open jclerman opened 1 year ago

jclerman commented 1 year ago

The release notes for ROBOT 1.9.2 recommend that you:

save your ontology with ROBOT 1.9.2 or Protégé 5.6.0 without introducing any changes to the logic or annotations, and commit the resulting ontology files

In my experience, that wasn't quite enough. My ontology was not completely canonicalized until I round-tripped it through OWL functional syntax; without that step, some lines in the XML output were still reordered when I round-tripped the file.

What worked for me (other variants might work too; haven't tested):

robot convert -i my-protege-5.5.0-ontology.owl -o my-ontology.ofn
robot convert -i my-ontology.ofn -o my-canonicalized-ontology.owl

matentzn commented 1 year ago

Surprising! What happened when you tried without the ofn intermediary?

jclerman commented 1 year ago

Hi @matentzn. When I just did:

robot convert -i my-original-ontology.owl -o my-attempted-canonicalized-ontology.owl

I found that annotation values were not sorted in the output. After round-tripping through ofn, I got a stable result (including sorting of those values).

Here's a fragment of a diff of my-attempted-canonicalized-ontology.owl against what I get after round-tripping:

*** 104805,104816 ****
      </owl:Axiom>
      <owl:Axiom>
          <owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/UBERON_0010703"/>
          <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasNarrowSynonym"/>
          <owl:annotatedTarget>wing zeugopod skeleton</owl:annotatedTarget>
-         <oboInOwl:hasDbXref>OBOL:automatic</oboInOwl:hasDbXref>
          <oboInOwl:hasDbXref>NCBITaxon:8782</oboInOwl:hasDbXref>
          <oboInOwl:hasSynonymType rdf:resource="http://purl.obolibrary.org/obo/uberon/core#SENSU"/>
      </owl:Axiom>
      <owl:Axiom>
          <owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/UBERON_0010703"/>
          <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasRelatedSynonym"/>
--- 104805,104816 ----
      </owl:Axiom>
      <owl:Axiom>
          <owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/UBERON_0010703"/>
          <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasNarrowSynonym"/>
          <owl:annotatedTarget>wing zeugopod skeleton</owl:annotatedTarget>
          <oboInOwl:hasDbXref>NCBITaxon:8782</oboInOwl:hasDbXref>
+         <oboInOwl:hasDbXref>OBOL:automatic</oboInOwl:hasDbXref>
          <oboInOwl:hasSynonymType rdf:resource="http://purl.obolibrary.org/obo/uberon/core#SENSU"/>
      </owl:Axiom>
      <owl:Axiom>
          <owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/UBERON_0010703"/>
          <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasRelatedSynonym"/>
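
One way to check for the kind of instability shown in this diff is to re-save the file and compare the bytes. The following is a minimal sketch, assuming `robot` is on the PATH; `is_stable` is a hypothetical helper name, not part of ROBOT:

```shell
# Sketch: report whether re-saving a file with ROBOT changes its bytes.
# Assumes `robot` is on PATH; `is_stable` is a hypothetical helper name.
# Exit status: 0 = unchanged, 1 = changed, 2 = could not run the check.
is_stable() {
    input="$1"
    # Reuse the input's extension so the output is written in the
    # format that the extension implies.
    ext="${input##*.}"
    tmpdir="$(mktemp -d)" || return 2
    resaved="$tmpdir/resaved.$ext"
    if ! robot convert -i "$input" -o "$resaved"; then
        rm -rf "$tmpdir"
        return 2
    fi
    # cmp -s prints nothing and succeeds only for byte-identical files.
    if cmp -s "$input" "$resaved"; then
        status=0
    else
        status=1
    fi
    rm -rf "$tmpdir"
    return "$status"
}
```

Usage might look like `is_stable my-ontology.owl && echo "stable"`. If a file's extension does not match its actual syntax, the comparison will report a difference unless the output format is set explicitly, for example with the `--format` option of `robot convert`.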

matentzn commented 1 year ago

Very important for us to know, thank you for taking the time to report this. Apart from us knowing about this, is there anything you think should be done here in terms of a fix? It seems we basically have to live with this (short of someone working on the OWX parser in the OWL API itself).

jclerman commented 1 year ago

Doesn't seem like there is too much that ROBOT could do (I can imagine internal workarounds, like setting up the ROBOT code to internally do an ofn round-trip when being asked to do a no-op conversion from/to the same format - but not sure that's a good idea).

My only real suggestion would be to perhaps update the 1.9.2 release notes to tell people that they might need to do an ofn round-trip to achieve canonicalization. That would help users avoid getting bitten by this issue.

jamesaoverton commented 1 year ago

Thanks @jclerman! I added a mention of this issue to the 1.9.2 release notes. Once #1088 and #1089 are resolved and everything is updated, we'll make a bigger push to get everyone to update, and we'll keep this in mind.

CarMoreno commented 1 year ago

I am experimenting with this behaviour by using the robot template command to generate a .owl file from a .tsv file. Unfortunately, the workaround using .ofn does not work. :(

jamesaoverton commented 1 year ago

@CarMoreno What doesn't work? I think the suggestion in this thread is to use robot template to create an .ofn file, then robot convert to .owl (RDF/XML).

CarMoreno commented 1 year ago

@jamesaoverton That's exactly what I am doing. I generated the dummy.ofn file from the template, and then generated dummy.owl from that dummy.ofn:

robot template --template dummy_template.csv --output dummy.ofn
robot convert --input dummy.ofn --output dummy.owl

The axioms remain unsorted.

allenbaron commented 1 year ago

I have been exploring this somewhat with ROBOT 1.9.4 and Protege 5.6.2 (starting files were built with ROBOT 1.8.3 and Protege 5.5.0). Based on my exploration, any file needs two convert operations to reach a stable serialization, but it doesn't matter which format the file is converted to. For example, to make doid.owl stable, either of the following works, and both end up with the same result.

robot convert -i doid.owl -o doid1.owl
robot convert -i doid1.owl -o doid.owl

OR

robot convert -i doid.owl -o doid.ofn
robot convert -i doid.ofn -o doid.owl

Stabilizing the doid-edit.owl file, which is actually in OWL functional syntax, also requires two filetype-agnostic converts. Protege has similar behavior. The first edit and save results in sorting by language tag, then alphabetical (same as first convert) and the second edit, if made after closing and re-opening the file, gets the final sort ordering.

For some reason the first convert operation sorts by presence/absence of language tag before sorting strings alphabetically, while the second sorts alphabetically first and language tag second.
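
Since the number of passes needed seems to depend on the starting state, one option is to repeat the convert until the output stops changing rather than hard-coding two passes. This is a minimal sketch, assuming `robot` is on the PATH; the `stabilize` helper and its default limit of five passes are hypothetical choices, not ROBOT features:

```shell
# Sketch: re-save a file with ROBOT until the output stops changing.
# Assumes `robot` is on PATH; `stabilize` and the pass limit are hypothetical.
# Exit status: 0 = stable, 1 = not confirmed stable within the limit,
# 2 = a convert failed.
stabilize() {
    target="$1"
    max_passes="${2:-5}"
    # Reuse the target's extension so each pass writes the same format.
    ext="${target##*.}"
    tmpdir="$(mktemp -d)" || return 2
    pass=1
    while [ "$pass" -le "$max_passes" ]; do
        next="$tmpdir/pass.$ext"
        if ! robot convert -i "$target" -o "$next"; then
            rm -rf "$tmpdir"
            return 2
        fi
        if cmp -s "$target" "$next"; then
            # A further convert is a no-op, so the serialization is stable.
            rm -rf "$tmpdir"
            return 0
        fi
        # The output differs: keep it and try another pass.
        mv "$next" "$target"
        pass=$((pass + 1))
    done
    rm -rf "$tmpdir"
    return 1
}
```

For example, `stabilize doid.owl` would overwrite doid.owl in place, so it is safest to run it on a file that is under version control. Going by the behavior described above, a file written by an older ROBOT would be expected to change on the first two passes and be confirmed stable on the third.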

Comparison of doid-edit.owl (ROBOT 1.8.3/Protege 5.5.0) with doid-edit.owl after one convert with ROBOT 1.9.4

[image not preserved]

Comparison of first convert of doid-edit.owl with second, both ROBOT 1.9.4

[image not preserved]

ROBOT template tests

I have only tested using ROBOT template to add axioms to an existing file, e.g. robot template -i doid-edit.ofn --template template.tsv --merge-before -o doid-edit.ofn, and the added axioms appear to be sorted correctly. As long as the file serialization is already stable, nothing needs to be done; if it isn't, one ROBOT convert is needed.

matentzn commented 1 year ago

@allenbaron super useful analysis, thank you!

allenbaron commented 1 year ago

Just noting that after stabilizing the serialization of an .ofn file, running a robot query --update command changes the ordering of lines in the output file (from .ofn to .ofn, in my case), making it similar to the result of a single robot convert as I described above. I have to run another non-chained robot convert to get the ordering back.

Full command to maintain stable ordering (and prefixes):

robot --add-prefixes build/doid-edit_prefixes.json \
    query -i src/ontology/doid-edit.owl \
    --update ../../DO_dev/sparql/update/DO-def_format_gene.ru \
    -o tmp.ofn && \
robot convert -i tmp.ofn -o tmp2.ofn && \
mv tmp2.ofn src/ontology/doid-edit.owl && \
rm tmp.ofn

Serialization is stable when I run chained reason & annotate going from .ofn to .owl (i.e. another robot convert on the resulting .owl file has no effect).