owlcs / owlapi

OWL API main repository

Improve serialization #702

Closed NicolasRouquette closed 7 years ago

NicolasRouquette commented 7 years ago

In our project (https://github.com/JPL-IMCE/gov.nasa.jpl.imce.ontologies.public), we've experienced problems similar to https://douroucouli.wordpress.com/2014/03/30/the-perils-of-managing-owl-in-a-version-control-system/. I was glad to hear about #273. However, with versions 4.2.8 and 5.0.5, there are still problems with serialization variability with annotations and expressions.

I took a look and noticed that org.semanticweb.owlapi.util.OWLAPIStreamUtils#equalStreams expects the streams to be sorted; however, this is not generally true.
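
To illustrate why that assumption matters, here is a minimal sketch of a pairwise stream comparison. `PairwiseStreams.pairwiseEqual` is a hypothetical helper written for this comment, not the OWL API's `equalStreams`; it only shows that an element-by-element comparison reports a mismatch when two streams hold the same elements in different orders.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper for illustration; NOT the OWL API implementation.
class PairwiseStreams {

    // Compares two streams position by position.
    static <T> boolean pairwiseEqual(Stream<T> a, Stream<T> b) {
        Iterator<T> ia = a.iterator();
        Iterator<T> ib = b.iterator();
        while (ia.hasNext() && ib.hasNext()) {
            if (!ia.next().equals(ib.next())) {
                return false;
            }
        }
        // Equal only if both streams are exhausted at the same time.
        return !ia.hasNext() && !ib.hasNext();
    }

    public static void main(String[] args) {
        List<String> x = Arrays.asList("a", "b", "c");
        List<String> y = Arrays.asList("c", "a", "b");
        // Same elements, different order: reported as different.
        System.out.println(pairwiseEqual(x.stream(), y.stream()));                   // false
        // Sorting both sides first makes the comparison order-independent.
        System.out.println(pairwiseEqual(x.stream().sorted(), y.stream().sorted())); // true
    }
}
```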

In several places, there are calls to various sorting methods defined in org.semanticweb.owlapi.util.CollectionFactory that effectively mask exceptions when the ordering is unstable.

This led me to look at OWLObject which inherits Comparable<OWLObject>. Why would the order be unstable?

The answer turned out to be conceptually simple: uk.ac.manchester.cs.owl.owlapi.OWLObjectImpl#compareTo.

For the ordering to be stable, org.semanticweb.owlapi.model.HasComponents#components must produce an ordered stream.

The simplest way to guarantee this is to ensure that everything is sorted by construction. For OWLObjects corresponding to the OWL2 spec, this means sorting everything at construction. For SWRLObject, the order must be preserved at construction.
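
A rough sketch of what "sorted by construction" could look like, assuming a simplified stand-in class. `SortedNode` and its methods are hypothetical and only mimic the idea of an object whose `components()` stream feeds `compareTo`; the real OWLObjectImpl classes are more involved. The SWRL case, where the given order must be preserved, is not covered here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for an axiom-like object; not an OWL API class.
final class SortedNode implements Comparable<SortedNode> {
    private final String name;
    private final List<SortedNode> children;

    SortedNode(String name, SortedNode... children) {
        this.name = name;
        List<SortedNode> copy = new ArrayList<>(Arrays.asList(children));
        Collections.sort(copy); // sort once, at construction
        this.children = Collections.unmodifiableList(copy);
    }

    // The component stream is ordered because the backing list is sorted.
    Stream<SortedNode> components() {
        return children.stream();
    }

    List<String> componentNames() {
        return components().map(n -> n.name).collect(Collectors.toList());
    }

    @Override
    public int compareTo(SortedNode other) {
        int c = name.compareTo(other.name);
        if (c != 0) {
            return c;
        }
        // Compare children pairwise; this is only meaningful on sorted children.
        Iterator<SortedNode> a = children.iterator();
        Iterator<SortedNode> b = other.children.iterator();
        while (a.hasNext() && b.hasNext()) {
            c = a.next().compareTo(b.next());
            if (c != 0) {
                return c;
            }
        }
        return Boolean.compare(a.hasNext(), b.hasNext());
    }

    public static void main(String[] args) {
        SortedNode p = new SortedNode("p", new SortedNode("b"), new SortedNode("a"));
        SortedNode q = new SortedNode("p", new SortedNode("a"), new SortedNode("b"));
        System.out.println(p.componentNames()); // [a, b]
        System.out.println(p.compareTo(q));     // 0
    }
}
```

With this shape, two objects built from the same children in different input orders compare as equal and serialize their components in the same order.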

A consequence of this is that all the sorting methods on org.semanticweb.owlapi.util.CollectionFactory can be finally deleted because the ordering will be stable, by construction.

Of course, the trick is in the details of doing all this...

ignazio1977 commented 7 years ago

> This led me to look at OWLObject which inherits Comparable<OWLObject>. Why would the order be unstable?

Because of bugs in the compareTo implementations in various classes - that's how we found out that there was instability, when we started using Java 8. An unstable sorting in Java 7 did not generate exceptions, while in Java 8 the new algorithm explicitly handles the instability with an exception.
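
As a small illustration of the kind of compareTo bug meant here, the sketch below checks the sign-symmetry rule of the `Comparator` contract, `sgn(compare(a, b)) == -sgn(compare(b, a))`. Both `ComparatorContract.violatesSymmetry` and the `BUGGY` comparator are hypothetical names invented for this example; they are not taken from the OWL API. The JDK sort may throw `IllegalArgumentException` when it detects such a violation, but it does not detect every one, so the sketch checks the rule directly instead of relying on the sort to fail.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical contract checker for illustration only.
class ComparatorContract {

    // A deliberately buggy comparator: it never returns a negative value,
    // so compare(a, b) and compare(b, a) can both claim "greater than".
    static final Comparator<String> BUGGY = (a, b) -> a.equals(b) ? 0 : 1;

    // Returns true if some pair breaks sgn(cmp(a, b)) == -sgn(cmp(b, a)).
    static <T> boolean violatesSymmetry(List<T> items, Comparator<? super T> cmp) {
        for (T a : items) {
            for (T b : items) {
                int ab = Integer.signum(cmp.compare(a, b));
                int ba = Integer.signum(cmp.compare(b, a));
                if (ab != -ba) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c");
        System.out.println(violatesSymmetry(items, BUGGY));                             // true
        System.out.println(violatesSymmetry(items, Comparator.<String>naturalOrder())); // false
    }
}
```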

We got rid of the bugs we found, but if we had let the exception bubble up, users would have found themselves unable to save an ontology because of issues outside their control - and, in this case, fairly minor ones. Imagine an hour spent modeling in Protege, and it refusing to save all that work because of a problem sorting annotations - very bad user experience. The tradeoff there is that the output file will not be perfectly sorted; not ideal, but as long as any sorting bug is taken care of, the impact over time should be less and less.
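
The tradeoff can be sketched roughly as follows. `LenientSort.sortOrKeepOrder` is a hypothetical helper, not one of the CollectionFactory methods, and the failing comparator simply throws to simulate a contract violation being detected by the sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration; not the CollectionFactory code.
class LenientSort {

    // Tries to sort; if the sort rejects the comparator, falls back to the
    // original order so that saving can still proceed.
    static <T> List<T> sortOrKeepOrder(List<T> input, Comparator<? super T> cmp) {
        List<T> copy = new ArrayList<>(input);
        try {
            copy.sort(cmp);
            return copy;
        } catch (IllegalArgumentException e) {
            // The JDK sort uses this exception type for a broken comparator.
            // The copy may be partially reordered, so return the input order.
            return new ArrayList<>(input);
        }
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(3, 1, 2);
        // Simulated failure: stands in for a comparator the sort rejects.
        Comparator<Integer> broken = (a, b) -> {
            throw new IllegalArgumentException("simulated contract violation");
        };
        System.out.println(sortOrKeepOrder(values, Comparator.<Integer>naturalOrder())); // [1, 2, 3]
        System.out.println(sortOrKeepOrder(values, broken));                             // [3, 1, 2]
    }
}
```

The output stays usable in both cases; the cost is that the second result is not sorted, which is the file-order variability discussed in this issue.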

Speaking of which, there has been a bug fix about ontology annotation sorting after 5.0.5 (released in 5.1.0). Can you explain the issues you've found in file order?

ignazio1977 commented 7 years ago

> The simplest way to guarantee this is to ensure that everything is sorted by construction.

Indeed, that is the option we took. All OWLObjectImpl implementations sort their collections (annotations, equivalent sets, and so on) during construction. Under the assumption that there are no more compareTo bugs, this should ensure that all streams from these collections are sorted, and all streams that concatenate other streams, like components(), are in reliable order - not necessarily the same order as compareTo() would give, because the concatenation mixes elements from different collections.
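
A minimal sketch of that last point, using plain strings in place of OWL objects. `ConcatOrder.components` is a hypothetical method for illustration, and the split into "annotations" and "operands" is only an assumed example of two separately sorted collections.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration; not the OWL API components() implementation.
class ConcatOrder {

    // Each collection is sorted on its own, then the streams are concatenated.
    static List<String> components(List<String> annotations, List<String> operands) {
        return Stream.concat(annotations.stream().sorted(), operands.stream().sorted())
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Input order does not matter: the result is the same either way.
        System.out.println(components(Arrays.asList("z", "m"), Arrays.asList("b", "a"))); // [m, z, a, b]
        System.out.println(components(Arrays.asList("m", "z"), Arrays.asList("a", "b"))); // [m, z, a, b]
    }
}
```

The result is reproducible, but it is not what sorting the combined elements would give (`[a, b, m, z]`).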

ignazio1977 commented 7 years ago

Pull request included in version5, version4 and master for version 6.

ddooley commented 5 years ago

Just a note that, if I'm not mistaken, ordering still appears to be somewhat indeterminate with Protege 5.5 file save, which I see is using OWL API 4.5.8; I presume that is where the issue is. It would be great to resolve this for GitHub efficiency. Example with bold/italics showing delete/insert:

**dead body** Gregory Harhay _dead body_ cadaver corpse

ignazio1977 commented 5 years ago

@ddooley Is this part of a publicly available ontology?

ddooley commented 5 years ago

Yes, this is from https://github.com/FoodOntology/foodon/, which takes in terms from other ontologies and adds synonymy etc. It is in foodon-edit.owl, line 162058 of https://github.com/FoodOntology/foodon/commit/b5de9749996553ed536c79059485938e6431cc71#diff-3ba569ce430a37139d3278d6a229f662 . This file is only modified by Protege.

ddooley commented 5 years ago

Looks like it might only be happening in ...

**Damion Dooley** Wine vinegar is made from red or white wine, and is the most commonly used vinegar in Southern and Central Europe, Cyprus and Israel. _Damion Dooley_ wikipedia:vinegar