Closed: ignazio1977 closed this issue 8 years ago
Given that OWLAPI Turtle and RDF/XML files are rendered based on categories (classes/individuals/etc.), the counter in that example may need to be localised to the category.
From a technical point of view, it shouldn't be difficult to implement a counter, as in practice all changes go through the OWLOntologyManager so there would just need to be an AtomicLong for each category, basically.
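As a rough illustration of the per-category counter idea, here is a minimal sketch in plain Java. The `CategoryCounters` class and its `Category` enum are hypothetical names made up for this example; they are not OWLAPI types, and nothing here reflects how OWLOntologyManager is actually implemented.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: one creation counter per rendering category. */
final class CategoryCounters {

    /** Illustrative categories only; not an OWLAPI type. */
    enum Category { CLASS, OBJECT_PROPERTY, DATA_PROPERTY, INDIVIDUAL }

    // The map is filled once in the constructor and never modified afterwards;
    // only the AtomicLong values change, so concurrent callers are safe.
    private final Map<Category, AtomicLong> counters = new EnumMap<>(Category.class);

    CategoryCounters() {
        for (Category c : Category.values()) {
            counters.put(c, new AtomicLong());
        }
    }

    /** Returns the next sequence number (starting at 1) for the given category. */
    long next(Category category) {
        return counters.get(category).incrementAndGet();
    }

    public static void main(String[] args) {
        CategoryCounters counters = new CategoryCounters();
        System.out.println(counters.next(Category.CLASS));      // 1
        System.out.println(counters.next(Category.CLASS));      // 2
        System.out.println(counters.next(Category.INDIVIDUAL)); // 1
    }
}
```

Each category numbers its own objects independently, so adding a class would not shift the numbers handed out to individuals.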
Not totally sure I understand the numbering strategy. Would the ordering be lost if it were roundtripped via a non-number preserving format?
My naive thoughts were that it would be possible to define a sort order on any set of constructs. For example:
Yes, defining an ordering is good, but it does not preserve the existing structure: if an ontology file with the "wrong" ordering is read, the output will not play well with the previous version. It will work well with successive versions, but there would need to be an 'update' step. It's basically the same problem you mention about the ordering being lost when roundtripping with another tool. I'm not sure there's a catch-all solution here.
I guess I'm OK with that. But my bias is primarily to the VCS use case.
I can see how your proposal would be nice if people were hand-editing the files and there was some axiom ordering that was appealing to them & they wished to preserve it. But I think anyone hand-editing RDF/XML long term would be certifiable (we've all done it short term...)
Would these be implemented as separate comparators?
Perhaps ontologies, axioms, class expressions etc. and the objects that they contain should preserve the order that they are supplied with. Sorting on rendering, or whenever required, could just use the appropriate comparator.
I would actually like to have a well defined sort order for things like creating a digest of a set of axioms (unless there is a better way of doing this).
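One way to picture the digest use case: if the axioms are put into a well-defined order before hashing, the digest no longer depends on the order in which they were supplied. The sketch below assumes each axiom has already been rendered to a single-line string (for example in functional syntax); `AxiomSetDigest` is a hypothetical helper, and natural string order stands in for whatever comparator would really be used.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collection;
import java.util.TreeSet;

/** Hypothetical sketch: order-independent digest of a set of rendered axioms. */
final class AxiomSetDigest {

    /** SHA-256 over the sorted, de-duplicated renderings, as a hex string. */
    static String digest(Collection<String> renderedAxioms) {
        // TreeSet sorts by natural string order and drops duplicates,
        // so the same set of axioms always hashes the same way.
        TreeSet<String> sorted = new TreeSet<>(renderedAxioms);
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            for (String axiom : sorted) {
                sha.update(axiom.getBytes(StandardCharsets.UTF_8));
                // Separator (assumes renderings contain no newline) so that
                // "ab","c" and "a","bc" do not collide.
                sha.update((byte) '\n');
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : sha.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }
}
```

Calling `digest` on the same axioms in a different order returns the same string, which is the property a sort order buys here.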
Sorting seems to give a big improvement in compression ratios (at least for FSS).
Since each axiom is true and effectively ANDed together, and since AND is commutative and associative (as well as idempotent), there should not be any "preferred" order for axioms. First Order Logic requires that they all be treated as if they have no particular order, so any consistent sort, e.g. a topological sort, should work fine.
David Whitten
Of course the semantics of the ontologies is unaffected by the order of axioms.
The point of this change is purely to minimise changes to the text output, for the greater good of text based version control systems and other non OWL aware tooling.
Simon sorted some of the syntaxes, save for Manchester (and the legacy ones, e.g., KRSS).
@sesuncedu which version of the owlapi are these fixes in? Useful to know for ensuring everyone's Protege is in sync
@ignazio1977 do you know?
Should be in all versions. I'll double check.
What does all versions mean? I'm trying to figure out which versions of protege support this, and whether we need a new protege build
All most recent versions: 3.5.2, 4.0.2 and version 5 master. It's included in the 4.1.0 release candidate as well.
From past experience, Protege 4.3 and 5 can be adapted to use 3.5.2 by dropping the 3.5.2 osgidistribution jar in the protege plugins folder.
I am using Protege 5beta18 snapshot, it saves with this version of the owlapi:
<!-- Generated by the OWL API (version 3.5.3.20150903-2211) http://owlapi.sourceforge.net -->
Yet we still get spurious diffs, e.g. https://github.com/oborel/obo-relations/commit/f9e17bf16fedaf316c9ebafe1e7f3d0ef5873ce7
This is in RDF/XML. I'm going to re-open as my understanding was that the intent was to implement deterministic ordering for non-legacy syntaxes (unless rdf/xml is considered legacy...)
Feel free to re-close but let me know where this is fully implemented
I seem to have missed a commit on 3.5.2 when I checked. My bad.
OK, was that just for rdf/xml or does it affect all?
Not sure yet, looks like Turtle and RDF/XML
I just finished slouching in to Bethlehem so not really brain-enabled, but I think the relevant code is in one of the base RDF renderers. (I know that in version 4 it changed blank node ids for the rio writers, since I had to adjust test cases.)
@cmungall I've fixed the issue, but one problem you'll see for that ontology is that the next save will still introduce random changes - the previous versions were not sorted. After that things should normalize.
I'll put a Protege build with the updated jar up for evaluation once I'm done.
@sesuncedu one thing I'm not clear about is the change to RDFXMLRenderer:
    private void writeCommentForEntity(String msg, OWLEntity entity) {
        checkNotNull(entity, msg);
        String iriString = entity.getIRI().toString();
        String labelString = labelMaker.getShortForm(entity);
        String commentString = null;
        if (!iriString.equals(labelString)) {
            commentString = labelString;
        } else {
            commentString = iriString;
        }
        writer.writeComment(XMLUtils.escapeXML(commentString));
    }
If I interpret the results correctly, this will change the banner in XML files to use the (one of the) labels for the entity being written out. That sounds like a great idea to me, but it will also introduce a number of changes to existing ontologies. Was the intention to make this configurable?
Now fixed in the version3 branch. I've used the ontology linked by @cmungall to verify, and cherry-picked the Manchester syntax sorting as well. The sorting test is now the same for version 3 and 4.
I've not enabled @sesuncedu's change to use a label in RDF/XML banner for entities, as this would introduce more changes in the output. I'm planning to add it and make it switchable.
I couldn't quite tell from the thread, so could someone summarize the sort order employed now in OWLAPI/Protege >= 5.0.0-beta-18? Is it deterministic down to the triplet? Does it parallel being able to sort an XML document by tag name, and then by attribute name and value, then content, or some-such? It sounds like OWLAPI sorts before it writes out to various formats, which sounds great.
In other words, now we do have diff'able ontology output via OWLAPI and Protege, with no caveats?
I appreciate all the work done on this!
It should now be deterministic.
Hedge case:
It is still possible for small ontology level changes to generate disproportionately large textual changes. I believe that this should only occur if the output format explicitly renders all blank nodes (e.g. N-Triples), and the ontology level change alters the number of blank nodes required to render some axioms that have other axioms rendered after them.
There's not much that can be done about this, as these blank nodes don't exist at the OWL level. Fortunately these blank nodes are not explicitly rendered in most formats.
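To make the effect concrete, here is a toy model, not real N-Triples output and not OWLAPI code: blank nodes are numbered sequentially in rendering order, so needing one more blank node for an early axiom renumbers every blank node after it. The `BlankNodeShift` class, the `_:genid` prefix and the `ex:` names are all assumptions made up for this sketch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of sequential blank node ids shifting after a small change. */
final class BlankNodeShift {

    /** Renders one line per blank node; ids are assigned in rendering order. */
    static List<String> render(int[] blankNodesPerAxiom) {
        List<String> lines = new ArrayList<>();
        int next = 1;
        for (int axiom = 0; axiom < blankNodesPerAxiom.length; axiom++) {
            for (int i = 0; i < blankNodesPerAxiom[axiom]; i++) {
                lines.add("_:genid" + next++ + " <ex:partOf> <ex:axiom" + axiom + "> .");
            }
        }
        return lines;
    }

    /** Counts lines that appear in only one of the two renderings. */
    static int linesNotShared(List<String> before, List<String> after) {
        Set<String> shared = new HashSet<>(before);
        shared.retainAll(after);
        return (before.size() - shared.size()) + (after.size() - shared.size());
    }

    public static void main(String[] args) {
        List<String> before = render(new int[] {1, 1, 1, 1});
        // The first axiom now needs one extra blank node.
        List<String> after = render(new int[] {2, 1, 1, 1});
        System.out.println(linesNotShared(before, after)); // 7
    }
}
```

In this toy case a single extra blank node touches 7 of the 9 distinct lines, because every later id moves up by one.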
Some metrics using GO are in my OWLED 2015 paper - http://cgi.csc.liv.ac.uk/~valli/OWLED2015/OWLED_2015_paper_12.pdf
To the extent that it can be tested, it is deterministic and tested to stay so. As @sesuncedu said, this is not an absolute guarantee, mainly because of blank nodes. However, blank node ids are generated in sequence when parsing and are used in sorting blank nodes, so corner cases should be fairly uncommon.
Node identity comes after a number of other factors; ordering is implemented as follows:
Sequences of axioms or any other OWL objects are sorted by type first, then by values of contained properties/expressions, down to IRI (alphabetical) when necessary. Most of the time this is enough to have stable order.
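The ordering described above can be sketched roughly as a comparator. This is only an illustration under simplifying assumptions: `Axiom` here is a hypothetical stand-in carrying a numeric type rank and a flat list of IRIs, whereas the real OWLAPI comparison walks nested expressions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of "type first, then contents, down to IRI". */
final class AxiomOrdering {

    /** Stand-in for an axiom: a type rank plus the IRIs it mentions. */
    static final class Axiom {
        final int typeRank;      // assumed fixed rank per axiom type
        final List<String> iris; // IRIs of the contained entities, in order

        Axiom(int typeRank, String... iris) {
            this.typeRank = typeRank;
            this.iris = Arrays.asList(iris);
        }
    }

    /** Compares by type, then IRI by IRI (alphabetical), then by IRI count. */
    static final Comparator<Axiom> ORDER = (a, b) -> {
        int byType = Integer.compare(a.typeRank, b.typeRank);
        if (byType != 0) {
            return byType;
        }
        int shared = Math.min(a.iris.size(), b.iris.size());
        for (int i = 0; i < shared; i++) {
            int byIri = a.iris.get(i).compareTo(b.iris.get(i));
            if (byIri != 0) {
                return byIri;
            }
        }
        return Integer.compare(a.iris.size(), b.iris.size());
    };

    /** Returns a sorted copy, leaving the caller's list untouched. */
    static List<Axiom> sorted(List<Axiom> axioms) {
        List<Axiom> copy = new ArrayList<>(axioms);
        copy.sort(ORDER);
        return copy;
    }
}
```

Because the comparison depends only on the axioms' own contents, the sorted result is the same whatever order the axioms were loaded in.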
Everything has been working perfectly for me for the last year or so.
Great, thanks for this feedback.
Having chanced upon http://douroucouli.wordpress.com/2014/03/30/the-perils-of-managing-owl-in-a-version-control-system/ I wonder if the OWLAPI should alleviate the pain of diffs on Turtle/XML syntaxes.
The simplest solution I can think of is a counter on OWLObject that keeps track of the order in which the objects were created. Then, when sorting axioms, class expressions and what have you for output, use it together with the current criteria.
Example:
Ontology contains three equivalent axioms, one class assertion
During parsing, the axioms are numbered 1, 2, 3, 4
Add new equivalent axiom, numbered 5
Output order is 1, 2, 3, 5, 4
i.e., the new equivalent axiom is the last of the equivalent axioms list.
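The example above can be sketched as a two-level sort: axiom type first, creation number second. `CreationOrderSort` and `NumberedAxiom` are hypothetical names for this illustration, and the type rank (equivalent classes before class assertions) is an assumption chosen to match the example, not the actual rendering order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: sort by axiom type, then by creation counter. */
final class CreationOrderSort {

    static final class NumberedAxiom {
        final String type;   // e.g. "EquivalentClasses" or "ClassAssertion"
        final long sequence; // creation counter assigned at parse/add time

        NumberedAxiom(String type, long sequence) {
            this.type = type;
            this.sequence = sequence;
        }
    }

    /** Assumed rank: equivalent classes axioms are rendered first. */
    static int typeRank(String type) {
        return "EquivalentClasses".equals(type) ? 0 : 1;
    }

    /** Returns the creation numbers in the order the axioms would be output. */
    static List<Long> outputOrder(List<NumberedAxiom> axioms) {
        List<NumberedAxiom> copy = new ArrayList<>(axioms);
        copy.sort(Comparator
                .comparingInt((NumberedAxiom a) -> typeRank(a.type))
                .thenComparingLong(a -> a.sequence));
        List<Long> order = new ArrayList<>();
        for (NumberedAxiom axiom : copy) {
            order.add(axiom.sequence);
        }
        return order;
    }

    public static void main(String[] args) {
        List<NumberedAxiom> axioms = new ArrayList<>();
        axioms.add(new NumberedAxiom("EquivalentClasses", 1));
        axioms.add(new NumberedAxiom("EquivalentClasses", 2));
        axioms.add(new NumberedAxiom("EquivalentClasses", 3));
        axioms.add(new NumberedAxiom("ClassAssertion", 4));
        axioms.add(new NumberedAxiom("EquivalentClasses", 5)); // added after parsing
        System.out.println(outputOrder(axioms)); // [1, 2, 3, 5, 4]
    }
}
```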
What are your thoughts? @matthewhorridge @cmungall anyone else?