owlcs / owlapi

OWL API main repository

Need to add a new API to create an ontology manager from an IRI- not sure where it belongs. #423

Open sesuncedu opened 9 years ago

sesuncedu commented 9 years ago

I want to add an API for creating an owl ontology manager, possibly containing multiple OWLOntology objects, from a URL (possibly with configuration options).

It may need an SPI, as there may be several supported formats-

The latter approach is backwards compatible, and has the potential to be a big win for small ontology documents with lots of shared declarations (e.g. documents generated by automatic decomposition). Zip does really poorly in this situation.

ansell commented 9 years ago

An example of the manifest layout I am using in my application for creating various wrapped OWLOntologyManager objects, based on the ontologies and versions that are imported from a particular user document is:

https://github.com/podd/podd-examples/blob/master/webapp-example/src/main/resources/schema-manifest.ttl

Much of that could be auto-generated (except the physical location and omv:currentVersion), but I have found it simpler to manage if all of the imports and version IRIs are in one file.

sesuncedu commented 9 years ago

That's a good example! The difference between this approach and the merged ontology document is that the latter is somewhat protected from "mostly harmless" changes to imports (cough SKOS cough).

Also, if the ontology manager is going to stick stuff in the format objects, there really ought to be a place for document metadata like physical URL, last modification times, variants, etc.

ansell commented 9 years ago

I also validate that the imports in the manifest file exactly match the imports in the actual schema document, as I ran into a case like you describe in the past.

I agree that more metadata needs to be included in the format objects, where it is available.

sesuncedu commented 9 years ago

When you say that they exactly match, What exactly do you mean? Cos there's too many meanings for OWL, And all of them are wrong. [As they say in Manchester]

The easiest approach is to require that the ontology documents be bitwise identical (or at least have identical strong cryptographic checksums). This is awkward for imports in the OWLAPI because it is not easy to get hold of the raw byte stream (even ignoring BOMs).
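As a rough illustration of the checksum comparison described above, here is a minimal sketch using only the Java standard library. The class and method names (`DocumentChecksums`, `sha256Hex`, `sameDocument`) are hypothetical and not part of the OWL API, and the sketch assumes the raw bytes of each document have already been obtained, which is exactly the part that is awkward for imports.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper (not OWL API): compares documents by SHA-256 of their raw bytes. */
final class DocumentChecksums {

    /** Returns the SHA-256 digest of the raw document bytes as lowercase hex. */
    static String sha256Hex(byte[] rawDocument) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(rawDocument)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** True when both byte arrays have the same SHA-256 digest. */
    static boolean sameDocument(byte[] first, byte[] second) {
        return sha256Hex(first).equals(sha256Hex(second));
    }
}
```

Note that a byte-level check like this treats a document with a BOM and the same document without one as different.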

A general approach might be to register metadata extractors with the OWLOntologyManager (or a factory thereof). Some of these are URL-scheme dependent; others are format based. The current code makes things awkward because the initial input can come from streams or readers rather than a URL, but that can be worked around.
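One possible shape for such a registry is sketched below. Everything here is an assumption for illustration: `DocumentMetadataExtractor`, `MetadataExtractorRegistry`, and `HttpLocationExtractor` are hypothetical names, none of them exist in the OWL API, and the sketch assumes a document location is known (it does not address input that arrives as a bare stream or reader).

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical SPI: contributes metadata about an ontology document. */
interface DocumentMetadataExtractor {
    /** Whether this extractor applies to the given location and format name. */
    boolean supports(URI location, String formatName);

    /** Returns metadata key/value pairs for the document at the location. */
    Map<String, String> extract(URI location);
}

/** Hypothetical registry that a manager, or its factory, could hold. */
final class MetadataExtractorRegistry {
    private final List<DocumentMetadataExtractor> extractors = new ArrayList<>();

    void register(DocumentMetadataExtractor extractor) {
        extractors.add(extractor);
    }

    /** Merges the results of all applicable extractors; later registrations win on key clashes. */
    Map<String, String> metadataFor(URI location, String formatName) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (DocumentMetadataExtractor extractor : extractors) {
            if (extractor.supports(location, formatName)) {
                merged.putAll(extractor.extract(location));
            }
        }
        return merged;
    }
}

/** Example of a scheme-dependent extractor: records the physical URL of http(s) documents. */
final class HttpLocationExtractor implements DocumentMetadataExtractor {
    @Override
    public boolean supports(URI location, String formatName) {
        String scheme = location.getScheme();
        return "http".equals(scheme) || "https".equals(scheme);
    }

    @Override
    public Map<String, String> extract(URI location) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("physicalUrl", location.toString());
        return metadata;
    }
}
```

A format-based extractor would look the same but test `formatName` in `supports` instead of the URI scheme.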

Also:

If the IDs for anonymous individuals are always preserved, then canonicalization of abstract / structural OWL is not too bad. I seem to remember that this requires a few changes in owlapi; storing anonymous individuals as tuples of ontology ID and bnode name might be enough (and would be a good excuse for not smooshing them into IRIs).

There is no standard structural ordering (just unhelpful structural equivalence). The sorted output I've been adding uses the existing owlapi sort, so that can be the standard (this ought to be documented).

ignazio1977 commented 9 years ago

Blank node ids cannot always be preserved, because of potential overlap between separate ontologies - or the same ontology imported multiple times, if the magic ontology tag is missing, or different versions of the same ontology. I blame RDF. Or XML. Or both.

sesuncedu commented 9 years ago

Right, that's why I referred to anonymous individuals, which would be tuples of ontology document ID and bnode ID.
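The tuple idea can be sketched as a small value class. `ScopedNodeId` is a hypothetical name, not an OWL API type, and plain strings stand in for whatever ID types the library would actually use.

```java
import java.util.Objects;

/** Hypothetical key: an anonymous individual identified by (ontology document ID, bnode ID). */
final class ScopedNodeId {
    private final String ontologyDocumentId;
    private final String nodeId;

    ScopedNodeId(String ontologyDocumentId, String nodeId) {
        this.ontologyDocumentId = Objects.requireNonNull(ontologyDocumentId);
        this.nodeId = Objects.requireNonNull(nodeId);
    }

    /** Two keys are equal only if both the document ID and the node ID match. */
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ScopedNodeId)) {
            return false;
        }
        ScopedNodeId that = (ScopedNodeId) other;
        return ontologyDocumentId.equals(that.ontologyDocumentId) && nodeId.equals(that.nodeId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(ontologyDocumentId, nodeId);
    }

    @Override
    public String toString() {
        return nodeId + "@" + ontologyDocumentId;
    }
}
```

With this key, the same bnode name appearing in two different ontology documents yields two distinct individuals, while the original name is still preserved for round-tripping.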

I am not sure what it means to import an ontology that has no ontology ID. If nothing else, the IRI in the import statement is, by default, the ontology IRI (if the located document has an ontologyIRI and a versionIRI, then that is the real Ontology Document ID).

I don't have the spec in front of me, so I'm not sure what the rules for multiple imports are. Incomplete and wrong is my guess.

ignazio1977 commented 9 years ago

Not ontology IRI, ontology tag - i.e., there is no triple asserting that the file is an ontology. The file would contain generic RDF. When such a thing is imported through owl:imports, the spec says to include the triples in the importing ontology - it's the only case in which an import adds triples to an ontology, and yes, it's vulnerable to cyclic imports when the imported URL is, for example, redirected. When there are blank nodes in the triple blob, the original ontology can expand proportionally to the number of redirected URLs that include the blob. Bad on many levels.

sesuncedu commented 9 years ago

Ah - the OWL 1 "inclusion" clause in the RDF mapping document.

For backwards compatibility with OWL 1 DL, if G contains an owl:imports triple pointing to an RDF document encoding an RDF graph G' where G' does not have an ontology header, this owl:imports triple is interpreted as an include rather than an import — that is, the triples of G' are included into G and are not parsed into a separate ontology.

Of course, in the structural spec:

 OWL 2 tools may implement a redirection mechanism: when a tool is used to access an ontology document at IRI I, the tool may redirect I to a different IRI DI and access the ontology document via DI instead. The result of accessing the ontology document via DI must be the same as if the ontology were accessed via I.

So it is an error if redirection would give a different result to a non-redirect...

In http://www.w3.org/TR/2012/REC-owl2-syntax-20121211/#Anonymous_Individuals

Special treatment is required in case anonymous individuals with the same node ID occur in two different ontologies. In particular, these two individuals are structurally equivalent (because they have the same node ID); however, they are not treated as identical in the semantics of OWL 2 (because anonymous individuals are local to an ontology they are used in). The latter is achieved by standardizing anonymous individuals apart when constructing the axiom closure of an ontology O: if anonymous individuals with the same node ID occur in two different ontologies in the import closure of O, then one of these individuals must be replaced in the axiom closure of O with a fresh anonymous individual (i.e., an anonymous individual whose node ID is unique in the import closure of O).

So... the scope of anonymous individuals is an ontology (which either has an ID, or is the root, anonymous ontology). Thus it is an error for an rdf included document to use the same node ID for an anonymous individual as is used in the including rdf document, since rdf graph merger requires that such blank nodes be standardized apart, but OWL 2 requires that anonymous individuals have ontology scope.
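To make the "standardizing apart" step from the quoted passage concrete, here is a minimal sketch over plain strings. `StandardizeApart` and the `_:genid` prefix are assumptions for illustration only; this is not how the OWL API implements it. The first ontology to use a node ID keeps it, and any later ontology using the same ID gets a fresh one that is unique across the whole closure.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of standardizing anonymous individuals apart across an import closure. */
final class StandardizeApart {

    /**
     * Maps each ontology ID to a renaming (old node ID to new node ID) such that
     * no node ID is shared by two different ontologies after renaming.
     */
    static Map<String, Map<String, String>> renamings(Map<String, Set<String>> nodeIdsByOntology) {
        // Reserve every original ID so that fresh IDs never collide with one.
        Set<String> used = new HashSet<>();
        for (Set<String> ids : nodeIdsByOntology.values()) {
            used.addAll(ids);
        }
        Set<String> claimed = new HashSet<>();
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        int counter = 0;
        for (Map.Entry<String, Set<String>> entry : nodeIdsByOntology.entrySet()) {
            Map<String, String> renaming = new LinkedHashMap<>();
            for (String id : entry.getValue()) {
                if (claimed.add(id)) {
                    renaming.put(id, id); // first ontology to use this ID keeps it
                } else {
                    String fresh;
                    do {
                        fresh = "_:genid" + counter++;
                    } while (!used.add(fresh));
                    renaming.put(id, fresh);
                }
            }
            result.put(entry.getKey(), renaming);
        }
        return result;
    }
}
```

Passing a `LinkedHashMap` keeps the outcome deterministic, since the iteration order decides which ontology keeps a contested ID.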

Quod liberty!

@pfps?

sesuncedu commented 9 years ago

If an OWL 2 ontology with no name is used, then there is potential for confusion if multiply imported, since each ontology is a blank ontology.

sesuncedu commented 9 years ago

Since anonymous OWL 2 ontologies use an RDF blank node as the subject of the type triple, and since by http://www.w3.org/TR/rdf11-concepts/#section-skolemization we can always replace a blank node with an IRI, and blank nodes are scoped locally to a file, we can generate a well-known Skolem constant, uniquely derived from the import IRI, as the ontologyIRI for the anonymous ontology.
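A minimal sketch of deriving such a Skolem IRI deterministically from the import IRI follows. The class name `AnonymousOntologySkolemizer`, the use of SHA-256, and the choice to mint the IRI under the import IRI's own authority are all assumptions; the `/.well-known/genid/` path follows the RDF 1.1 skolemization section. Whether a tool may mint IRIs in the document host's namespace is an open question, and a tool-owned authority could be substituted.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: derives a stable Skolem IRI for an anonymous ontology from its import IRI. */
final class AnonymousOntologySkolemizer {

    static URI skolemIriFor(URI importIri) {
        if (importIri.getScheme() == null || importIri.getAuthority() == null) {
            throw new IllegalArgumentException("import IRI needs a scheme and authority: " + importIri);
        }
        try {
            // Hash the full import IRI so the same import always yields the same Skolem IRI.
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(importIri.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return URI.create(importIri.getScheme() + "://" + importIri.getAuthority()
                    + "/.well-known/genid/" + hex);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```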

ansell commented 9 years ago

I have not come across any owl:imports statements that include blank nodes so far.

When I say exactly matching I mean that all of the RDF XYZ IRIs that match "<ontologyIRI> owl:imports <XYZ>" must be present in both the manifest and in the actual ontology. That is just a safety check I put in to make sure that I can use the manifest on its own to compute the total closure at the ontology level (without actually parsing the ontologies themselves). If/when I do parse the ontologies and they don't match the manifest it errors out, so even though I rely on the manifest, it will fail when the actual ontologies are parsed.
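The check described above amounts to a set comparison per ontology. The sketch below is not the actual PODD code; `ManifestImportCheck` and its parameters are hypothetical, and import IRIs are represented as plain strings.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: fails when a manifest's imports differ from the parsed ontology's imports. */
final class ManifestImportCheck {

    static void requireSameImports(String ontologyIri, Set<String> manifestImports, Set<String> parsedImports) {
        // Imports declared in the ontology document but absent from the manifest.
        Set<String> missingFromManifest = new TreeSet<>(parsedImports);
        missingFromManifest.removeAll(manifestImports);

        // Imports listed in the manifest but absent from the ontology document.
        Set<String> missingFromOntology = new TreeSet<>(manifestImports);
        missingFromOntology.removeAll(parsedImports);

        if (!missingFromManifest.isEmpty() || !missingFromOntology.isEmpty()) {
            throw new IllegalStateException("Imports mismatch for " + ontologyIri
                    + ": missing from manifest " + missingFromManifest
                    + ", missing from ontology " + missingFromOntology);
        }
    }
}
```

Reporting both directions of the difference makes it clearer whether the manifest or the ontology document is the stale one.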

Also, there are many tests to ensure that the manifest matches the actual ontologies before the package is released :) I am also not relying on external ontologies. I want to control the full set locally to version them in a sane way that doesn't rely on everyone else doing version control properly. The Semantic Web is great, except when you rely on the "InterWeb" to always work in real time for your application to function.