mmisw / mmiorr

Unmaintained old MMI ORR system (v2) -- New development at https://github.com/mmisw/orr

how to convert hash-path remote ontology to slash-path? #345

Closed: graybeal closed this issue 8 years ago

graybeal commented 9 years ago

The following is somewhat experimental, but I thought it should have worked, at least in a limited way. Not sure exactly why it doesn't.

I wanted to upload http://www.isi.edu/ikcap/geosoft/ontology/CSDMS.owl, but substitute a / separator for # separator. So I replaced the header in a copy of the file, as shown below. The ontology parsed and loaded at https://mmisw.org/orr/#http://www.isi.edu/ikcap/geosoft/ontology/csdms, but none of the terms I tested resolved at the expected path (e.g., https://mmisw.org/orr/#http://www.isi.edu/ikcap/geosoft/ontology/csdms/air__dielectric_constant). And of course they wouldn't resolve at the remote URL, because that resolver uses # separators.

Also, about 800 terms of the original 4892 Individuals did not make it into the list of my 4087 Individuals in the converted ontology. (They did successfully import in the unmodified ontology.) These turned out to be terms whose class type was defined as a subclass of the Assumption class, which is defined in the imported ts ontology:

    <owl:Class rdf:ID="CFConventionStandardName">
        <rdfs:label>CF Convention Standard Name Assumptions</rdfs:label>
        <rdfs:subClassOf rdf:resource="&ts;Assumption" />
    </owl:Class>

Since the ts entity is still defined as it was originally, I'm not sure why these terms would have been omitted.

What was it about my modifications that meant they were not specifying valid ontological terms?

Original ontology header:

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
    <!ENTITY owl "http://www.w3.org/2002/07/owl#" >
    <!ENTITY xsd "http://www.w3.org/2001/XMLSchema#" >
    <!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#" >
    <!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#" >
    <!ENTITY ts "http://www.isi.edu/ikcap/geosoft/ontology/software.owl#" >
]>

<rdf:RDF xmlns="http://www.isi.edu/ikcap/geosoft/ontology/CSDMS.owl#"
    xml:base="http://www.isi.edu/ikcap/geosoft/ontology/CSDMS.owl"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:ts="http://www.isi.edu/ikcap/geosoft/ontology/software.owl#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

    <owl:Ontology rdf:about="http://www.isi.edu/ikcap/geosoft/ontology/CSDMS.owl">
        <owl:imports rdf:resource="http://www.isi.edu/ikcap/geosoft/ontology/software.owl"/>
    </owl:Ontology>

My new header (many of the elements were obtained by downloading a copy of the first ontology from MMI):

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
    <!ENTITY owl "http://www.w3.org/2002/07/owl#" >
    <!ENTITY xsd "http://www.w3.org/2001/XMLSchema#" >
    <!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#" >
    <!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#" >
    <!ENTITY ts "http://www.isi.edu/ikcap/geosoft/ontology/software.owl#" >
]>

<rdf:RDF xmlns="http://www.isi.edu/ikcap/geosoft/ontology/csdms/"
    xml:base="http://www.isi.edu/ikcap/geosoft/ontology/csdms/"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:ts="http://www.isi.edu/ikcap/geosoft/ontology/software.owl#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:omv="http://omv.ontoware.org/2005/05/ontology#"
    xmlns:omvmmi="http://mmisw.org/ont/mmi/20081020/ontologyMetadata/">
    <owl:Ontology rdf:about="http://www.isi.edu/ikcap/geosoft/ontology/csdms">
        <owl:imports rdf:resource="http://www.isi.edu/ikcap/geosoft/ontology/software.owl"/>
        <omv:hasCreator>? -- need this info</omv:hasCreator>
        <omvmmi:creditRequired>conditional</omvmmi:creditRequired>
        <omv:hasContributor>Scott D. Peckham</omv:hasContributor>
        <omvmmi:contact>Scott D. Peckham</omvmmi:contact>
        <omvmmi:hasContentCreator>Scott D. Peckham</omvmmi:hasContentCreator>
        <dc:contributor>Scott D. Peckham</dc:contributor>
        <omv:documentation>http://csdms.colorado.edu/wiki/CSDMS_Standard_Names</omv:documentation>
        <omvmmi:origVocKeywords>CSDMS, standard names, structured, vocabularies compilation</omvmmi:origVocKeywords>
        <omvmmi:origMaintainerCode>csdms</omvmmi:origMaintainerCode>
        <omvmmi:temporaryMmiRole>ontology republisher</omvmmi:temporaryMmiRole>
        <omv:description>A complete collection constructed as an RDF resource from the individual vocabularies created for the CSDMS project.  This semantic work has a goal of automated semantic mediation, matching or reconciliation. While the focus is on a "lingua franca", the standard names are often built from a hierarchical set of concepts, and may eventually be used to construct a type of ontology.</omv:description>
        <omvmmi:origVocDescriptiveName>CSDMS Standard Names</omvmmi:origVocDescriptiveName>
        <omvmmi:creditCitation>(To be determined)</omvmmi:creditCitation>
        <omv:acronym>csdms-sn-mod</omv:acronym>
        <owl:imports rdf:resource="http://www.isi.edu/ikcap/geosoft/ontology/software.owl"/>
        <dc:description>A complete collection constructed as an RDF resource from the individual vocabularies created for the CSDMS project.  This semantic work has a goal of automated semantic mediation, matching or reconciliation. While the focus is on a "lingua franca", the standard names are often built from a hierarchical set of concepts, and may eventually be used to construct a type of ontology.</dc:description>
        <omv:creationDate>2015-01-29T00:09:34+0000</omv:creationDate>
        <dc:source>http://www.isi.edu/ikcap/geosoft/ontology/CSDMS.owl</dc:source>
        <omvmmi:origVocSyntaxFormat>OWL</omvmmi:origVocSyntaxFormat>
        <omvmmi:origVocVersionId>(unknown -- not versioned?)</omvmmi:origVocVersionId>
        <omv:keywords>standard names, csdms, collection, test, ORR-tailored</omv:keywords>
        <omv:name>CSDMS Standard Names (Testing)</omv:name>
        <omvmmi:origVocManager>Scott D. Peckham</omvmmi:origVocManager>
        <omvmmi:hasResourceType>parameter</omvmmi:hasResourceType>
        <omv:reference>http://csdms.colorado.edu/mediawiki/images/Peckham_2014_iEMSs.pdf</omv:reference>
        <omvmmi:contactRole>content manager</omvmmi:contactRole>
        <dc:date>2015-01-29T00:09:34+0000</dc:date>
        <omvmmi:origVocUri>http://www.isi.edu/ikcap/geosoft/ontology/CSDMS.owl</omvmmi:origVocUri>
        <dc:creator>John Graybeal</dc:creator>
        <omvmmi:origVocDocumentationUri>http://csdms.colorado.edu/wiki/CSDMS_Standard_Names</omvmmi:origVocDocumentationUri>
    </owl:Ontology>

carueda commented 9 years ago

Not sure what may be going on here. Perhaps using a tool like Protege would help, at least to verify the general expected behavior.

Re the ORR handling, the mix of fragment separators might be part of the problem, although it's not immediately obvious why it shouldn't work.

A general comment: this is likely an incorrect way to operate, but most (if not all) operations in the ORR are basically centered around the URI of the ontology itself, in particular for reporting terms. That is, it only reports the terms under the namespace defined by the ontology URI. (I'll have to look into the code to confirm this behavior, as I may be forgetting the key details.)

graybeal commented 9 years ago

OK, thanks for taking a look -- knowing it isn't immediately obvious is a good start, actually.

I copied your third paragraph to #340; I think it's directly applicable there, and I want to respond there.

carueda commented 9 years ago
http://mmisw.org/ont/csdms/terms/air__dielectric_constant
http://www.isi.edu/ikcap/geosoft/ontology/CSDMS.owl#air__dielectric_constant    
http://www.isi.edu/ikcap/geosoft/ontology/csdms/#air__dielectric_constant

So, according to the last entry, the # separator was not replaced; it was kept after the trailing / of the namespace. This seems to be the bug.
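The last two URIs above can be reproduced with a short sketch (not ORR code). It assumes the RDF/XML rule that rdf:ID="name" is resolved as the relative reference "#name" against the in-scope xml:base, independently of the default xmlns= namespace; urllib.parse.urljoin stands in for a real RDF/XML parser, and rdf_id_to_uri is a hypothetical helper name.

```python
from urllib.parse import urljoin


def rdf_id_to_uri(xml_base: str, rdf_id: str) -> str:
    """Hypothetical helper: resolve rdf:ID="<rdf_id>" as "#<rdf_id>" against xml:base.

    The default namespace (xmlns=...) plays no part here; only xml:base matters.
    """
    return urljoin(xml_base, "#" + rdf_id)


# Original file: xml:base ".../CSDMS.owl" (no trailing slash).
print(rdf_id_to_uri("http://www.isi.edu/ikcap/geosoft/ontology/CSDMS.owl",
                    "air__dielectric_constant"))
# Modified file: xml:base ".../csdms/" (trailing slash), so the '#' lands after the '/'.
print(rdf_id_to_uri("http://www.isi.edu/ikcap/geosoft/ontology/csdms/",
                    "air__dielectric_constant"))
```

The second call yields http://www.isi.edu/ikcap/geosoft/ontology/csdms/#air__dielectric_constant, matching the last entry: changing xmlns= alone does not affect entities declared with rdf:ID.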

graybeal commented 9 years ago

Yes, I want to be able to show Scott and team what different registration options look like, so they can consider the differences (registered remote-hosted as is; ditto, with some tweaks; and registered fully hosted; soon you'll see the same thing in a separate vocabulary, once I figure out how to tease out the different vocabularies). Didn't occur to me to use search to find the results, duh.

Yes to the bug. I wonder from where in the input ontology the system would think that it should use the # separator? (In other words: Maybe it's not a bug in what the repository does, but in what the ontology is saying to do. Somehow.)

carueda commented 9 years ago

By looking at the "diff" between the original header and your new header:

xmlns=
    original: "http://www.isi.edu/ikcap/geosoft/ontology/CSDMS.owl#"
    new:      "http://www.isi.edu/ikcap/geosoft/ontology/csdms/"
    comment:  the new namespace with a trailing / looks good, as you intend.
xml:base=
    original: "http://www.isi.edu/ikcap/geosoft/ontology/CSDMS.owl"
    new:      "http://www.isi.edu/ikcap/geosoft/ontology/csdms/"
    comment:  but the new xml:base should not have the trailing / (*).
owl:Ontology rdf:about=
    original: "http://www.isi.edu/ikcap/geosoft/ontology/CSDMS.owl"
    new:      "http://www.isi.edu/ikcap/geosoft/ontology/csdms"
    comment:  looks good, as you intend.

So, the task now would be to repeat this exercise with the correct xml:base=.

(*) Edit (2015-11-30): Based on later comments, a further clarification: a value with a trailing / is in general fine for xml:base, but we don't want that in this case, because we only want / as the sole separator for the associated entities. In particular, the rdf:ID mechanism, combined with xml:base, would also include a # in the entity URIs.
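To illustrate why a trailing / is in general fine for xml:base, here is a sketch using hypothetical relative rdf:about values (the files in this thread use rdf:ID or absolute rdf:about instead). Relative references follow the standard URI resolution rules, approximated here with urllib.parse.urljoin:

```python
from urllib.parse import urljoin

ONT = "http://www.isi.edu/ikcap/geosoft/ontology/csdms"

# A relative rdf:about="object_air" (hypothetical) against a base WITH a trailing
# slash is appended to the path: .../ontology/csdms/object_air
print(urljoin(ONT + "/", "object_air"))

# Against a base WITHOUT the trailing slash, the last path segment is replaced:
# .../ontology/object_air, which is probably not what is wanted.
print(urljoin(ONT, "object_air"))

# rdf:ID="object_air" behaves like the reference "#object_air", so a '#' appears
# whatever the base ends with: .../ontology/csdms/#object_air
print(urljoin(ONT + "/", "#object_air"))
```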

carueda commented 8 years ago

Just uploaded http://mmisw.org/orr/#http://www.isi.edu/ikcap/geosoft/ontology/csdms2 and marked it as 'testing'. It has:

xmlns=    "http://www.isi.edu/ikcap/geosoft/ontology/csdms2/"
xml:base= "http://www.isi.edu/ikcap/geosoft/ontology/csdms2"
...
<owl:Ontology rdf:about="http://www.isi.edu/ikcap/geosoft/ontology/csdms2">

(I used csdms2 in this case to keep the previous submission as it is.)

Still to be examined.

carueda commented 8 years ago

Well, I just noticed that the entity URIs are still using the # separator.

For example, the expectation would be to have http://www.isi.edu/ikcap/geosoft/ontology/csdms2/air__dielectric_constant resolving, but it is not. With the submission as done, the URI here is actually http://www.isi.edu/ikcap/geosoft/ontology/csdms2#air__dielectric_constant

(Incidentally, that URI can be resolved directly, or through the Portal.)

In any case, note the hash (#) separator. This is because we didn't replace the rdf:ID mechanism in the original ontology; see http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-ID-xml-base

So, I fixed this as well and registered a third version (now using csdms3 to keep the previous ones available for reference):

http://mmisw.org/orr/#http://www.isi.edu/ikcap/geosoft/ontology/csdms3

Now the terms are resolvable as intended, for example http://www.isi.edu/ikcap/geosoft/ontology/csdms3/air__dielectric_constant.
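Replacing rdf:ID with an explicit rdf:about that carries the full /-based URI can be done mechanically. The following is a minimal textual sketch, not necessarily how the csdms3 version was prepared; rdf_id_to_about is a hypothetical name, and it assumes the file spells the attribute exactly as rdf:ID="..." with double quotes. If the file also refers to its own entities with rdf:resource="#name" or rdf:about="#name", those references would need the same treatment, which this sketch does not do.

```python
import re

# Matches rdf:ID="name" (assumes the literal 'rdf:' prefix and double quotes).
RDF_ID = re.compile(r'\brdf:ID="([^"]+)"')


def rdf_id_to_about(rdf_xml: str, namespace: str) -> str:
    """Rewrite each rdf:ID="name" as rdf:about="<namespace>name".

    `namespace` should end with the desired separator, here '/'.
    A function replacement is used so backslashes in `namespace` are not
    interpreted as regex escapes.
    """
    return RDF_ID.sub(lambda m: 'rdf:about="' + namespace + m.group(1) + '"', rdf_xml)


before = '''<ts:Object rdf:ID="object_air">
    <rdfs:label>air</rdfs:label>
</ts:Object>'''

print(rdf_id_to_about(before, "http://www.isi.edu/ikcap/geosoft/ontology/csdms/"))
```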

carueda commented 8 years ago

In summary, to properly adjust the submission from using http://www.isi.edu/ikcap/geosoft/ontology/CSDMS.owl and # as fragment separator:

<rdf:RDF xmlns="http://www.isi.edu/ikcap/geosoft/ontology/CSDMS.owl#"
    xml:base="http://www.isi.edu/ikcap/geosoft/ontology/CSDMS.owl"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:ts="http://www.isi.edu/ikcap/geosoft/ontology/software.owl#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

    <owl:Ontology rdf:about="http://www.isi.edu/ikcap/geosoft/ontology/CSDMS.owl">
        <owl:imports rdf:resource="http://www.isi.edu/ikcap/geosoft/ontology/software.owl"/>
    </owl:Ontology>

    <!-- Objects -->
    <ts:Object rdf:ID="object_air">
        <rdfs:label>air</rdfs:label>
    </ts:Object>
...

to using http://www.isi.edu/ikcap/geosoft/ontology/csdms and / as fragment separator, the contents should be adjusted to become:

<rdf:RDF xmlns="http://www.isi.edu/ikcap/geosoft/ontology/csdms/"
    xml:base="http://www.isi.edu/ikcap/geosoft/ontology/csdms"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:ts="http://www.isi.edu/ikcap/geosoft/ontology/software.owl#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

    <owl:Ontology rdf:about="http://www.isi.edu/ikcap/geosoft/ontology/csdms">
        <owl:imports rdf:resource="http://www.isi.edu/ikcap/geosoft/ontology/software.owl"/>
    </owl:Ontology>

    <!-- Objects -->
    <ts:Object rdf:about="http://www.isi.edu/ikcap/geosoft/ontology/csdms/object_air">
        <rdfs:label>air</rdfs:label>
    </ts:Object>
...
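A quick way to check an adjusted file before submitting it is to compute the subject URI of each top-level node and look for a remaining #. The sketch below uses Python's xml.etree.ElementTree on a trimmed copy of the adjusted snippet above (DOCTYPE and unused namespaces dropped); subject_uris is a hypothetical helper, and it only honors the xml:base on the root element, not nested ones.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Trimmed copy of the adjusted file (DOCTYPE and some namespaces omitted).
DOC = """<rdf:RDF xmlns="http://www.isi.edu/ikcap/geosoft/ontology/csdms/"
    xml:base="http://www.isi.edu/ikcap/geosoft/ontology/csdms"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:ts="http://www.isi.edu/ikcap/geosoft/ontology/software.owl#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <owl:Ontology rdf:about="http://www.isi.edu/ikcap/geosoft/ontology/csdms"/>
  <ts:Object rdf:about="http://www.isi.edu/ikcap/geosoft/ontology/csdms/object_air">
    <rdfs:label>air</rdfs:label>
  </ts:Object>
</rdf:RDF>"""


def subject_uris(doc: str):
    """Return the subject URI of each top-level node, resolving rdf:about or
    rdf:ID against the root xml:base (nested xml:base is ignored here)."""
    root = ET.fromstring(doc)
    base = root.get(XML_BASE, "")
    uris = []
    for node in root:
        if RDF + "about" in node.attrib:
            uris.append(urljoin(base, node.attrib[RDF + "about"]))
        elif RDF + "ID" in node.attrib:
            # rdf:ID="name" is treated as the relative reference "#name".
            uris.append(urljoin(base, "#" + node.attrib[RDF + "ID"]))
    return uris


for uri in subject_uris(DOC):
    print(uri, "OK" if "#" not in uri else "still uses #")
```

With rdf:ID="object_air" in place of the explicit rdf:about, the same function would report http://www.isi.edu/ikcap/geosoft/ontology/csdms#object_air, the pattern seen in the csdms2 submission.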

Note:

carueda commented 8 years ago

To avoid loading unnecessary contents in the ORR I just unregistered the testing ontologies I used in the previous exercises: