ontodev / robot

ROBOT is an OBO Tool
http://robot.obolibrary.org
BSD 3-Clause "New" or "Revised" License

Extract produces strangeness -> OBO Graph error #1184

Closed caufieldjh closed 8 months ago

caufieldjh commented 8 months ago

Hi robotniks. I'm having trouble converting OWL to obojson, but only for an extracted subset. Starting with phenio.owl, I remove any problematic annotations (such as empty "<rdfs:comment></rdfs:comment>" elements), then use robot convert to get the corresponding JSON. This works without issue. But when I start with phenio-test.owl, which is the result of

robot extract --method MIREOT --input phenio.owl --branch-from-term "UPHENO:0084945" --output phenio-test.owl

I get an OBO GRAPH ERROR. Could the extraction have introduced some annotation error? If so, how could I find out where it is?

One bit of weirdness - the extracted product has a mangled header. It's missing the whole owl:Ontology block.

This is what the phenio.owl header looks like:

<?xml version="1.0"?>
<rdf:RDF xmlns="http://purl.obolibrary.org/obo/phenio.owl#"
     xml:base="http://purl.obolibrary.org/obo/phenio.owl"
     xmlns:cl="http://purl.obolibrary.org/obo/cl#"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:go="http://purl.obolibrary.org/obo/go#"
     xmlns:pr="http://purl.obolibrary.org/obo/pr#"
     xmlns:eco="http://purl.obolibrary.org/obo/eco#"
     xmlns:nbo="http://purl.obolibrary.org/obo/nbo.owl#"
     xmlns:obo="http://purl.obolibrary.org/obo/"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:xml="http://www.w3.org/XML/1998/namespace"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:zfa="http://purl.obolibrary.org/obo/zfa#"
     xmlns:cito="http://purl.org/spar/cito/"
     xmlns:core="http://purl.obolibrary.org/obo/uberon/core#"
     xmlns:doap="http://usefulinc.com/ns/doap#"
     xmlns:fbbt="http://purl.obolibrary.org/obo/fbbt#"
     xmlns:foaf="http://xmlns.com/foaf/0.1/"
     xmlns:fypo="http://purl.obolibrary.org/obo/fypo#"
     xmlns:pato="http://purl.obolibrary.org/obo/pato#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:skos="http://www.w3.org/2004/02/skos/core#"
     xmlns:swrl="http://www.w3.org/2003/11/swrl#"
     xmlns:chebi="http://purl.obolibrary.org/obo/chebi/"
     xmlns:mondo="http://purl.obolibrary.org/obo/mondo#"
     xmlns:sssom="https://w3id.org/sssom/"
     xmlns:swrla="http://swrl.stanford.edu/ontologies/3.3/swrla.owl#"
     xmlns:swrlb="http://www.w3.org/2003/11/swrlb#"
     xmlns:terms="http://www.geneontology.org/formats/oboInOwl#http://purl.org/dc/terms/"
     xmlns:vocab="https://w3id.org/semapv/vocab/"
     xmlns:chebi3="http://purl.obolibrary.org/obo/chebi#"
     xmlns:chebi4="http://purl.obolibrary.org/obo/chebi#2"
     xmlns:chebi5="http://purl.obolibrary.org/obo/chebi#3"
     xmlns:chebi6="http://purl.obolibrary.org/obo/chebi#1"
     xmlns:hsapdv="http://purl.obolibrary.org/obo/hsapdv#"
     xmlns:linkml="https://w3id.org/linkml/"
     xmlns:terms2="http://purl.org/dc/terms/"
     xmlns:ubprop="http://purl.obolibrary.org/obo/ubprop#"
     xmlns:vocab1="https://w3id.org/biolink/vocab/"
     xmlns:subsets="http://purl.obolibrary.org/obo/ro/subsets#"
     xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#"
     xmlns:Wikipedia="http://purl.obolibrary.org/obo/Wikipedia#"
     xmlns:ncbitaxon="http://purl.obolibrary.org/obo/ncbitaxon#"
     xmlns:caloha-reqs-vetted="http://purl.obolibrary.org/obo/caloha-reqs-vetted#">
    <owl:Ontology rdf:about="http://purl.obolibrary.org/obo/phenio.owl">
        <owl:versionIRI rdf:resource="http://purl.obolibrary.org/obo/phenio/releases/2024-03-04/phenio.owl"/>
        <terms2:description>None</terms2:description>
        <terms2:license rdf:resource="https://creativecommons.org/licenses/unspecified"/>
        <terms2:title>Phenomics Integrated Ontology</terms2:title>
        <owl:versionInfo>2024-03-04</owl:versionInfo>
    </owl:Ontology>

And here's phenio-test.owl:

<?xml version="1.0"?>
<rdf:RDF xmlns="http://www.w3.org/2002/07/owl#"
     xml:base="http://www.w3.org/2002/07/owl"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:obo="http://purl.obolibrary.org/obo/"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:xml="http://www.w3.org/XML/1998/namespace"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:terms="http://purl.org/dc/terms/"
     xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
    <Ontology/>

This is all with v1.9.5.

@matentzn

caufieldjh commented 8 months ago

The header appears to be the main issue. If I swap in the phenio.owl header into the header for the extracted subset, I get no errors from the obojson conversion.
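Since the two headers differ mainly in the owl:Ontology element, one quick diagnostic is to list the rdf:about value of every ontology element in a file. The following is a minimal sketch using only Python's standard library; the helper names `ontology_iris` and `wrap` are made up for illustration, and the two inline documents are cut-down stand-ins for phenio.owl and phenio-test.owl rather than the real files.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def ontology_iris(rdfxml_text):
    """Return the rdf:about value (or None) of each top-level owl:Ontology element."""
    root = ET.fromstring(rdfxml_text)
    return [el.get(f"{{{RDF}}}about") for el in root.findall(f"{{{OWL}}}Ontology")]


def wrap(ontology_element):
    """Hypothetical helper: put one ontology element inside a minimal rdf:RDF root."""
    return (f'<rdf:RDF xmlns="{OWL}" xmlns:owl="{OWL}" xmlns:rdf="{RDF}">'
            f"{ontology_element}</rdf:RDF>")


# Stand-in for the phenio.owl header: a named ontology.
source_like = wrap('<owl:Ontology rdf:about="http://purl.obolibrary.org/obo/phenio.owl"/>')
# Stand-in for the phenio-test.owl header: an anonymous ontology.
extracted_like = wrap("<Ontology/>")

print(ontology_iris(source_like))     # ['http://purl.obolibrary.org/obo/phenio.owl']
print(ontology_iris(extracted_like))  # [None]
```

A `None` entry means the file has an ontology element but no ontology IRI, which is the situation in the extracted subset. For a real file, the text could be read from disk and passed to `ontology_iris`, keeping in mind that this loads the whole document into memory.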

balhoff commented 8 months ago

@caufieldjh the Ontology element in the second snippet looks okay. It is an empty element defining a blank node with rdf:type owl:Ontology. It should be fine for an anonymous ontology. Maybe ROBOT convert for OBO JSON doesn't like ontologies without ontology IRIs? You can add one by putting an annotate step within your ROBOT command.
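A chained command along those lines might look like the sketch below, which only assembles the argument list in Python and prints it. The ontology IRI and the output file name are placeholders chosen for illustration, not values from this thread, and whether the added IRI is enough to satisfy the OBO Graphs converter is an assumption to verify on your own data.

```python
def build_robot_command(input_path, term, ontology_iri, output_path):
    """Assemble a chained ROBOT call: extract, then annotate, then convert."""
    return [
        "robot",
        "extract", "--method", "MIREOT",
        "--input", input_path,
        "--branch-from-term", term,
        # Give the extracted subset an ontology IRI before conversion.
        "annotate", "--ontology-iri", ontology_iri,
        # Write the result as OBO Graphs JSON.
        "convert", "--format", "json", "--output", output_path,
    ]


cmd = build_robot_command(
    "phenio.owl",
    "UPHENO:0084945",
    "http://purl.obolibrary.org/obo/phenio/phenio-test.owl",  # hypothetical IRI
    "phenio-test.json",  # hypothetical output name
)
print(" ".join(cmd))
```

The printed line can be pasted into a shell, or the list can be handed to `subprocess.run(cmd, check=True)` on a machine where `robot` is on the PATH.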

balhoff commented 8 months ago

<?xml version="1.0"?>
<rdf:RDF xmlns="http://www.w3.org/2002/07/owl#"
     xml:base="http://www.w3.org/2002/07/owl"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:obo="http://purl.obolibrary.org/obo/"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:xml="http://www.w3.org/XML/1998/namespace"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:terms="http://purl.org/dc/terms/"
     xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
    <Ontology/>
</rdf:RDF>

translates to this Turtle:

_:Bcaf351a4X2DfbbfX2D465cX2Db7e5X2Dc4aa153dec17 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Ontology> .

caufieldjh commented 8 months ago

I played around with this a bit:

This converts from owl to obojson without issue.

<?xml version="1.0"?>
<rdf:RDF xmlns="http://www.w3.org/2002/07/owl#"
     xml:base="http://www.w3.org/2002/07/owl"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:obo="http://purl.obolibrary.org/obo/"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:xml="http://www.w3.org/XML/1998/namespace"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:terms="http://purl.org/dc/terms/"
     xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
    <owl:Ontology rdf:about="http://purl.obolibrary.org/obo/phenio.owl">
    </owl:Ontology>

This does not - it raises an INVALID ONTOLOGY FILE ERROR.

<?xml version="1.0"?>
<rdf:RDF xmlns="http://www.w3.org/2002/07/owl#"
     xml:base="http://www.w3.org/2002/07/owl"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:obo="http://purl.obolibrary.org/obo/"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:xml="http://www.w3.org/XML/1998/namespace"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:terms="http://purl.org/dc/terms/"
     xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
    <owl:Ontology rdf:about="http://purl.obolibrary.org/obo/phenio.owl">
    <Ontology/>

Omitting the <owl:Ontology rdf:about="http://purl.obolibrary.org/obo/phenio.owl"> entirely raises an OBO GRAPH ERROR as above.

Should adding the annotate step replace <Ontology/> with an <owl:Ontology ...></owl:Ontology> element?

balhoff commented 8 months ago

Your second snippet is malformed XML: the <owl:Ontology rdf:about="..."> start tag is never closed. You can use either <owl:Ontology>stuff</owl:Ontology> or <Ontology>stuff</Ontology> (the second works because of the line xmlns="http://www.w3.org/2002/07/owl#", which makes OWL the default namespace). If the element has no properties, <owl:Ontology/> or <Ontology/> is enough. Either way, you must have one ontology element of some kind for the file to be an OWL ontology.
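Both points can be checked with an XML parser alone. The sketch below is an illustration using Python's standard library, not part of ROBOT. The snippets in this thread are truncated, so it assumes a closing </rdf:RDF> and a shortened namespace header. The helper name `first_child_tag` is made up for this example.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
HEADER = f'<rdf:RDF xmlns="{OWL}" xmlns:owl="{OWL}" xmlns:rdf="{RDF}">'
ABOUT = 'rdf:about="http://purl.obolibrary.org/obo/phenio.owl"'


def first_child_tag(body):
    """Return the namespace-expanded tag of the first child, or None if the XML is malformed."""
    try:
        root = ET.fromstring(HEADER + body + "</rdf:RDF>")
    except ET.ParseError:
        return None
    return root[0].tag


variants = {
    "open and close tags": f"<owl:Ontology {ABOUT}></owl:Ontology>",
    "self-closed, prefixed": f"<owl:Ontology {ABOUT}/>",
    "self-closed, default namespace": "<Ontology/>",
    "opened but never closed": f"<owl:Ontology {ABOUT}><Ontology/>",
}

for label, body in variants.items():
    print(f"{label}: {first_child_tag(body)}")
```

The first three variants all report `{http://www.w3.org/2002/07/owl#}Ontology`, showing that the unprefixed element resolves to the same name as the prefixed one. The last variant reports `None` because the parser rejects it before any RDF mapping takes place.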

It's much easier to work with OWL functional syntax, or possibly Turtle if you need an RDF syntax. With these you have a better chance of seeing what elements are in your ontology. With RDF/XML you are simultaneously dealing with XML well-formedness and the very complicated XML-to-RDF mapping. RDF/XML is basically the worst format.

matentzn commented 8 months ago

I have had that issue as well and made a corresponding obographs issue here: https://github.com/geneontology/obographs/issues/100

As this is not a ROBOT issue after all, let's move the discussion there.