ontodev / robot

ROBOT is an OBO Tool
http://robot.obolibrary.org
BSD 3-Clause "New" or "Revised" License

ROBOT query --update strips prefixes when output is not .owl (RDF/XML) #1172

Open allenbaron opened 11 months ago

allenbaron commented 11 months ago

The robot query command, when performing an update operation, drops prefixes when the output is .ofn or .omn, but not when it is .owl (these are the only formats I tested). This seems to be the same issue as #1101, except that it happens for SPARQL update queries and wasn't fixed by PR https://github.com/ontodev/robot/pull/1106 (it still happens in 1.9.5). The doid-edit.owl input file is formatted as .ofn. This happens for every SPARQL update query I've tried (including a completely empty one, see bottom).

Prefixes dropped

.ofn output loses prefixes:

robot query -i doid-edit.owl --update fix_whitespace.rq -o tmp.ofn \
    && mv tmp.ofn doid-edit.owl

Chaining convert doesn't help:

robot \
    query -i doid-edit.owl --update fix_whitespace.rq \
    convert -o tmp.ofn \
    && mv tmp.ofn doid-edit.owl

Separate convert doesn't help (for .ofn or .omn):

robot query -i doid-edit.owl --update fix_whitespace.rq -o tmp.omn \
    && robot convert -i tmp.omn -o doid-edit.owl --format ofn \
    && rm tmp.omn

Result:

image
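
For readers who want to confirm the symptom without opening the files, a minimal sketch follows. It assumes functional syntax, where each prefix declaration sits on its own line starting with `Prefix(`; the helper names `list_prefixes` and `missing_prefixes` are hypothetical and not part of ROBOT.

```shell
# Hypothetical helper: print the prefix declarations of a functional-syntax
# (.ofn) file, sorted so that two files can be compared line by line.
list_prefixes() {
    grep -E '^Prefix\(' "$1" | LC_ALL=C sort
}

# Hypothetical helper: print every declaration that is present in the first
# file but absent from the second.
missing_prefixes() {
    before=$(mktemp)
    after=$(mktemp)
    list_prefixes "$1" > "$before"
    list_prefixes "$2" > "$after"
    LC_ALL=C comm -23 "$before" "$after"
    rm -f "$before" "$after"
}
```

Running `missing_prefixes doid-edit.owl tmp.ofn` after the query, and before the `mv`, should then list whichever declarations were dropped.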

Prefixes Preserved

.owl output preserves prefixes:

robot query -i doid-edit.owl --update fix_whitespace.rq -o tmp.owl \
    && robot convert -i tmp.owl -o doid-edit.owl --format ofn \
    && rm tmp.owl

Using --add-prefixes also works (my current workaround):

robot --add-prefixes prefixes.json \
    query -i doid-edit.owl --update fix_whitespace.rq -o tmp.ofn \
    && mv tmp.ofn doid-edit.owl
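
The prefixes.json used in this workaround could probably be derived from the input file instead of being maintained by hand. The sketch below is one way to do that, assuming the input is in functional syntax with one `Prefix(name:=<iri>)` declaration per line and that no IRI contains characters needing JSON escaping. The function name `ofn_prefixes_to_json` is hypothetical.

```shell
# Hypothetical helper: turn the Prefix(...) lines of a functional-syntax file
# into a JSON-LD context of the shape passed to --add-prefixes.
ofn_prefixes_to_json() {
    awk '
        /^Prefix\(/ {
            line = $0
            sub(/^Prefix\(/, "", line)        # drop the leading "Prefix("
            sub(/>\)[ \t\r]*$/, "", line)     # drop the trailing ">)"
            split_at = index(line, ":=<")
            if (split_at <= 1) next           # skip the default ":" prefix and malformed lines
            name = substr(line, 1, split_at - 1)
            iri  = substr(line, split_at + 3)
            entries[++n] = sprintf("    \"%s\": \"%s\"", name, iri)
        }
        END {
            print "{"
            print "  \"@context\": {"
            for (i = 1; i <= n; i++) print entries[i] (i < n ? "," : "")
            print "  }"
            print "}"
        }
    ' "$1"
}
```

For example, `ofn_prefixes_to_json doid-edit.owl > prefixes.json` would write the context before the query is run. The default (empty) prefix is skipped on purpose, since this sketch makes no assumption about how it should be represented in the JSON file.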

SPARQL queries

fix_whitespace.rq:

```sparql
# remove extra whitespace from ALL strings (e.g. in defs, xrefs, labels, etc.)
# -> removes 2+ spaces, spaces before commas or periods, and spaces at beginning or end of string
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

DELETE { ?s ?p ?o . }
INSERT { ?s ?p ?new_o . }
WHERE {
    ?s ?p ?o .
    FILTER( datatype(?o) = xsd:string )
    BIND(
        REPLACE(
            REPLACE(?o, " (,) *| +", "$1 "),
            " (\\.)| +$|^ +", "$1"
        ) AS ?new_o
    )
}
```

Empty SPARQL update query:

```sparql
DELETE { }
INSERT { }
WHERE { ?s a owl:Class . }
```

prefixes.json file

```json
{
  "@context": {
    "obo": "http://purl.obolibrary.org/obo/",
    "oboInOwl": "http://www.geneontology.org/formats/oboInOwl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xml": "http://www.w3.org/XML/1998/namespace",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "terms": "http://purl.org/dc/terms/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "doid": "http://purl.obolibrary.org/obo/doid#"
  }
}
```

jamesaoverton commented 11 months ago

Thanks for pointing to #1106, which uses isPrefixOWLOntologyFormat() to check whether a format should use prefixes. That should be correct. In this case robot query is converting the input ontology to Turtle, loading it into Jena, running SPARQL, converting back to Turtle, and reading it into OWLAPI again. I guess that the format of the input ontology is being lost along the way. If I'm right, then the prefixes won't be preserved for RDF/XML format either, but we might be setting decent prefixes in that case.
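
One rough way to probe this guess from the command line is to round-trip the ontology through Turtle with two plain `robot convert` calls and check whether the prefixes survive. This is only a sketch: it approximates the Turtle step and does not go through Jena, so a clean result would not rule out the hypothesis. The function name `roundtrip_via_turtle` is hypothetical.

```shell
# Sketch: convert an ontology to Turtle and back again, outside of `query`,
# so the output can be inspected for missing prefix declarations.
# The output format is inferred by ROBOT from the output file extension.
roundtrip_via_turtle() {
    input=$1
    output=$2
    workdir=$(mktemp -d)
    robot convert -i "$input" -o "$workdir/roundtrip.ttl" \
        && robot convert -i "$workdir/roundtrip.ttl" -o "$output"
    status=$?
    rm -rf "$workdir"
    return $status
}
```

For instance, `roundtrip_via_turtle doid-edit.owl tmp.ofn` followed by `grep '^Prefix(' tmp.ofn` would show which declarations remain after the round trip.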

Do you (or anyone reading this) have time to dig into this issue? I have some big deadlines coming up.

allenbaron commented 10 months ago

I'd love to help more but I don't have sufficient expertise with Java (or sufficient familiarity with the internal workings of ROBOT/OWLAPI) to delve into this. My apologies.

matentzn commented 2 months ago

It seems @souzadevinicius is interested in looking at this, but it's possibly quite a complex issue. We will see.