ncbo / owlapi_wrapper

A command line utility that wraps the Java OWL-API to parse RDFS, OWL and OBO ontologies.

update owlapi to v4.5.24 #21

Open alexskr opened 1 year ago

alexskr commented 1 year ago

Protégé v5.6.0 and ROBOT are using OWL API 4.5.24.

jvendetti commented 1 year ago

It's 4.5.25 now:

Yesterday, ROBOT 1.9.3 was released. Today, we’ve released the ODK 1.4 and Protégé 5.6.1. What those releases have all in common is that they all upgrade to the same new version of the OWL API, 4.5.25 (also released yesterday).

jvendetti commented 1 year ago

I upgraded the owlapi-distribution dependency to 4.5.25 in my local dev environment, used mvn package to generate a new JAR, and ran the unit test suites for several projects.

Things look largely fine with the tests in this repository, as well as the ontologies_linked_data project. However, there's a unit test in the ontologies_api project (TestClassesController#test_notation_lookup) that's failing and it's less obvious what's going wrong.

This is the API call the unit test issues:

/ontologies/TEST-ONT-0/classes/BRO:0000001?include=all

... which results in a 200 OK status when using owlapi-distribution 4.5.18. With version 4.5.25, the same call results in a 404 status code:

{"errors":["Resource 'BRO:0000001' not found in ontology TEST-ONT-0 submission 2"],"status":404}

More specifically, this code in classes_helper.rb:

notation_lookup = LinkedData::Models::Class.where(
  notation: RDF::Literal.new(params[:cls], :datatype => RDF::XSD.string))
  .in(submission).first

... will successfully fetch the requested class from the triplestore with owlapi-distribution 4.5.18, but not 4.5.25.

jvendetti commented 1 year ago

Further investigation revealed what I think is the underlying cause of the test_notation_lookup unit test failure. Starting with version 4.5.22, the OWL API omits the xsd:string datatype when it serializes literals that have no language tag:

Most syntaxes do not require xsd:string to be outputted explicitly for literals without language tags.

See issues https://github.com/owlcs/owlapi/issues/1063 and https://github.com/owlcs/owlapi/issues/640 in the OWL API issue tracker for detailed explanations/discussion.
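The reasoning behind the change is that in RDF 1.1 a literal with neither a datatype nor a language tag is shorthand for an xsd:string literal, so the explicit datatype adds nothing. Below is a minimal sketch of that normalization in plain Ruby. The `Literal` struct and the `normalize` helper are hypothetical illustrations and are not part of the rdf gem or the OWL API:

```ruby
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"

# Hypothetical, simplified literal: lexical value, optional datatype IRI,
# optional language tag. Not the rdf gem's RDF::Literal.
Literal = Struct.new(:value, :datatype, :language)

# Fill in the datatype that RDF 1.1 implies for literals written without one:
# rdf:langString when a language tag is present, xsd:string otherwise.
def normalize(literal)
  return literal if literal.datatype

  implied = literal.language ? RDF_LANG_STRING : XSD_STRING
  Literal.new(literal.value, implied, literal.language)
end

typed = Literal.new("BRO:0000001", XSD_STRING, nil) # as written by OWL API 4.5.18
plain = Literal.new("BRO:0000001", nil, nil)        # as written by OWL API 4.5.25

puts normalize(typed) == normalize(plain) # => true
```

Once normalized, both serializations denote the same value. A store or query that compares the raw terms, without applying this normalization, can still treat them as different.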

When we ingest new ontologies from end users, we load them, add some annotation axioms, and then serialize them to RDF/XML. The resulting output differs between OWL API versions, for example:

# This is the serialization of the BRO:0000001 class generated with OWL API 4.5.18

<!-- http://bioontology.org/ontologies/BiomedicalResourceOntology.owl#Material_Resource -->

<owl:Class rdf:about="http://bioontology.org/ontologies/BiomedicalResourceOntology.owl#Material_Resource">
    <rdfs:subClassOf rdf:resource="http://bioontology.org/ontologies/BiomedicalResourceOntology.owl#Resource"/>
    <biositemap:definition rdf:datatype="http://www.w3.org/2001/XMLSchema#string">A resource that provides items such as reagents, instruments, tissue samples or organisms.</biositemap:definition>
    <metadata:prefixIRI rdf:datatype="http://www.w3.org/2001/XMLSchema#string">BRO:Material_Resource</metadata:prefixIRI>
    <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">As per alignment with NIF resource type hierarchy.</rdfs:comment>
    <skos:notation rdf:datatype="http://www.w3.org/2001/XMLSchema#string">BRO:0000001</skos:notation>
    <skos:prefLabel rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Material Resource</skos:prefLabel>
</owl:Class>


# This is the serialization of the BRO:0000001 class generated with OWL API 4.5.25
# Notice the shortened syntax with unnecessary datatypes removed.

<!-- http://bioontology.org/ontologies/BiomedicalResourceOntology.owl#Material_Resource -->

<owl:Class rdf:about="http://bioontology.org/ontologies/BiomedicalResourceOntology.owl#Material_Resource">
    <rdfs:subClassOf rdf:resource="http://bioontology.org/ontologies/BiomedicalResourceOntology.owl#Resource"/>
    <biositemap:definition>A resource that provides items such as reagents, instruments, tissue samples or organisms.</biositemap:definition>
    <metadata:prefixIRI>BRO:Material_Resource</metadata:prefixIRI>
    <rdfs:comment>As per alignment with NIF resource type hierarchy.</rdfs:comment>
    <skos:notation>BRO:0000001</skos:notation>
    <skos:prefLabel>Material Resource</skos:prefLabel>
</owl:Class>
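To check which serialization style a given submission file uses, you can count its explicit xsd:string datatype attributes. The following is a rough sketch. It assumes the rdf: prefix and double-quoted attribute values, as in the output above, and the method name is hypothetical:

```ruby
# Attribute text that OWL API 4.5.18 writes on every plain string literal.
XSD_STRING_ATTR = 'rdf:datatype="http://www.w3.org/2001/XMLSchema#string"'

# Hypothetical helper: count literals that carry an explicit xsd:string datatype.
# This is a plain substring count, not an RDF/XML parser, so it assumes the
# rdf: prefix and double-quoted attribute values.
def explicit_xsd_string_count(rdf_xml)
  rdf_xml.scan(XSD_STRING_ATTR).size
end

old_output = '<skos:notation rdf:datatype="http://www.w3.org/2001/XMLSchema#string">BRO:0000001</skos:notation>'
new_output = '<skos:notation>BRO:0000001</skos:notation>'

puts explicit_xsd_string_count(old_output) # => 1
puts explicit_xsd_string_count(new_output) # => 0
```

Applied to the contents of a whole submission file (for example, via File.read), a count of zero suggests the file was written by OWL API 4.5.22 or later.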

Revisiting the code mentioned in my comment from yesterday:

notation_lookup = LinkedData::Models::Class.where(
  notation: RDF::Literal.new(params[:cls], :datatype => RDF::XSD.string))
  .in(submission).first

This lookup fails because the query literal is typed as xsd:string, while the stored skos:notation value no longer carries that datatype.
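The failure can be modelled in a few lines of plain Ruby. This sketch assumes that the triplestore matches literals as exact RDF terms, requiring both the lexical value and the datatype to be equal, which is consistent with the 404 above. The `Literal` struct, the `TRIPLES` array, and `find_by_notation` are hypothetical stand-ins, not the LinkedData or Goo API:

```ruby
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
SKOS_NOTATION = "http://www.w3.org/2004/02/skos/core#notation"

# Hypothetical, simplified literal term: lexical value plus optional datatype IRI.
Literal = Struct.new(:value, :datatype)

# Triples as loaded from the 4.5.25 serialization: the notation has no datatype.
TRIPLES = [
  ["http://bioontology.org/ontologies/BiomedicalResourceOntology.owl#Material_Resource",
   SKOS_NOTATION,
   Literal.new("BRO:0000001", nil)]
].freeze

# Assumed exact-term matching: value and datatype must both be equal.
# Returns the subject IRI of the first match, or nil.
def find_by_notation(triples, literal)
  match = triples.find { |_s, p, o| p == SKOS_NOTATION && o == literal }
  match&.first
end

typed_query = Literal.new("BRO:0000001", XSD_STRING) # like the current lookup
plain_query = Literal.new("BRO:0000001", nil)        # lookup without a datatype

p find_by_notation(TRIPLES, typed_query) # => nil (no match, hence the 404)
p find_by_notation(TRIPLES, plain_query) # returns the Material_Resource class IRI
```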

Ignazio added code to the OWL API that allows a user of the API to force the presence of xsd:string on literals. See this comment: https://github.com/owlcs/owlapi/issues/1063#issuecomment-1249911355. One possible solution to test would be to modify this project to use the new format parameter:

format.setParameter("force xsd:string on literals", Boolean.TRUE);

Another possible solution would be to remove the xsd:string datatype argument from the RDF::Literal created in the notation_to_class_uri method in the ontologies_api project. It's not immediately clear to me whether that would have any other adverse effects.

jvendetti commented 1 year ago

In my local dev environment, I tested a change to the notation_to_class_uri method that removes the datatype argument when creating the RDF::Literal object:

notation_lookup = LinkedData::Models::Class.where(notation: RDF::Literal.new(params[:cls])).in(submission).first

I ran the full unit test suite for ontologies_api, and this change doesn't break any other unit tests. I think this is likely the better solution, because it would also let us keep the smaller output files that the newer OWL API produces when we serialize ontologies to RDF/XML.