ncbo / bioportal-project

Serves to consolidate (in Zenhub) all public issues in BioPortal
BSD 2-Clause "Simplified" License

AIO: latest submission failed to process - status "Uploaded, Error Rdf" #261

Closed by jvendetti 1 year ago

jvendetti commented 1 year ago

Received report from an end user on the support list that the latest version of their AIO ontology failed to process (submission ID 4). The summary page on BioPortal is showing statuses of "Uploaded, Error Rdf".

Production log file indicates that the OWL API wasn't able to load the ontology:

Error: OWL_PARSE_EXCEPTION
Message: Problem parsing file:/srv/ncbo/repository/AIO/4/aio-full.owl
Could not parse ontology.  Either a suitable parser could not be found, or parsing failed.  See parser logs below for explanation.

Error is reproducible with the following snippet of test code:

@Test
public void testLoad_AIO_Ontology_WithDocumentFormat() throws Exception {
  String path = "src/test/resources/aio-full.owl";
  FileDocumentSource fileDocumentSource = new FileDocumentSource(new File(path), new RDFXMLDocumentFormat());
  OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
  OWLOntology ontology = manager.loadOntologyFromOntologyDocument(fileDocumentSource);
  assertNotNull(ontology);
}

Relevant stack trace from the OWL API shows that the ontology source file contains illegal characters:

Illegal character in path at index 50: https://w3id.org/aio/Unsupervised_Block_Clustering|Unsupervised_Co-clustering|Unsupervised_Unsupervised_Two-mode_Clustering|Unsupervised_Two-way_Clustering|Unsupervised_Joint_Clustering
  java.base/java.net.URI$Parser.fail(URI.java:2913)
  java.base/java.net.URI$Parser.checkChars(URI.java:3084)
  java.base/java.net.URI$Parser.parseHierarchical(URI.java:3166)
  java.base/java.net.URI$Parser.parse(URI.java:3114)
  java.base/java.net.URI.<init>(URI.java:600)
  java.base/java.net.URI.create(URI.java:881)
  java.base/java.net.URI.resolve(URI.java:1066)
  org.semanticweb.owlapi.rdf.rdfxml.parser.RDFParser.resolveFromDelegate(RDFParser.java:277)
  org.semanticweb.owlapi.rdf.rdfxml.parser.RDFParser.resolveIRI(RDFParser.java:346)
  org.semanticweb.owlapi.rdf.rdfxml.parser.NodeElement.getIDNodeIDAboutResourceIRI(StartRDF.java:340)
  at uk.ac.manchester.cs.owl.owlapi.OWLOntologyFactoryImpl.loadOWLOntology(OWLOntologyFactoryImpl.java:257)
  at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.actualParse(OWLOntologyManagerImpl.java:1288)
  at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:1228)
  at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntologyFromOntologyDocument(OWLOntologyManagerImpl.java:1179)
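
The bottom of the trace is plain java.net.URI parsing, so the failure can likely be reproduced without the OWL API at all. The following is a minimal sketch under that assumption; the class name PipeInIriDemo and the helper firstIllegalIndex are hypothetical names chosen for illustration and are not part of BioPortal or the OWL API. It uses the URI(String) constructor rather than URI.create, which the trace shows, because the constructor exposes the checked URISyntaxException and its getIndex() value directly.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PipeInIriDemo {

    // Hypothetical helper: returns the index of the first character that
    // java.net.URI rejects, or -1 if the string parses as a URI.
    public static int firstIllegalIndex(String iri) {
        try {
            new URI(iri);
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    public static void main(String[] args) {
        // IRI copied from the stack trace in this issue.
        String iri = "https://w3id.org/aio/Unsupervised_Block_Clustering"
                + "|Unsupervised_Co-clustering"
                + "|Unsupervised_Unsupervised_Two-mode_Clustering"
                + "|Unsupervised_Two-way_Clustering"
                + "|Unsupervised_Joint_Clustering";

        int index = firstIllegalIndex(iri);
        if (index >= 0) {
            System.out.println("Rejected character '" + iri.charAt(index)
                    + "' at index " + index);
        } else {
            System.out.println("IRI parsed without error");
        }
    }
}
```

For the IRI above, the first pipe sits at index 50, which matches the index reported in the exception message.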

jvendetti commented 1 year ago

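The problematic characters are pipes ("|"). They appear in the rdf:about IRI of the following class declaration (line 5098 of the source file) and in the rdf:resource IRI of its annotation axiom (line 5106):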

<owl:Class rdf:about="https://w3id.org/aio/Unsupervised_Block_Clustering|Unsupervised_Co-clustering|Unsupervised_Unsupervised_Two-mode_Clustering|Unsupervised_Two-way_Clustering|Unsupervised_Joint_Clustering">
  <rdfs:subClassOf rdf:resource="https://w3id.org/aio/Machine_Learning"/>
  <obo:IAO_0000115 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">An algorithm to group objects by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors</obo:IAO_0000115>
  <oboInOwl:hasExactSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">K-NN</oboInOwl:hasExactSynonym>
  <oboInOwl:hasExactSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">KNN</oboInOwl:hasExactSynonym>
  <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Unsupervised Block Clustering|Unsupervised Co-clustering|Unsupervised Unsupervised Two-mode Clustering|Unsupervised Two-way Clustering|Unsupervised Joint Clustering</rdfs:label>
</owl:Class>
<owl:Axiom>
  <owl:annotatedSource rdf:resource="https://w3id.org/aio/Unsupervised_Block_Clustering|Unsupervised_Co-clustering|Unsupervised_Unsupervised_Two-mode_Clustering|Unsupervised_Two-way_Clustering|Unsupervised_Joint_Clustering"/>
  <owl:annotatedProperty rdf:resource="http://purl.obolibrary.org/obo/IAO_0000115"/>
  <owl:annotatedTarget rdf:datatype="http://www.w3.org/2001/XMLSchema#string">An algorithm to group objects by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors</owl:annotatedTarget>
  <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm</oboInOwl:hasDbXref>
</owl:Axiom>
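
To locate every such IRI before re-submitting, a rough line-based scan of the source file may be enough. The sketch below is an assumption-laden illustration, not the check BioPortal or the OWL API performs: the class IriAttributeScanner and method findInvalidIris are hypothetical names, the regular expression only handles double-quoted rdf:about and rdf:resource attributes written with the rdf prefix, XML entities are not decoded, and java.net.URI is used as a stand-in for the parser's IRI resolution.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IriAttributeScanner {

    // Matches rdf:about="..." and rdf:resource="..." (double-quoted values only).
    private static final Pattern IRI_ATTR =
            Pattern.compile("rdf:(?:about|resource)=\"([^\"]*)\"");

    // Hypothetical helper: returns one message per attribute value that
    // java.net.URI refuses to parse, tagged with its 1-based line number.
    public static List<String> findInvalidIris(List<String> lines) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = IRI_ATTR.matcher(lines.get(i));
            while (m.find()) {
                try {
                    new URI(m.group(1));
                } catch (URISyntaxException e) {
                    problems.add("line " + (i + 1) + ": " + e.getReason()
                            + " at index " + e.getIndex());
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        // Usage: pass the path to the ontology file as the first argument.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (String problem : findInvalidIris(lines)) {
            System.out.println(problem);
        }
    }
}
```

On Java 11 or later this can be run as a single source file, for example `java IriAttributeScanner.java aio-full.owl`. The rdfs:label literal also contains pipes, but literals are not parsed as IRIs and the scan deliberately ignores them.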

jvendetti commented 1 year ago

I've followed up with the end user, asking them to remove the illegal pipe characters and attempt a re-submission (https://mailman.stanford.edu/pipermail/bioontology-support/2022-November/012916.html).

jvendetti commented 1 year ago

A new version was uploaded to BioPortal on Nov. 15th (submission ID 5) and processed successfully.