owlcs / owlapi

OWL API main repository

Unable to load axioms for ontology in versions 4.5.17 and above #1045

Closed jvendetti closed 2 years ago

jvendetti commented 2 years ago

Hi Ignazio. I'm trying to get the attached ontology loaded into the BioPortal application. We're using version 4.5.18 of the OWL API. The following code snippet shows that the OWL API reports zero axioms for the ontology:

@Test
public void testLoad_WoodyPlantOntology_Submission58() throws Exception {
  File file = new File("src/test/resources/co_357.rdf");
  FileDocumentSource fileDocumentSource = new FileDocumentSource(file);
  OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
  OWLOntology ontology = manager.loadOntologyFromOntologyDocument(fileDocumentSource);
  assertNotNull(ontology);
  System.out.println(ontology.getAxiomCount());
}

I tried loading the same ontology in Protege Desktop, which uses version 4.5.9, and noticed that the axiom count is reported as 10,585. I tested all versions of the OWL API in between and found that, as of 4.5.17, the axiom count is reported as zero. I also thought there might be something wrong with the RDF in this ontology, so I tried loading it with Apache Jena as another test (version 4.4.0, their latest):

@Test
public void testLoad_WoodyPlantOntology_Submission58() {
  String inputFileName = "src/test/resources/co_357.rdf";
  Model model = ModelFactory.createDefaultModel();
  InputStream in = RDFDataMgr.open(inputFileName);
  if (in == null) {
    throw new IllegalArgumentException("File: " + inputFileName + " not found");
  }
  model.read(in, null);
  System.out.println(model.size());
}

There are no errors during the read and Jena reports a model size of 18,380. It's unclear to me if there's something malformed in the RDF of this ontology, or if there's a bug in the OWL API that prevents detection of the axioms. Any assistance would be much appreciated.

co_357.rdf.zip

ignazio1977 commented 2 years ago

That rang a bell, I might have seen this file or one like it before.

Reading the file as is does indeed produce an empty ontology. When that's happened in the past, it's often been because one format parser didn't throw an exception when it should have, so the ontology ended up being parsed as an unrelated format. In this case, it was TriX. Not the first time; I thought I had banned it from the automatically attempted formats.
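
To make that failure mode concrete, here is a toy model of it. This is only a sketch: ToyParser, STRICT, LENIENT and load are hypothetical names invented for illustration and are not OWL API classes, and the real loading logic may differ in its details. It shows how a parser that never throws can hide the error reported by the parser that should have handled the file, and how banning it brings the error back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model only: every type here is hypothetical, not part of the OWL API.
class ParserFallbackSketch {

    /** A parser either returns an axiom count or throws on input it cannot handle. */
    interface ToyParser {
        String name();
        int parse(String document) throws Exception;
    }

    /** Strict parser: rejects documents containing a square bracket. */
    static final ToyParser STRICT = new ToyParser() {
        public String name() { return "strict"; }
        public int parse(String document) throws Exception {
            if (document.contains("[")) {
                throw new Exception("illegal character in IRI");
            }
            return 1;
        }
    };

    /** Lenient parser: never throws, but recognizes nothing in the document. */
    static final ToyParser LENIENT = new ToyParser() {
        public String name() { return "lenient"; }
        public int parse(String document) {
            return 0;
        }
    };

    /** Tries each non-banned parser in order; the first one that does not throw wins. */
    static int load(String document, List<ToyParser> parsers, Set<String> banned) {
        List<String> errors = new ArrayList<>();
        for (ToyParser parser : parsers) {
            if (banned.contains(parser.name())) {
                continue;
            }
            try {
                return parser.parse(document);
            } catch (Exception e) {
                errors.add(parser.name() + ": " + e.getMessage());
            }
        }
        throw new IllegalStateException("no parser succeeded: " + errors);
    }

    public static void main(String[] args) {
        String document = "<https://example.org/[bad]>";
        List<ToyParser> parsers = List.of(STRICT, LENIENT);
        // Nothing banned: the lenient parser "succeeds" and the result is empty.
        System.out.println(load(document, parsers, Set.of()));
        // Lenient parser banned: the strict parser's error is no longer hidden.
        try {
            load(document, parsers, Set.of("lenient"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this toy version, the empty result and the missing error message are two sides of the same problem, which is why both workarounds (banning the lenient parser, or naming the format up front) make the real error visible.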

Two possible workarounds for this. Explicitly ban the TriX parser, e.g., by setting the environment variable

org.semanticweb.owlapi.model.parameters.ConfigurationOptions.BANNED_PARSERS=org.semanticweb.owlapi.rio.RioTrixParserFactory

or with an equivalent method such as:

    String name = "org.semanticweb.owlapi.rio.RioTrixParserFactory";
    manager.setOntologyLoaderConfiguration(manager.getOntologyLoaderConfiguration().setBannedParsers(name));

Or you can specify the format, if known:

    FileDocumentSource fileDocumentSource = new FileDocumentSource(file, new RDFXMLDocumentFormat());

In this case, with the above I get:

Parser: org.semanticweb.owlapi.rdf.rdfxml.parser.RDFXMLParser@77307458
Stack trace:
org.semanticweb.owlapi.rdf.rdfxml.parser.RDFParserException: [line=12956:column=92] IRI 'https://cropontology.org/rdf/CO_357:3000044/[fork%20height]' cannot be resolved against current base IRI file:/Users/ignazio/Downloads/co_357.rdf reason is: Illegal character in path at index 44: https://cropontology.org/rdf/CO_357:3000044/[fork%20height]
    org.semanticweb.owlapi.rdf.rdfxml.parser.RDFXMLParser.parse(RDFXMLParser.java:74)
    uk.ac.manchester.cs.owl.owlapi.OWLOntologyFactoryImpl.loadOWLOntology(OWLOntologyFactoryImpl.java:220)
    uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.actualParse(OWLOntologyManagerImpl.java:1303)
    uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:1243)

The square brackets aren't to URI.create()'s liking.
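
The rejection can be reproduced with the JDK alone, without the OWL API on the classpath. The sketch below feeds the IRI from the stack trace to java.net.URI and then tries one possible repair, percent-encoding the brackets as %5B and %5D. The class and method names are made up for illustration, and whether percent-encoding is the right repair for this ontology is an assumption: it changes the IRI string, so any other data referring to the old IRI would need the same change.

```java
import java.net.URI;

class BracketIriCheck {

    /** Returns true if java.net.URI accepts the string, false if it rejects it. */
    static boolean isAcceptedByJavaUri(String iri) {
        try {
            URI.create(iri);
            return true;
        } catch (IllegalArgumentException e) {
            // URI.create wraps the underlying URISyntaxException in this exception.
            return false;
        }
    }

    /**
     * Hypothetical repair: percent-encode literal square brackets.
     * Not suitable for IRIs with an IPv6 host, where brackets are legal.
     */
    static String encodeBrackets(String iri) {
        return iri.replace("[", "%5B").replace("]", "%5D");
    }

    public static void main(String[] args) {
        String original = "https://cropontology.org/rdf/CO_357:3000044/[fork%20height]";
        String repaired = encodeBrackets(original);
        System.out.println(original + " accepted: " + isAcceptedByJavaUri(original));
        System.out.println(repaired + " accepted: " + isAcceptedByJavaUri(repaired));
    }
}
```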

ignazio1977 commented 2 years ago

The parser banning can be done at several levels: with environment variables (VM level); at manager level, as seen above; at ontology level, by creating the ontology with a specific OntologyLoaderConfiguration object; or at library level, by including the value in a property file named owlapi.properties on the classpath (see, for example, the one included in the contract subproject).
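
For the VM-level option, a minimal sketch follows. It assumes that "environment variable" here means a JVM system property (what -Dkey=value sets on the java command line), and that the property has to be in place before the OWL API reads its configuration; neither point is confirmed in this thread, so check them against the version in use. The class and method names are hypothetical; the key and value are the ones quoted earlier in the thread.

```java
class BanTrixViaSystemProperty {

    // Key and value as quoted earlier in this thread.
    static final String KEY =
        "org.semanticweb.owlapi.model.parameters.ConfigurationOptions.BANNED_PARSERS";
    static final String TRIX_FACTORY = "org.semanticweb.owlapi.rio.RioTrixParserFactory";

    /**
     * Sets the property programmatically. Assumption: this is equivalent to
     * passing -D<KEY>=<TRIX_FACTORY> on the java command line.
     */
    static void banTrix() {
        System.setProperty(KEY, TRIX_FACTORY);
    }

    public static void main(String[] args) {
        banTrix();
        // An ontology manager would be created after this point.
        System.out.println(KEY + "=" + System.getProperty(KEY));
    }
}
```

Setting it in code keeps the ban next to the loading logic; the command-line flag or the owlapi.properties file keeps it out of the code entirely, which may suit a deployed application better.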

ignazio1977 commented 2 years ago

I can't recall any code change that caused this, but it might just have been a consequence of a version update of the RIO libraries.

jvendetti commented 2 years ago

OMG, square brackets. Thanks very much for the help Ignazio. I'll repair the illegal characters in the ontology source file, and also modify our code to ban the TriX parser.