owlcs / owlapi

OWL API main repository

Ontology imports not preserved by RDF/XML parser #1147

Open ykazakov opened 1 month ago

ykazakov commented 1 month ago

As reported in protegeproject/protege#1226 the imports of some (RDF) ontologies get flattened when loaded by the RDF/XML parser:

  1. Create an ontology that imports http://purl.org/dc/terms/
  2. Save it in RDF/XML syntax
  3. Load the ontology using OWL API
  4. Observe that the loaded ontology does not have imports

The same does not seem to happen when saving the ontology in functional-style syntax.

To reproduce in OWL API:

  1. Create two files with the following contents somewhere on the classpath

    1. test.rdf:
      <?xml version="1.0"?>
      <rdf:RDF xmlns="http://www.semanticweb.org/demo/ontologies/2024/6/untitled-ontology-136/"
           xml:base="http://www.semanticweb.org/demo/ontologies/2024/6/untitled-ontology-136/"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:owl="http://www.w3.org/2002/07/owl#"
           xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:xml="http://www.w3.org/XML/1998/namespace"
           xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
           xmlns:dcam="http://purl.org/dc/dcam/"
           xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
           xmlns:terms="http://purl.org/dc/terms/">
          <owl:Ontology rdf:about="http://www.semanticweb.org/demo/ontologies/2024/6/untitled-ontology-136">
              <owl:imports rdf:resource="http://purl.org/dc/terms/"/>
          </owl:Ontology>
      </rdf:RDF>
    2. test.ofn:

      Prefix(:=<http://www.semanticweb.org/demo/ontologies/2024/6/untitled-ontology-136/>)
      Prefix(owl:=<http://www.w3.org/2002/07/owl#>)
      Prefix(rdf:=<http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
      Prefix(xml:=<http://www.w3.org/XML/1998/namespace>)
      Prefix(xsd:=<http://www.w3.org/2001/XMLSchema#>)
      Prefix(rdfs:=<http://www.w3.org/2000/01/rdf-schema#>)
      
      Ontology(<http://www.semanticweb.org/demo/ontologies/2024/6/untitled-ontology-136>
      Import(<http://purl.org/dc/terms/>)
      
      )
  2. Create the following tests:
    
    @Test
    public void testImportRDFXML() throws OWLOntologyCreationException {
        OWLOntologyManager man = OWLManager.createOWLOntologyManager();
        ClassLoader classLoader = getClass().getClassLoader();
        File f = new File(classLoader.getResource("test.rdf").getFile());
        OWLOntology o = man.loadOntologyFromOntologyDocument(f);
        assertEquals(1, o.getImports().size());
    }

    @Test
    public void testImportFS() throws OWLOntologyCreationException {
        OWLOntologyManager man = OWLManager.createOWLOntologyManager();
        ClassLoader classLoader = getClass().getClassLoader();
        File f = new File(classLoader.getResource("test.ofn").getFile());
        OWLOntology o = man.loadOntologyFromOntologyDocument(f);
        assertEquals(1, o.getImports().size());
    }

  3. Observe that the first test fails on the last line but the second test succeeds.
<details><summary>Stacktrace:</summary>

java.lang.AssertionError: expected:<1> but was:<0>
    at org.junit.Assert.fail(Assert.java:89)
    at org.junit.Assert.failNotEquals(Assert.java:835)
    at org.junit.Assert.assertEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:633)
    at org.semanticweb.owlapi.LoadImportTest.testImportRDFXML(LoadImportTest.java:21)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:569)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:93)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:40)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:529)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:757)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:452)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:210)


</details>

Versions: 
- OWL API: 4.5.29
- JDK: 11.0.23-zulu

ignazio1977 commented 1 month ago

Trying to replicate this, I get something different:

static Stream<Arguments> languagesWithImports() {
    return Stream.of(Arguments.of(new FunctionalSyntaxDocumentFormat()),
        Arguments.of(new RDFXMLDocumentFormat()),
        Arguments.of(new TurtleDocumentFormat()),
        Arguments.of(new RioTurtleDocumentFormat()),
        Arguments.of(new ManchesterSyntaxDocumentFormat()),
        Arguments.of(new OWLXMLDocumentFormat()));
}

@ParameterizedTest
@MethodSource("languagesWithImports")
void shouldSaveAndLoadImport(OWLDocumentFormat format) throws OWLOntologyCreationException {
    OWLOntology o = create();
    IRI terms = IRI.create("http://purl.org/dc/terms/");
    o.getOWLOntologyManager().createOntology(terms);
    o.getOWLOntologyManager().applyChange(new AddImport(o, df.getOWLImportsDeclaration(terms)));
    assertEquals(1, o.getImports().size());
    OWLOntology o2 = roundTrip(o, format);
    equal(o, o2);
    assertEquals(1, o2.getImports().size());
}

In all cases, attempting to load the ontology throws an unloadable ontology error.

I believe Protege is running with the missing import handling strategy set to silent, so no exceptions are thrown. The DCTerms ontology doesn't look like it can be loaded successfully. The options are set up so that a partial load adds the data to the current ontology (the same as if the import pointed to a piece of RDF and not an ontology).

Which perhaps is the case?

org.semanticweb.owlapi.model.OWLRuntimeException: org.semanticweb.owlapi.io.OWLOntologyCreationIOException: Server returned HTTP response code: 403 for URL: http://dublincore.org/specifications/dublin-core/dcmi-terms/dublin_core_terms#
    at org.semanticweb.owlapi.api.test.baseclasses.TestBase.loadOntology(TestBase.java:605)
    at org.semanticweb.owlapi.api.test.imports.ImportsTestCase.shouldSaveAndLoadImport(ImportsTestCase.java:328)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
    at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:272)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
    at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:272)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
    at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:272)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
    at java.util.ArrayList.forEach(ArrayList.java:1259)
    at java.util.ArrayList.forEach(ArrayList.java:1259)
Caused by: org.semanticweb.owlapi.io.OWLOntologyCreationIOException: Server returned HTTP response code: 403 for URL: http://dublincore.org/specifications/dublin-core/dcmi-terms/dublin_core_terms#
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyFactoryImpl.loadOWLOntology(OWLOntologyFactoryImpl.java:230)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.actualParse(OWLOntologyManagerImpl.java:1303)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:1243)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:1143)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:1097)
    at org.semanticweb.owlapi.api.test.baseclasses.TestBase.loadOntology(TestBase.java:603)
    ... 46 more
Caused by: java.io.IOException: Server returned HTTP response code: 403 for URL: http://dublincore.org/specifications/dublin-core/dcmi-terms/dublin_core_terms#
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1973)
    at sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1968)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1967)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1521)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1505)
    at org.semanticweb.owlapi.io.AbstractOWLParser.getInputStreamFromContentEncoding(AbstractOWLParser.java:215)
    at org.semanticweb.owlapi.io.AbstractOWLParser.getInputStream(AbstractOWLParser.java:123)
    at org.semanticweb.owlapi.io.AbstractOWLParser.getInputSource(AbstractOWLParser.java:299)
    at org.semanticweb.owlapi.rdf.rdfxml.parser.RDFXMLParser.parse(RDFXMLParser.java:70)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyFactoryImpl.loadOWLOntology(OWLOntologyFactoryImpl.java:220)
    ... 51 more
Caused by: java.io.IOException: Server returned HTTP response code: 403 for URL: http://dublincore.org/specifications/dublin-core/dcmi-terms/dublin_core_terms#
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1917)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1505)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
    at org.semanticweb.owlapi.io.AbstractOWLParser.connect(AbstractOWLParser.java:148)
    at org.semanticweb.owlapi.io.AbstractOWLParser.connect(AbstractOWLParser.java:158)
    at org.semanticweb.owlapi.io.AbstractOWLParser.connect(AbstractOWLParser.java:158)
    at org.semanticweb.owlapi.io.AbstractOWLParser.getInputStream(AbstractOWLParser.java:113)
    ... 54 more

ykazakov commented 1 month ago

Which version of Java do you use? With Java 8 I also got a 403 error. I think you need to use a more recent Java version.

ignazio1977 commented 1 month ago

OWLAPI 4 is required to be compatible with Java 8, though.

Different behaviour with different Java versions is an added headache :-(

Worth noting that adding the import didn't work with any format, in default config, if an ontology with the right IRI was not already in the manager, and worked in all cases otherwise. I need to try with alternate configs, but I believe the root of the behaviour is that we're not dealing with an OWL ontology.

ignazio1977 commented 1 month ago

private OWLOntologyManager setupImportsStrategyManager(
    OWLOntologyManager man) {
    man.setOntologyLoaderConfiguration(man
        .getOntologyLoaderConfiguration()
        .setMissingImportHandlingStrategy(
            MissingImportHandlingStrategy.SILENT)
        .setMissingOntologyHeaderStrategy(
            MissingOntologyHeaderStrategy.IMPORT_GRAPH));
    return man;
}

With the manager set like this, the imports directives on file are saved correctly even when the imported file cannot be loaded. I believe that's the standard setup for Protege.

ignazio1977 commented 1 month ago

Running on Java 21, I get no exceptions and the tests pass.

The output for different formats:

FunctionalSyntaxDocumentFormat 
Before:
Prefix(:=<http://www.semanticweb.org/owlapi/test1#>)[standard prefixes]
Ontology(<http://www.semanticweb.org/owlapi/test1> Import(<http://purl.org/dc/terms/>))

After:
Prefix(:=<http://www.semanticweb.org/owlapi/test1#>)[standard prefixes]
Ontology(<http://www.semanticweb.org/owlapi/test1>
Import(<http://purl.org/dc/terms/>)

Declaration(Class(<http://purl.org/dc/dcam/VocabularyEncodingScheme>))...141 declarations from purl.org
Declaration(Class(rdfs:Class))
Declaration(Datatype(xsd:date))
)

RDFXMLDocumentFormat
Before:
<?xml version="1.0"?>
<rdf:RDF xmlns="http://www.semanticweb.org/owlapi/test2#"
     xml:base="http://www.semanticweb.org/owlapi/test2"[standard prefixes]>
    <owl:Ontology rdf:about="http://www.semanticweb.org/owlapi/test2">
        <owl:imports rdf:resource="http://purl.org/dc/terms/"/>    </owl:Ontology>
</rdf:RDF>

After:
<?xml version="1.0"?>
<rdf:RDF xmlns="http://www.semanticweb.org/owlapi/test2#"
     xml:base="http://www.semanticweb.org/owlapi/test2"[standard prefixes]
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:dcam="http://purl.org/dc/dcam/"
     xmlns:terms="http://purl.org/dc/terms/">
    <owl:Ontology rdf:about="http://www.semanticweb.org/owlapi/test2">
        <owl:imports rdf:resource="http://purl.org/dc/terms/"/>    </owl:Ontology>
</rdf:RDF>

TurtleDocumentFormat
Before:
@prefix : <http://www.semanticweb.org/owlapi/test3#> .[standard prefixes]
@base <http://www.semanticweb.org/owlapi/test3> .

<http://www.semanticweb.org/owlapi/test3> rdf:type owl:Ontology ;
                                           owl:imports <http://purl.org/dc/terms/> .

After:
@prefix : <http://www.semanticweb.org/owlapi/test3#> .[standard prefixes]
@base <http://www.semanticweb.org/owlapi/test3> .

<http://www.semanticweb.org/owlapi/test3> rdf:type owl:Ontology ;
                                           owl:imports <http://purl.org/dc/terms/> .

###  Generated by the OWL API (version 4.5.29) https://github.com/owlcs/owlapi

RioTurtleDocumentFormat
Before:
@prefix : <http://www.semanticweb.org/owlapi/test4#> .[standard prefixes]
<http://www.semanticweb.org/owlapi/test4> a owl:Ontology;
  owl:imports <http://purl.org/dc/terms/> .

After:
@prefix : <http://www.semanticweb.org/owlapi/test4#> .[standard prefixes]
<http://www.semanticweb.org/owlapi/test4> a owl:Ontology;
  owl:imports <http://purl.org/dc/terms/> .

ManchesterSyntaxDocumentFormat
Before:
[standard prefixes]
Prefix: : <http://www.semanticweb.org/owlapi/test5>
Ontology: <http://www.semanticweb.org/owlapi/test5>
Import: <http://purl.org/dc/terms/>

After:
[standard prefixes]
Prefix: : <http://www.semanticweb.org/owlapi/test5>
Ontology: <http://www.semanticweb.org/owlapi/test5>
Import: <http://purl.org/dc/terms/>

OWLXMLDocumentFormat
Before:
<?xml version="1.0"?>
<Ontology xmlns="http://www.w3.org/2002/07/owl#"
     xml:base="http://www.semanticweb.org/owlapi/test6"[standard prefixes]
     ontologyIRI="http://www.semanticweb.org/owlapi/test6">
[standard prefixes]
    <Import>http://purl.org/dc/terms/</Import>
</Ontology>

After:
<?xml version="1.0"?>
<Ontology xmlns="http://www.w3.org/2002/07/owl#"
     xml:base="http://www.semanticweb.org/owlapi/test6"[standard prefixes]
     ontologyIRI="http://www.semanticweb.org/owlapi/test6">
[standard prefixes]
    <Import>http://purl.org/dc/terms/</Import>
    <Declaration>        <Class abbreviatedIRI="rdfs:Class"/>    </Declaration>
    <Declaration>        <Datatype abbreviatedIRI="xsd:date"/>    </Declaration>
    <Declaration>        <AnnotationProperty IRI="http://purl.org/dc/terms/description"/>    </Declaration>...141 declarations from purl.org
</Ontology>

Functional and OWL/XML include the axioms inline, while none of the others do. Looks like they're ignoring the INCLUDE_GRAPH directive. I'm uncertain if that's a bug or not, given the fact that I don't believe the specs say a lot about it. I'd prefer uniform behaviour, though.
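To check what a saved RDF/XML document declares on file, independently of how OWL API resolves the import, the document can be inspected with the JDK's XML parser. This is only a rough sketch: the class name `ImportsOnFile` and the method `importsIn` are made up for illustration, and it assumes the import is written as an `owl:imports` element with an `rdf:resource` attribute, as in the outputs above. Other legal RDF/XML spellings of the same triple would be missed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of OWL API: lists owl:imports targets in an RDF/XML string.
public class ImportsOnFile {
    static final String OWL = "http://www.w3.org/2002/07/owl#";
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Returns the rdf:resource value of every owl:imports element in the document. */
    static List<String> importsIn(String rdfXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Namespace awareness is needed so that lookups work whatever prefixes are used.
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(rdfXml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagNameNS(OWL, "imports");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(((Element) nodes.item(i)).getAttributeNS(RDF, "resource"));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\"?>\n"
            + "<rdf:RDF xmlns:owl=\"http://www.w3.org/2002/07/owl#\"\n"
            + "     xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n"
            + "    <owl:Ontology rdf:about=\"http://www.semanticweb.org/owlapi/test2\">\n"
            + "        <owl:imports rdf:resource=\"http://purl.org/dc/terms/\"/>\n"
            + "    </owl:Ontology>\n"
            + "</rdf:RDF>";
        System.out.println(importsIn(doc)); // prints [http://purl.org/dc/terms/]
    }
}
```

Running the same check on the "Before" and "After" documents shows whether the directive survived the round trip on file, separately from whether the imported ontology could be loaded.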

As to why the behaviour is different between Java 8 and Java 21, I'm guessing different exceptions are getting raised :-( On Java 8, the exception is uncaught, while it should have been caught and ignored as per settings.

ignazio1977 commented 1 month ago

With the default manager settings, the test fails: the import directive cannot be resolved and so gets ignored. I'd argue that the behaviour on Java 8 is best, as it signals that something is amiss instead of silently loading only part of the ontology. Otherwise the imports directive would be lost once one saves the ontology again, with the user none the wiser.

ykazakov commented 1 month ago

Thanks for investigating!

> OWLAPI 4 is required to be compatible with Java 8, though.
>
> Different behaviour with different Java versions is an added headache :-(

As far as I understand, the 403 error is a separate issue caused by the server blocking requests if it does not like the headers (e.g., coming from Java 8). This could be easily resolved by setting a custom user agent, which is a good idea anyway to ensure consistency across different environments:

@Before 
public void init() {
    System.setProperty("http.agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36");
}

(In case you are wondering, I just took the most popular user agent.)
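To confirm that the 403 depends on the user agent rather than on OWL API, the request can be probed outside the parser with plain `HttpURLConnection`. This is a sketch under assumptions: the class `UserAgentProbe` and the method `prepare` are hypothetical names, and the default agent string `Mozilla/5.0` is just a placeholder. The probe sets the header per connection instead of through the `http.agent` system property.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical diagnostic, not part of OWL API.
public class UserAgentProbe {
    /** Prepares a connection that will send the given User-Agent; no network I/O happens here. */
    static HttpURLConnection prepare(String url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        String url = args.length > 0 ? args[0] : "http://purl.org/dc/terms/";
        String agent = args.length > 1 ? args[1] : "Mozilla/5.0";
        HttpURLConnection conn = prepare(url, agent);
        try {
            // getResponseCode() opens the connection. HttpURLConnection does not follow
            // redirects that switch between http and https, so a 3xx status may be
            // reported here instead of the status of the final target.
            System.out.println(conn.getResponseCode() + " for " + url);
        } finally {
            conn.disconnect();
        }
    }
}
```

Running it once with a browser-like agent and once with a `Java/...` style agent against the same URL should show whether the server treats the two differently.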

Alternatively, you can use (a different version of) the same ontology from bioportal. Just change the URL "http://purl.org/dc/terms/" used within test.ofn and test.rdf to:

"https://data.bioontology.org/ontologies/DCT/submissions/1/download?apikey=8b5b7825-538d-40e0-9e9e-5ab9274a9aeb"

For me, both versions of the tests (with the user agent or with the updated URL) behave consistently (one test fails, the other does not) across all the versions of Java I have.

> Running on Java 21, I get no exceptions and the tests pass.

That is strange. As I said, I obtain the same results with all versions of Java. Can you tell me the exact version of Java and the OS that you use? I run the tests from Eclipse 2024-06 (4.32.0), Build id: 20240606-1231, on macOS 14.5 (23F79).

> Worth noting that adding the import didn't work with any format, in default config, if an ontology with the right IRI was not already in the manager, and worked in all cases otherwise.

What do you mean by "adding the import"? Programmatically using OWL API?