ontodev / robot

ROBOT is an OBO Tool
http://robot.obolibrary.org
BSD 3-Clause "New" or "Revised" License

Could not parse ontology #1150

Closed xyuhuang closed 1 year ago

xyuhuang commented 1 year ago

I wanted to extract an ontology using method MIREOT and I met this error:

org.semanticweb.owlapi.model.UnloadableImportException: Could not load imported ontology: http://www.opengis.net/ont/geosparql Cause: Problem parsing http://www.opengis.net/ont/geosparql

Could not parse ontology. Either a suitable parser could not be found, or parsing failed. See parser logs below for explanation. The following parsers were tried:
1) org.semanticweb.owlapi.rdf.rdfxml.parser.RDFXMLParser@55ea2d70
2) org.semanticweb.owlapi.owlxml.parser.OWLXMLParser@c81fd12
3) org.semanticweb.owlapi.functional.parser.OWLFunctionalSyntaxOWLParser@58399d82
4) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.RioTurtleDocumentFormatFactory@95fd655c
5) org.semanticweb.owlapi.manchestersyntax.parser.ManchesterOWLSyntaxOntologyParser@64bfd6fd
6) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.NQuadsDocumentFormatFactory@6f9c39ad
7) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.RDFJsonDocumentFormatFactory@cd748dc3
8) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.TrigDocumentFormatFactory@27e81c
9) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.NTriplesDocumentFormatFactory@937ecd36
10) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.BinaryRDFDocumentFormatFactory@3bf24493
11) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.RDFJsonLDDocumentFormatFactory@dcacc47d
12) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.RioRDFXMLDocumentFormatFactory@69b9a3bc
13) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.N3DocumentFormatFactory@9a5
14) org.semanticweb.owlapi.rdf.turtle.parser.TurtleOntologyParser@47ec7422
15) org.semanticweb.owlapi.oboformat.OBOFormatOWLAPIParser@5cbd159f
16) org.semanticweb.owlapi.krss2.parser.KRSS2OWLParser@20e6c4dc
17) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.RDFaDocumentFormatFactory@264e8d

gouttegd commented 1 year ago

Hi,

Could you provide more details? Which ontology are you trying to use here? Which command did you use exactly? With which version of ROBOT?

I can see no detectable problem with http://www.opengis.net/ont/geosparql. The latest version of ROBOT can parse it without any issue, so whatever problem you ran into may not come from it but maybe from the ontology it is imported into. Which ontology is that?

xyuhuang commented 1 year ago

@gouttegd hi, I'm trying to extract from the LinkedGeoData ontology here. I used this command:

$ robot extract --method MIREOT --input lgd.owl --lower-term "http://linkedgeodata.org/ontology/building" --lower-term "http://linkedgeodata.org/ontology/ApartmentBuilding" --output results/lgd_out2.owl

The thing is, I had run this same command successfully before, but when I re-ran it, it suddenly failed with this error.

gouttegd commented 1 year ago

I can reproduce the issue here.

What is weird is that ROBOT has no trouble loading the geosparql import when it is stored on disk, but it fails when it tries to fetch it from the Internet (and it is a parsing error, so it’s not a network problem or anything like that). Very strange!

But at least that gives you a workaround:

  1. Download the geosparql file once and for all:
$ wget -O geosparql.ttl http://www.opengis.net/ont/geosparql

Make sure the geosparql.ttl file is in the same directory as the one containing lgd.owl.

  2. Then, in that same directory, create a catalog file with the following contents:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<catalog prefer="public" xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <group xml:base="">
        <uri name="http://www.opengis.net/ont/geosparql" uri="geosparql.ttl"/>
    </group>
</catalog>
  3. When you invoke ROBOT, pass it the name of the catalog file. For example, assuming you’ve named the file catalog.xml:
$ robot --catalog catalog.xml extract --method MIREOT --input lgd.owl [... then the rest of your command ...]

This way, ROBOT will not try to fetch the geosparql component from the Internet, but instead use the local copy you have previously downloaded.

(Now why ROBOT fails to parse that component when it downloads it from the Internet is still a complete mystery to me.)
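
For anyone who wants to script the workaround, here is a minimal sketch of the catalog step as a shell function. The function name `write_catalog` and its three arguments are made up for this illustration and are not part of ROBOT; the XML it writes is the same catalog as above. It assumes the IRI and file name contain no characters that need XML escaping (such as `&`).

```shell
#!/bin/sh
# Hypothetical helper (not part of ROBOT): write an XML catalog that maps
# one ontology IRI to a local file, using the same layout as the catalog above.
# Assumption: neither argument contains characters that need XML escaping.
write_catalog() {
    iri=$1         # IRI that appears in the import statement
    local_file=$2  # downloaded copy, relative to the catalog's directory
    catalog=$3     # catalog file to create (overwritten if it exists)
    cat > "$catalog" <<EOF
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<catalog prefer="public" xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <group xml:base="">
        <uri name="$iri" uri="$local_file"/>
    </group>
</catalog>
EOF
}
```

It would be used after the download step and before calling ROBOT, for example `write_catalog http://www.opengis.net/ont/geosparql geosparql.ttl catalog.xml`, run from the directory that contains lgd.owl.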

matentzn commented 1 year ago

Wild guess: OWLAPI requires a file extension to determine the nature of the import? Alternatively, some idiosyncratic hosting infrastructure for the file? In any case, @xyuhuang, if you want a more permanent solution, you can try opening an issue at https://github.com/owlcs/owlapi.

gouttegd commented 1 year ago

No, the OWL API still tries all its parsers one after the other (as it always does), so with or without a file extension it should still be able to find a parser that correctly parses the file. (In the workaround above I just added an extension because I don’t like extension-less files!)

You can easily test by trying to load http://www.opengis.net/ont/geosparql directly with ROBOT:

$ robot convert -I http://www.opengis.net/ont/geosparql -o geosparql.ofn

It will fail, but you’ll see in the output that all parsers were tried as expected. With or without a file extension, one of them should have managed to successfully parse the ontology.

gouttegd commented 1 year ago

My own wild guess is more about a misconfigured server. This is what http://www.opengis.net/ont/geosparql initially returns:

HTTP/1.1 302 Moved Temporarily
Date: Wed, 20 Sep 2023 08:13:56 GMT
Server: 1060 NetKernel v3.3 - Powered by Jetty
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Content-Type: text/html; charset=iso-8859-1
Content-Length: 299
Location: https://opengeospatial.github.io/ogc-geosparql/geosparql11/geo.ttl
X-Purl: 2.0; http://localhost:8080
Vary: Accept-Encoding

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML>
    <HEAD>
        <TITLE>302 Found</TITLE>
    </HEAD>
    <BODY>
    <H1>Found</H1>
         The resource requested is available <A HREF="https://opengeospatial.github.io/ogc-geosparql/geosparql11/geo.ttl">here</A>.<P>
    </BODY>
</HTML>

Now, the fact that the URL is redirected is not, in itself, a problem. The OWL API is perfectly able to follow redirections (otherwise our entire purl.obolibrary.org-based system would never have worked!). But what I suspect is problematic here is this:

Expires: Thu, 01 Jan 1970 00:00:00 GMT

First, this hints at a very badly configured server. Second, I wonder if maybe, upon seeing this, the OWL API (or whatever Java API is used behind the scenes to do the actual download) decides not to follow the redirection (since it is expired), and therefore asks its parsers to try to parse the HTML error document…
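
To look at those headers yourself, a small sketch like the following may help. It only reads an HTTP response from standard input and prints the status code plus the `Location` and `Expires` headers; the function name `redirect_info` is made up for this illustration. Note that curl is a different HTTP client from the Java stack ROBOT relies on, so this shows what the server sends, not how the OWL API reacts to it.

```shell
#!/bin/sh
# Hypothetical helper: summarise the redirect-related parts of an HTTP
# response (status line plus headers) read from standard input.
redirect_info() {
    # Strip the carriage returns that end HTTP header lines, then pick out
    # the status code and the two headers of interest (case-insensitively).
    tr -d '\r' | awk '
        NR == 1                    { print "status: " $2 }
        tolower($1) == "location:" { print "location: " $2 }
        tolower($1) == "expires:"  { sub(/^[^:]*:[ \t]*/, ""); print "expires: " $0 }
    '
}
```

A typical call would be `curl -sI http://www.opengis.net/ont/geosparql | redirect_info`; without `-L`, curl does not follow the redirection, so the output describes the first response only.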

gouttegd commented 1 year ago

This is also weird:

X-Purl: 2.0; http://localhost:8080

xyuhuang commented 1 year ago

@gouttegd thanks a lot! it works now

jamesaoverton commented 1 year ago

Thanks for digging into this @gouttegd. I think you're right, and some part of the Java HTTP stack is not happy with this HTTP redirect. That's not something we could fix in ROBOT.