obophenotype / upheno

The Unified Phenotype Ontology (uPheno) integrates multiple phenotype ontologies into a unified cross-species phenotype ontology.
https://obophenotype.github.io/upheno/
Creative Commons Zero v1.0 Universal

Failure importing http://purl.obolibrary.org/obo/upheno/upheno_root_alignments.owl #929

Open haideriqbal opened 4 months ago

haideriqbal commented 4 months ago

Hi Team,

uPheno isn't currently being loaded into OLS because the import of http://purl.obolibrary.org/obo/upheno/upheno_root_alignments.owl is failing. Our pipeline raises the following exception:

org.apache.jena.riot.RiotException: [line: 1, col: 1 ] Content is not allowed in prolog.
    at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:153)
    at org.apache.jena.riot.lang.ReaderRIOTRDFXML$ErrorHandlerBridge.fatalError(ReaderRIOTRDFXML.java:313)
    at org.apache.jena.rdfxml.xmlinput.impl.ARPSaxErrorHandler.fatalError(ARPSaxErrorHandler.java:47)
    at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.warning(XMLHandler.java:199)
    at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.fatalError(XMLHandler.java:229)
    at java.xml/com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:181)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1471)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:978)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:605)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:541)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1224)
    at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
    at org.apache.jena.rdfxml.xmlinput.impl.RDFXMLParser.parse(RDFXMLParser.java:101)
    at org.apache.jena.rdfxml.xmlinput.ARP.load(ARP.java:118)
    at org.apache.jena.riot.lang.ReaderRIOTRDFXML.parse(ReaderRIOTRDFXML.java:188)
    at org.apache.jena.riot.lang.ReaderRIOTRDFXML.read(ReaderRIOTRDFXML.java:86)
    at org.apache.jena.riot.RDFParser.read(RDFParser.java:353)
    at org.apache.jena.riot.RDFParser.parseURI(RDFParser.java:322)
    at org.apache.jena.riot.RDFParser.parse(RDFParser.java:296)
    at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:540)

Looking at the file, it does not appear to be in the XML format that the OLS pipeline expects.
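
For context, "Content is not allowed in prolog" is the XML parser reporting that the input does not start with XML at line 1, column 1. One rough way to see what a file starts with is to sniff its first characters. The sketch below is only a heuristic: the function name `guess_owl_serialization` and the format labels are made up for illustration, and nothing in this thread states which serialization the file actually uses.

```python
def guess_owl_serialization(head: str) -> str:
    """Guess a serialization from the first characters of a file (heuristic only)."""
    # Ignore a leading byte-order mark and whitespace before inspecting the content.
    text = head.lstrip("\ufeff \t\r\n")

    # XML-based serializations start with an XML declaration, comment, or root tag.
    if text.startswith(("<?xml", "<!DOCTYPE", "<!--", "<rdf:RDF", "<Ontology")):
        if "<rdf:RDF" in text:
            return "rdf/xml"
        if "<Ontology" in text:
            return "owl/xml"
        return "xml"

    # OWL functional syntax uses parenthesised declarations.
    if text.startswith(("Prefix(", "Ontology(")):
        return "functional"

    # Manchester syntax uses keyword-colon frames.
    if text.startswith(("Prefix:", "Ontology:")):
        return "manchester"

    # Turtle usually opens with prefix or base directives.
    if text.startswith(("@prefix", "@base")):
        return "turtle"

    return "unknown"
```

Anything other than `"rdf/xml"` from such a check would be consistent with an RDF/XML-only reader failing on the first character.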

An earlier issue, https://github.com/obophenotype/upheno/issues/919, mentions that this would be fixed in a newer version of uPheno, so I am not sure whether it has been fixed yet.

Please let me know if you need any further information.

matentzn commented 4 months ago

Hmmmm. Not so sure this is a uPheno error per se - it is totally fine that an ontology imports a non-rdfxml import. @jamesamcl what do you think? You could change the serialisation of the file to RDFXML just to satisfy this OLS requirement because we can, but it's not strictly speaking "right" :P

jamesamcl commented 4 months ago

it is totally fine that an ontology imports a non-rdfxml import

This in itself is fine, but what I don't think is fine is that the HTTP headers returned for the OWL file have content-type as text/plain, so there is actually no way to determine the encoding. OWLAPI gets around this by brute-forcing all the loaders one by one until one of them doesn't throw an exception, which I do not think is really in the spirit of "semantic web" :P.
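
To illustrate the "try every loader until one does not throw" strategy described above, here is a minimal sketch. It is not OWLAPI's actual code: `load_by_trial`, `parse_rdfxml` and `parse_functional` are hypothetical names, and the two loaders are toy stand-ins (a plain XML parse and a prefix check) rather than real OWL parsers.

```python
import xml.etree.ElementTree as ET


def parse_rdfxml(text):
    # Toy stand-in: only checks that the input is well-formed XML.
    root = ET.fromstring(text)  # raises ET.ParseError on non-XML input
    return ("rdf/xml", root.tag)


def parse_functional(text):
    # Toy stand-in: only checks how the input starts.
    if not text.lstrip().startswith(("Prefix(", "Ontology(")):
        raise ValueError("not functional syntax")
    return ("functional", None)


def load_by_trial(text, loaders):
    """Return the result of the first loader that does not raise."""
    errors = []
    for loader in loaders:
        try:
            return loader(text)
        except Exception as exc:  # any failure means "try the next loader"
            errors.append(f"{loader.__name__}: {exc}")
    raise ValueError("no loader accepted the input: " + "; ".join(errors))
```

The order of `loaders` matters here, and a failure in an early loader is silently swallowed, which is part of why this approach feels like guessing rather than using declared metadata.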

AFAIK we can't fix this on github either. The problem is that the owl file extension does not indicate anything about the actual encoding of the contents, hence the webserver returning a text/plain content type.

Because the vast majority of OWL files in the wild today are RDF/XML, I think OLS assuming RDF/XML in the absence of any other information is the only sensible default. So yes this is an OLS issue because OLS only loads RDF - but also it's a upheno issue because if upheno provides a serialization of the ontology as a plain file with no metadata, it should probably choose the most common OWL representation rather than a less commonly used one (imo).
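
The default described above can be sketched as a lookup from the Content-Type header to a serialization, falling back to RDF/XML when the header carries no useful information (as with text/plain). This is an illustration only, not OLS code: `choose_format` and the format labels are hypothetical, and the table lists only media types I believe are defined for these serializations.

```python
# Media types for common OWL/RDF serializations (assumed mapping for illustration).
MEDIA_TYPE_TO_FORMAT = {
    "application/rdf+xml": "rdf/xml",
    "application/owl+xml": "owl/xml",
    "text/turtle": "turtle",
    "text/owl-functional": "functional",
    "text/owl-manchester": "manchester",
}


def choose_format(content_type, default="rdf/xml"):
    """Pick a serialization from a Content-Type header value, else the default."""
    if not content_type:
        return default
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Uninformative types such as text/plain fall through to the default.
    return MEDIA_TYPE_TO_FORMAT.get(media_type, default)
```

With a server that answers text/plain for every .owl file, this always yields the default, so the choice of default decides which files load.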

matentzn commented 4 months ago

Good argument 😜 ok will you make the change? Or assign Ray.

cmungall commented 4 months ago

OBO does mandate rdf/xml. There are a lot of toolchains that expect at least an rdf serialization, e.g. pronto

On Mon, Feb 5, 2024 at 3:27 AM Nico Matentzoglu @.***> wrote:

Hmmmm. Not so sure this is a uPheno error per se - it is totally fine that an ontology imports a non-rdfxml import. @jamesamcl https://github.com/jamesamcl what do you think? You could change the serialisation of the file to RDFXML just to satisfy this OLS requirement because we can, but its not strictly speaking "right" :P
