owlcs / owlapi

OWL API main repository

OWLAPI failing in Hadoop environment #697

Closed nickrobison closed 7 years ago

nickrobison commented 7 years ago

I've run into an issue running some of our code in a Hadoop environment. While everything runs nicely in our current environments, when we attempt to execute with Hadoop we run into the following error:

org.semanticweb.owlapi.io.UnparsableOntologyException: Problem parsing inputstream:ontology3678919686534481
Could not parse ontology.  Either a suitable parser could not be found, or parsing failed.  See parser logs below for explanation.
The following parsers were tried:
1) org.semanticweb.owlapi.oboformat.OBOFormatOWLAPIParser@3523a08a

Detailed logs:
--------------------------------------------------------------------------------
Parser: org.semanticweb.owlapi.oboformat.OBOFormatOWLAPIParser@3523a08a
    Stack trace:
LINENO: 1 - Could not find tag separator ':' in line.
LINE: <?xml version="1.0"?>
        org.semanticweb.owlapi.oboformat.OBOFormatOWLAPIParser.parse(OBOFormatOWLAPIParser.java:50)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyFactoryImpl.loadOWLOntology(OWLOntologyFactoryImpl.java:188)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.load(OWLOntologyManagerImpl.java:1069)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:1031)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntologyFromOntologyDocument(OWLOntologyManagerImpl.java:987)
        com.nickrobison.trestle.ontology.OntologyBuilder.build(OntologyBuilder.java:130)
        com.nickrobison.trestle.ontology.TrestleOntologyModule.<init>(TrestleOntologyModule.java:23)
        com.nickrobison.trestle.reasoner.TrestleReasonerImpl.<init>(TrestleReasonerImpl.java:182)
        com.nickrobison.trestle.reasoner.TrestleBuilder.build(TrestleBuilder.java:186)
        com.nickrobison.gaulintegrator.IntegrationRunner.run(IntegrationRunner.java:57)
LINENO: 1 - Could not find tag separator ':' in line.
LINE: <?xml version="1.0"?>
        org.obolibrary.oboformat.parser.OBOFormatParser.error(OBOFormatParser.java:1310)
        org.obolibrary.oboformat.parser.OBOFormatParser.getParseTag(OBOFormatParser.java:733)
        org.obolibrary.oboformat.parser.OBOFormatParser.parseHeaderClause(OBOFormatParser.java:382)
        org.obolibrary.oboformat.parser.OBOFormatParser.parseHeaderClauseNl(OBOFormatParser.java:375)
        org.obolibrary.oboformat.parser.OBOFormatParser.parseHeaderFrame(OBOFormatParser.java:358)
        org.obolibrary.oboformat.parser.OBOFormatParser.parseOBODoc(OBOFormatParser.java:239)
        org.obolibrary.oboformat.parser.OBOFormatParser.parse(OBOFormatParser.java:211)
        org.semanticweb.owlapi.oboformat.OBOFormatOWLAPIParser.parse(OBOFormatOWLAPIParser.java:44)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyFactoryImpl.loadOWLOntology(OWLOntologyFactoryImpl.java:188)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.load(OWLOntologyManagerImpl.java:1069)

    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyFactoryImpl.loadOWLOntology(OWLOntologyFactoryImpl.java:229)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.load(OWLOntologyManagerImpl.java:1069)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:1031)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntologyFromOntologyDocument(OWLOntologyManagerImpl.java:987)
    at com.nickrobison.trestle.ontology.OntologyBuilder.build(OntologyBuilder.java:130)
    at com.nickrobison.trestle.ontology.TrestleOntologyModule.<init>(TrestleOntologyModule.java:23)
    at com.nickrobison.trestle.reasoner.TrestleReasonerImpl.<init>(TrestleReasonerImpl.java:182)
    at com.nickrobison.trestle.reasoner.TrestleBuilder.build(TrestleBuilder.java:186)
    at com.nickrobison.gaulintegrator.IntegrationRunner.run(IntegrationRunner.java:57)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at com.nickrobison.gaulintegrator.IntegrationRunner.main(IntegrationRunner.java:33)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

The ontology we're attempting to load is an OWLXML file, located on a shared server accessible to the worker machines. It seems like the OWLAPI is only trying the OBO parser and none of the other parsers in the classpath.

As we distribute the application as an uber jar, I can verify that the required classes (owlapi-parsers, owlapi-rio packages) are indeed loaded on the classpath.

Any insights would be greatly appreciated.

ansell commented 7 years ago

What do the META-INF/services files inside of the uber jar look like?

nickrobison commented 7 years ago

META-INF/services/org.apache.hadoop.crypto.key.KeyProviderFactory
META-INF/services/org.apache.hadoop.fs.FileSystem
META-INF/services/org.apache.hadoop.io.compress.CompressionCodec
META-INF/services/org.apache.hadoop.security.alias.CredentialProviderFactory
META-INF/services/org.apache.hadoop.security.SecurityInfo
META-INF/services/org.apache.hadoop.security.token.TokenIdentifier
META-INF/services/org.apache.hadoop.security.token.TokenRenewer
META-INF/services/com.fasterxml.jackson.core.JsonFactory
META-INF/services/com.fasterxml.jackson.core.ObjectCodec
META-INF/services/org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider
META-INF/services/java.sql.Driver
META-INF/services/org.geotools.data.FileDataStoreFactorySpi
META-INF/services/org.geotools.data.DataStoreFactorySpi
META-INF/services/org.geotools.renderer.crs.ProjectionHandlerFactory
META-INF/services/org.geotools.data.FeatureLockFactory
META-INF/services/org.geotools.xml.schema.Schema
META-INF/services/org.geotools.feature.FeatureCollections
META-INF/services/org.geotools.data.DataSourceFactorySpi
META-INF/services/org.geotools.filter.FunctionFactory
META-INF/services/org.opengis.feature.FeatureFactory
META-INF/services/org.opengis.feature.type.FeatureTypeFactory
META-INF/services/org.geotools.feature.AttributeTypeFactory
META-INF/services/org.geotools.styling.StyleFactory
META-INF/services/org.opengis.filter.FilterFactory
META-INF/services/com.vividsolutions.xdo.SchemaBuilder
META-INF/services/org.geotools.util.ConverterFactory
META-INF/services/org.opengis.filter.expression.Function
META-INF/services/org.geotools.filter.expression.PropertyAccessorFactory
META-INF/services/org.opengis.referencing.datum.DatumAuthorityFactory
META-INF/services/org.opengis.referencing.crs.CRSAuthorityFactory
META-INF/services/org.opengis.referencing.operation.CoordinateOperationAuthorityFactory
META-INF/services/org.opengis.referencing.cs.CSAuthorityFactory
META-INF/services/org.opengis.referencing.operation.CoordinateOperationFactory
META-INF/services/org.opengis.referencing.operation.MathTransformFactory
META-INF/services/org.opengis.referencing.cs.CSFactory
META-INF/services/org.opengis.referencing.datum.DatumFactory
META-INF/services/org.geotools.referencing.factory.gridshift.GridShiftLocator
META-INF/services/org.geotools.referencing.operation.MathTransformProvider
META-INF/services/org.opengis.referencing.crs.CRSFactory
META-INF/services/org.semanticweb.owlapi.model.OWLOntologyManagerFactory
META-INF/services/org.semanticweb.owlapi.io.OWLParserFactory
META-INF/services/org.semanticweb.owlapi.model.OWLStorerFactory
META-INF/services/org.eclipse.rdf4j.rio.RDFParserFactory
META-INF/services/org.openrdf.rio.RDFParserFactory
META-INF/services/org.semanticweb.owlapi.model.OWLDocumentFormatFactory
META-INF/services/javax.cache.spi.CachingProvider
META-INF/services/com.ontotext.trree.mbeans.MBeanFactory
META-INF/services/com.ontotext.trree.sdk.Plugin
META-INF/services/org.eclipse.rdf4j.query.algebra.evaluation.function.Function
META-INF/services/org.eclipse.rdf4j.repository.config.RepositoryFactory
META-INF/services/org.eclipse.rdf4j.sail.config.SailFactory
META-INF/services/org.apache.jena.system.JenaSubsystemLifecycle
META-INF/services/javax.xml.datatype.DatatypeFactory
META-INF/services/javax.xml.parsers.DocumentBuilderFactory
META-INF/services/javax.xml.parsers.SAXParserFactory
META-INF/services/javax.xml.stream.XMLEventFactory
META-INF/services/javax.xml.validation.SchemaFactory
META-INF/services/org.w3c.dom.DOMImplementationSourceList
META-INF/services/org.xml.sax.driver
META-INF/services/org.apache.commons.logging.LogFactory
META-INF/services/org.apache.lucene.codecs.Codec
META-INF/services/org.apache.lucene.codecs.DocValuesFormat
META-INF/services/org.apache.lucene.codecs.PostingsFormat
META-INF/services/org.apache.lucene.analysis.util.CharFilterFactory
META-INF/services/org.apache.lucene.analysis.util.TokenFilterFactory
META-INF/services/org.apache.lucene.analysis.util.TokenizerFactory
META-INF/services/javax.xml.stream.XMLInputFactory
META-INF/services/javax.xml.stream.XMLOutputFactory
META-INF/services/org.codehaus.stax2.validation.XMLValidationSchemaFactory.dtd
META-INF/services/org.codehaus.stax2.validation.XMLValidationSchemaFactory.relaxng
META-INF/services/org.eclipse.rdf4j.query.parser.QueryParserFactory
META-INF/services/org.eclipse.rdf4j.query.resultio.TupleQueryResultParserFactory
META-INF/services/org.eclipse.rdf4j.query.resultio.TupleQueryResultWriterFactory
META-INF/services/org.eclipse.rdf4j.query.resultio.BooleanQueryResultParserFactory
META-INF/services/org.eclipse.rdf4j.query.resultio.BooleanQueryResultWriterFactory
META-INF/services/java.nio.file.spi.FileTypeDetector
META-INF/services/org.eclipse.rdf4j.rio.DatatypeHandler
META-INF/services/org.eclipse.rdf4j.rio.LanguageHandler
META-INF/services/org.eclipse.rdf4j.rio.LanguageHandler
META-INF/services/org.eclipse.rdf4j.rio.RDFWriterFactory
META-INF/services/org.eclipse.rdf4j.query.algebra.evaluation.function.TupleFunction
META-INF/services/com.sun.jersey.spi.HeaderDelegateProvider
META-INF/services/com.sun.jersey.spi.inject.InjectableProvider
META-INF/services/javax.ws.rs.ext.MessageBodyReader
META-INF/services/javax.ws.rs.ext.MessageBodyWriter

I should probably also clarify that I'm using OWLAPI 5.1.1.

nickrobison commented 7 years ago

Your comment pointed me in the right direction: the OWLParserFactory file in the services/ directory was only listing the OBO parser. It turns out I had a missing resource transformer, which was causing the other entries to be removed. Adding that solved the issue!
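
For anyone hitting the same symptom, a quick way to see which providers the jar actually declares is to read the service file straight off the classpath. The following is only a minimal sketch using the standard library; the class name `ServiceFileInspector` and its methods are hypothetical helpers, not part of the OWL API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical diagnostic helper, not part of the OWL API.
public class ServiceFileInspector {

    /** Extracts provider class names from service file text, skipping blank lines and '#' comments. */
    public static List<String> parseProviders(String content) {
        List<String> providers = new ArrayList<>();
        for (String line : content.split("\\R")) {
            int hash = line.indexOf('#');
            String entry = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!entry.isEmpty()) {
                providers.add(entry);
            }
        }
        return providers;
    }

    /** Collects the providers declared in every matching service file visible to the class loader. */
    public static List<String> declaredProviders(String serviceName, ClassLoader loader) throws IOException {
        List<String> all = new ArrayList<>();
        Enumeration<URL> urls = loader.getResources("META-INF/services/" + serviceName);
        while (urls.hasMoreElements()) {
            URL url = urls.nextElement();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String content = reader.lines().collect(Collectors.joining("\n"));
                all.addAll(parseProviders(content));
            }
        }
        return all;
    }

    public static void main(String[] args) throws IOException {
        // The default service name is the file discussed in this thread.
        String service = args.length > 0 ? args[0] : "org.semanticweb.owlapi.io.OWLParserFactory";
        for (String provider : declaredProviders(service, ServiceFileInspector.class.getClassLoader())) {
            System.out.println(provider);
        }
    }
}
```

Run from inside the uber jar, this prints one line per declared parser factory; a single line is a strong hint that the service files were overwritten rather than merged.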

ansell commented 7 years ago

They are probably okay, but double-check that there is more than one line in the following:

META-INF/services/org.semanticweb.owlapi.io.OWLParserFactory
META-INF/services/org.eclipse.rdf4j.rio.RDFParserFactory
META-INF/services/org.semanticweb.owlapi.model.OWLDocumentFormatFactory

If they contain only one line, or fewer lines than expected, then you may need to adjust the merge method so that it concatenates the META-INF/services files together rather than selecting the first/last one and discarding the others.

https://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer

https://maven.apache.org/plugins/maven-assembly-plugin/examples/single/using-container-descriptor-handlers.html
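
To illustrate what "concatenate rather than select one" means, here is a rough sketch of the merge in plain Java. It is not the implementation used by either plugin linked above; the class `ServiceFileMerger` and the `com.example` provider names in the usage note are hypothetical, and dropping duplicate entries is an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of merging same-named META-INF/services files.
public class ServiceFileMerger {

    /**
     * Concatenates the provider entries of several service files that share a name.
     * Comments and blank lines are skipped; duplicates are dropped while keeping
     * first-seen order (an assumption of this sketch).
     */
    public static List<String> merge(List<String> fileContents) {
        Set<String> merged = new LinkedHashSet<>();
        for (String content : fileContents) {
            for (String line : content.split("\\R")) {
                int hash = line.indexOf('#');
                String entry = (hash >= 0 ? line.substring(0, hash) : line).trim();
                if (!entry.isEmpty()) {
                    merged.add(entry);
                }
            }
        }
        return new ArrayList<>(merged);
    }
}
```

For example, merging a file containing `com.example.ParserA` with one containing `com.example.ParserB` and `com.example.ParserA` yields both entries once each, whereas a "last one wins" strategy would silently lose whichever file was written first.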

It is also interesting that the following file is present, but it may be necessary for the other libraries, so it is probably not an issue overall:

META-INF/services/org.openrdf.rio.RDFParserFactory

ansell commented 7 years ago

Sorry, comments overlapped!

nickrobison commented 7 years ago

No problem, thanks for the quick response!

ignazio1977 commented 7 years ago

Faster than me, thanks to time zones :-) Similar issues have been seen in the past with the different aggregation techniques other developers use to create uber jars. The problem is as you've found: if the aggregation transforms do not create the right files in the META-INF/services folder, only a subset of the parsers is loaded.

There is a unit test that checks that the number of parsers is as expected; maybe it could be moulded into an integration test for your uber jar generation process. See https://github.com/owlcs/owlapi/blob/version5/contract/src/test/java/org/semanticweb/owlapi/rio/OWLOntologyStorerFactoryRegistryTestCase.java
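
A check along those lines could be sketched with `java.util.ServiceLoader`, independently of the linked test case. The class `ProviderCountCheck` and its methods are hypothetical, and the minimum count is a value you would have to choose for your own build; nothing here is taken from the OWL API test suite.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical post-packaging check: fail fast when too few providers are registered.
public class ProviderCountCheck {

    /** Returns the class names of all providers that ServiceLoader finds for the given service. */
    public static <T> List<String> providerNames(Class<T> service, ClassLoader loader) {
        List<String> names = new ArrayList<>();
        for (T provider : ServiceLoader.load(service, loader)) {
            names.add(provider.getClass().getName());
        }
        return names;
    }

    /** Throws IllegalStateException if fewer than 'minimum' providers are registered. */
    public static <T> void requireAtLeast(Class<T> service, int minimum) {
        List<String> names = providerNames(service, ProviderCountCheck.class.getClassLoader());
        if (names.size() < minimum) {
            throw new IllegalStateException("Expected at least " + minimum + " providers for "
                    + service.getName() + " but found " + names.size() + ": " + names);
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Service name from this thread; the minimum is an assumption supplied by the caller.
        Class<?> service = Class.forName(
                args.length > 0 ? args[0] : "org.semanticweb.owlapi.io.OWLParserFactory");
        int minimum = args.length > 1 ? Integer.parseInt(args[1]) : 2;
        requireAtLeast(service, minimum);
        System.out.println("Provider count OK for " + service.getName());
    }
}
```

Running this against the packaged uber jar as part of the build would surface a dropped service file before the job reaches the cluster.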