rmap-project / rmap

RMap Project is an Alfred P. Sloan Foundation-funded initiative undertaken by the Data Conservancy, Portico, and IEEE. The goal of RMap is to make it possible to capture and preserve the many-to-many complex relationships among the distributed components of a scholarly work.
https://rmap-project.atlassian.net/wiki
Apache License 2.0

Upgrading to RDF4J 2.2.4 produces integration test failures. #221

Open emetsger opened 6 years ago

emetsger commented 6 years ago

Upgrading to RDF4J 2.2.4 (from 2.2.2) produces a number of test failures. At least one of the IT failures has to do with calculating the number of statements present in an RDF4J context.

An example failure:

[INFO] Running info.rmapproject.indexing.kafka.SaveOffsetOnRebalanceIT
[WARNING] [talledLocalContainer] 13:50:30.652 [2-thread-1] WARN  [IndexingConsume] - Unable to index event ORMapObject{id=rmap:rmd18m7msr, typeStatement=(rmap:rmd18m7msr, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://purl.org/ontology/rmap#Event) [rmap:rmd18m7msr], context=rmap:rmd18m7msr}: Timeout after 10 attempts, 75116 ms: failed to index ORMapObject{id=rmap:rmd18m7msr, typeStatement=(rmap:rmd18m7msr, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://purl.org/ontology/rmap#Event) [rmap:rmd18m7msr], context=rmap:rmd18m7msr}: No DiSCO with id rmap:rmd18m7mr7
[INFO] [talledLocalContainer] info.rmapproject.indexing.IndexingTimeoutException: Timeout after 10 attempts, 75116 ms: failed to index ORMapObject{id=rmap:rmd18m7msr, typeStatement=(rmap:rmd18m7msr, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://purl.org/ontology/rmap#Event) [rmap:rmd18m7msr], context=rmap:rmd18m7msr}: No DiSCO with id rmap:rmd18m7mr7
[INFO] [talledLocalContainer]   at info.rmapproject.indexing.kafka.DefaultIndexRetryHandler.retry(DefaultIndexRetryHandler.java:233)
[INFO] [talledLocalContainer]   at info.rmapproject.indexing.kafka.IndexingConsumer.indexEvent(IndexingConsumer.java:169)
[INFO] [talledLocalContainer]   at info.rmapproject.indexing.kafka.IndexingConsumer.lambda$processRecords$15(IndexingConsumer.java:136)
[INFO] [talledLocalContainer]   at java.lang.Iterable.forEach(Iterable.java:75)
[INFO] [talledLocalContainer]   at info.rmapproject.indexing.kafka.IndexingConsumer.processRecords(IndexingConsumer.java:119)
[INFO] [talledLocalContainer]   at info.rmapproject.indexing.kafka.IndexingConsumer.consume(IndexingConsumer.java:98)
[INFO] [talledLocalContainer]   at info.rmapproject.indexing.kafka.SpringAwareConsumerInitializer.lambda$startConsumers$26(SpringAwareConsumerInitializer.java:204)
[INFO] [talledLocalContainer]   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[INFO] [talledLocalContainer]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[INFO] [talledLocalContainer]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[INFO] [talledLocalContainer]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[INFO] [talledLocalContainer]   at java.lang.Thread.run(Thread.java:745)
[INFO] [talledLocalContainer] Caused by: info.rmapproject.core.exception.RMapDiSCONotFoundException: No DiSCO with id rmap:rmd18m7mr7
[INFO] [talledLocalContainer]   at info.rmapproject.core.rmapservice.impl.rdf4j.ORMapDiSCOMgr.readDiSCO(ORMapDiSCOMgr.java:140)
[INFO] [talledLocalContainer]   at info.rmapproject.core.rmapservice.impl.rdf4j.ORMapDiSCOMgr.readDiSCO(ORMapDiSCOMgr.java:114)
[INFO] [talledLocalContainer]   at info.rmapproject.core.rmapservice.impl.rdf4j.ORMapService.readDiSCO(ORMapService.java:353)
[INFO] [talledLocalContainer]   at info.rmapproject.indexing.kafka.DefaultIndexRetryHandler.getDisco(DefaultIndexRetryHandler.java:249)
[INFO] [talledLocalContainer]   at info.rmapproject.indexing.kafka.DefaultIndexRetryHandler.composeDTO(DefaultIndexRetryHandler.java:240)
[INFO] [talledLocalContainer]   at info.rmapproject.indexing.kafka.DefaultIndexRetryHandler.retry(DefaultIndexRetryHandler.java:192)
[INFO] [talledLocalContainer]   ... 11 common frames omitted

The source of this issue is in ORMapObjectMgr:

    /**
     * Looks up an IRI's type in the database to see if it matches the type IRI provided
     *
     * @param ts the triplestore instance
     * @param id the IRI to be checked
     * @param typeIRI the type IRI to be checked against
     * @return true, if the id parameter has the type specified in typeIRI
     * @throws RMapException the RMap exception
     */
    public boolean isRMapType(Rdf4jTriplestore ts, IRI id, IRI typeIRI) throws RMapException {
        if (ts==null || id==null || typeIRI==null){
            throw new RMapException("Null parameter passed");
        }
        boolean isRmapType = false;
        try {
            if (ts.getConnection().size(id)>0) {
                //resource exists somewhere, lets find out where
                if (ts.getConnection().hasStatement(id, RDF.TYPE, typeIRI, false, id)) {
                    //it is of defined type!
                    return true;
                } 
            } else if (typeIRI.equals(RMAP.DISCO)) {
                //check events to see if it's a deleted DiSCO
                Set<Statement> stmts = ts.getStatements(null, RMAP.DELETEDOBJECT, id);
                for (Statement stmt : stmts) {
                    IRI subject = (IRI) stmt.getSubject();
                    IRI context = (IRI) stmt.getContext();
                    if (subject.equals(context) && this.isRMapType(ts, subject, RMAP.EVENT)) {
                        return true;
                    }                       
                }       
            }
        } catch (Exception e) {
            throw new RMapException ("Exception thrown searching for object " + id.stringValue(), e);
        }       
        return isRmapType;
    }

Specifically here:

            if (ts.getConnection().size(id)>0) {
                //resource exists somewhere, lets find out where
                if (ts.getConnection().hasStatement(id, RDF.TYPE, typeIRI, false, id)) {
                    //it is of defined type!
                    return true;
                } 
            }

With 2.2.4, ts.getConnection().size(id) returns 0 (where id is treated as the context, i.e. the graph URI). With 2.2.2, this method behaves as expected, returning a positive count of the statements present in the triplestore for that graph. When ts.getConnection().size(id) returns 0, the method isRMapType(Rdf4jTriplestore ts, IRI id, IRI typeIRI) returns false, ultimately resulting in the info.rmapproject.core.exception.RMapDiSCONotFoundException: No DiSCO with id rmap:rmd18m7mr7 exception.
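For illustration, a context-existence check that does not depend on size() can be sketched with a wildcard hasStatement pattern scoped to the context. This is a minimal sketch against an in-memory store; the class name and example IRIs are hypothetical, and whether this pattern behaves correctly against the HTTP repository under 2.2.4 would still need to be verified:

```java
import org.eclipse.rdf4j.model.IRI;
import org.eclipse.rdf4j.model.ValueFactory;
import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.sail.SailRepository;
import org.eclipse.rdf4j.sail.memory.MemoryStore;

public class ContextCheckDemo {

    /** Returns true if any statement exists in the given named graph. */
    static boolean contextHasStatements(RepositoryConnection conn, IRI context) {
        // Wildcard subject/predicate/object pattern scoped to the context;
        // avoids relying on size(context), which is what misbehaves under 2.2.4.
        return conn.hasStatement(null, null, null, false, context);
    }

    public static void main(String[] args) {
        Repository repo = new SailRepository(new MemoryStore());
        repo.initialize();
        ValueFactory vf = repo.getValueFactory();
        IRI ctx = vf.createIRI("http://example.org/graph/1");
        try (RepositoryConnection conn = repo.getConnection()) {
            conn.add(vf.createIRI("http://example.org/s"),
                     vf.createIRI("http://example.org/p"),
                     vf.createLiteral("o"), ctx);
            if (!contextHasStatements(conn, ctx)) {
                throw new AssertionError("expected context to exist");
            }
        }
        repo.shutDown();
    }
}
```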

karenhanson commented 5 years ago

There seems to be more to it than this one problem. I attempted to update to both 2.2.4 and 2.4.2. For 2.2.4, I circumvented the issue described above by adding a SPARQL query that checks for the existence of a context (see my branch here). The tests in that version all pass, but once deployed, the first DiSCO posted through the API after deploying a local instance often does not get created properly in the triplestore, though it will sometimes get indexed. While navigating the webapp, you run into a mixture of errors at different points.
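The SPARQL-based existence check amounts to an ASK over the named graph. A hypothetical sketch of that approach (the class and method names are mine, not the actual code in the branch):

```java
import org.eclipse.rdf4j.model.IRI;
import org.eclipse.rdf4j.model.ValueFactory;
import org.eclipse.rdf4j.query.QueryLanguage;
import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.sail.SailRepository;
import org.eclipse.rdf4j.sail.memory.MemoryStore;

public class SparqlContextCheck {

    /** ASKs whether the named graph contains any triples at all. */
    static boolean contextExists(RepositoryConnection conn, IRI context) {
        String ask = "ASK { GRAPH <" + context.stringValue() + "> { ?s ?p ?o } }";
        return conn.prepareBooleanQuery(QueryLanguage.SPARQL, ask).evaluate();
    }

    public static void main(String[] args) {
        Repository repo = new SailRepository(new MemoryStore());
        repo.initialize();
        ValueFactory vf = repo.getValueFactory();
        IRI ctx = vf.createIRI("http://example.org/graph/1");
        try (RepositoryConnection conn = repo.getConnection()) {
            if (contextExists(conn, ctx)) {
                throw new AssertionError("empty graph should not exist yet");
            }
            conn.add(vf.createIRI("http://example.org/s"),
                     vf.createIRI("http://example.org/p"),
                     vf.createLiteral("o"), ctx);
            if (!contextExists(conn, ctx)) {
                throw new AssertionError("graph should exist after add");
            }
        }
        repo.shutDown();
    }
}
```

Unlike the size()-based check, this goes through the SPARQL protocol rather than the repository size endpoint, which is why it can sidestep the 2.2.4 behavior.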

Most common:

java.lang.NullPointerException
[INFO] [talledLocalContainer]   at info.rmapproject.core.rmapservice.impl.rdf4j.ORMapQueriesLineage.getLineageMembersWithDates(ORMapQueriesLineage.java:139) ~[rmap-core-2.0.1-beta-SNAPSHOT.jar:?]
[INFO] [talledLocalContainer]   at info.rmapproject.core.rmapservice.impl.rdf4j.ORMapQueriesLineage.getLineageMembers(ORMapQueriesLineage.java:158) ~[rmap-core-2.0.1-beta-SNAPSHOT.jar:?]
[INFO] [talledLocalContainer]   at info.rmapproject.core.rmapservice.impl.rdf4j.ORMapService.getDiSCODVersionsAndDerivatives(ORMapService.java:543) ~[rmap-core-2.0.1-beta-SNAPSHOT.jar:?]
[INFO] [talledLocalContainer]   at info.rmapproject.webapp.service.DataDisplayServiceImpl.getDiSCODTO(DataDisplayServiceImpl.java:173) ~[classes/:?]
[INFO] [talledLocalContainer]   at info.rmapproject.webapp.controllers.DiSCODataController.discoTableData(DiSCODataController.java:202) ~[classes/:?]
[INFO] [talledLocalContainer]   ... 42 more

Further inspection shows that progenitor is null. Here is a rarer one:

[INFO] [talledLocalContainer] Caused by: org.apache.http.impl.conn.ConnectionShutdownException
[INFO] [talledLocalContainer]   at org.apache.http.impl.conn.CPoolProxy.getValidConnection(CPoolProxy.java:79) ~[httpclient-4.5.2.jar:4.5.2]
[INFO] [talledLocalContainer]   at org.apache.http.impl.conn.CPoolProxy.getSSLSession(CPoolProxy.java:147) ~[httpclient-4.5.2.jar:4.5.2]
[INFO] [talledLocalContainer]   at org.apache.http.impl.client.DefaultUserTokenHandler.getUserToken(DefaultUserTokenHandler.java:81) ~[httpclient-4.5.2.jar:4.5.2]
[INFO] [talledLocalContainer]   at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:327) ~[httpclient-4.5.2.jar:4.5.2]
[INFO] [talledLocalContainer]   at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184) ~[httpclient-4.5.2.jar:4.5.2]
[INFO] [talledLocalContainer]   at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[httpclient-4.5.2.jar:4.5.2]
[INFO] [talledLocalContainer]   at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184) ~[httpclient-4.5.2.jar:4.5.2]
[INFO] [talledLocalContainer]   at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) ~[httpclient-4.5.2.jar:4.5.2]
[INFO] [talledLocalContainer]   at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55) ~[httpclient-4.5.2.jar:4.5.2]
[INFO] [talledLocalContainer]   at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.execute(SPARQLProtocolSession.java:1113) ~[rdf4j-http-client-2.2.2.jar:?]
[INFO] [talledLocalContainer]   at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.executeOK(SPARQLProtocolSession.java:1066) ~[rdf4j-http-client-2.2.2.jar:?]
[INFO] [talledLocalContainer]   at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.sendGraphQueryViaHttp(SPARQLProtocolSession.java:972) ~[rdf4j-http-client-2.2.2.jar:?]
[INFO] [talledLocalContainer]   at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.getRDF(SPARQLProtocolSession.java:938) ~[rdf4j-http-client-2.2.2.jar:?]
[INFO] [talledLocalContainer]   at org.eclipse.rdf4j.http.client.RDF4JProtocolSession.getStatements(RDF4JProtocolSession.java:492) ~[rdf4j-http-client-2.2.2.jar:?]
[INFO] [talledLocalContainer]   at org.eclipse.rdf4j.repository.http.HTTPRepositoryConnection.exportStatements(HTTPRepositoryConnection.java:284) ~[rdf4j-repository-http-2.2.2.jar:?]
[INFO] [talledLocalContainer]   at org.eclipse.rdf4j.repository.http.HTTPRepositoryConnection.getStatements(HTTPRepositoryConnection.java:269) ~[rdf4j-repository-http-2.2.2.jar:?]
[INFO] [talledLocalContainer]   at info.rmapproject.core.rmapservice.impl.rdf4j.triplestore.Rdf4jTriplestore.getStatements(Rdf4jTriplestore.java:231) ~[rmap-core-2.0.1-beta-SNAPSHOT.jar:?]
[INFO] [talledLocalContainer]   at info.rmapproject.core.rmapservice.impl.rdf4j.triplestore.Rdf4jTriplestore.getStatements(Rdf4jTriplestore.java:255) ~[rmap-core-2.0.1-beta-SNAPSHOT.jar:?]
[INFO] [talledLocalContainer]   at info.rmapproject.core.rmapservice.impl.rdf4j.ORMapResourceMgr.getResourceRdfTypes(ORMapResourceMgr.java:443) ~[rmap-core-2.0.1-beta-SNAPSHOT.jar:?]
[INFO] [talledLocalContainer]   at info.rmapproject.core.rmapservice.impl.rdf4j.ORMapService.getResourceRdfTypesInContext(ORMapService.java:279) ~[rmap-core-2.0.1-beta-SNAPSHOT.jar:?]
[INFO] [talledLocalContainer]   at info.rmapproject.webapp.service.DataDisplayServiceImpl.getDiSCOGraph(DataDisplayServiceImpl.java:202) ~[classes/:?]
[INFO] [talledLocalContainer]   at info.rmapproject.webapp.controllers.DiSCODataController.discoGraphData(DiSCODataController.java:164) ~[classes/:?]
[INFO] [talledLocalContainer]   ... 42 more

With 2.4.2, the same problems occur, and in addition the webapp hangs without error after a few minutes of browsing, so that you can no longer navigate DiSCOs. Often its last activity is an attempt to retrieve the DiSCO versions. The way the UI is designed, it issues two similar requests in parallel, which may be causing a problem (sometimes you will see the two version requests fire off in tandem just before it hangs).

The tandem firing of requests suggests the hang might be thread related, but it is not clear why the new rdf4j would cause, or perhaps merely expose, an existing issue. I note that the lineage methods are the only static methods in RMapService; I am not sure whether that could be related as well.

karenhanson commented 5 years ago

PR #255 also has a little more information about this issue.

A few thoughts related to this that I wanted to capture:

1. It might be helpful to look into refactoring the rdf4j connection slice of RMap. The original design is less than ideal, with RMap adding an unnecessary layer on top of the connection object in places. Since RMap was created, rdf4j has changed quite a lot; if we take advantage of the changes to the connection object, some of these issues might go away.

2. On the back burner, when I have time between things, I've been working to refactor out unnecessary rdf4j-specific logic, in anticipation that we may want to move away from rdf4j as the sole option in the future. For example, there is no reason to use rdf4j in the model objects or vocabularies.

3. GraphDB has had many releases since the version we are running. Perhaps the connection issues we are seeing in the live environment, but not locally, could be resolved by updating GraphDB.