graphaware / neo4j-to-elasticsearch

GraphAware Framework Module for Integrating Neo4j with Elasticsearch

ga.es.queryNodeRaw fails intermittently with RunTimeException #132

Closed manaschoksi closed 6 years ago

manaschoksi commented 6 years ago

Using Elasticsearch 2.4.3, Neo4j 3.2.3, GraphAware plugin 3.2.1.51.14. Neo4j is a 3-node HA cluster; Elasticsearch is a 3-node cluster with one master and two data nodes.

We are seeing intermittent failures when running the exact same query with the same parameters; roughly 1 out of 5 runs fails.

Our system constantly ingests new data, so Neo4j and Elasticsearch are continuously syncing. Search queries from Neo4j can therefore run while the sync is in progress. The queries are issued directly from the Neo4j web console.

One more observation: we are not seeing these failures with queries that use ga.es.queryNode. (The node label that queryNode targets is different from the one queryNodeRaw targets.)

As for logs, there are none: nothing on the Elasticsearch side, nothing on Neo4j. If there is any way to get more debug output, we can certainly enable it.

The exception we get is this:

Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure ga.es.queryNodeRaw: Caused by: java.lang.RuntimeException: Error while performing query on ElasticSearch

Other than that, we do see constant "Bulk Operation Succeeded" log entries in Neo4j.

So we enabled DEBUG logging on Elasticsearch. Here is the stack trace:

[2018-07-20 17:35:47,681][DEBUG][shield.transport.netty ] [prod-es-master-01-Mona-01] Caught exception while handling client http traffic, closing connection [id: 0x8858e005, /<neo4j-ip>:56854 :> /<elasticsearch-ip>:9200]
java.nio.channels.ClosedChannelException
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:433)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:128)
    at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:99)
    at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:36)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
    at org.jboss.netty.channel.Channels.write(Channels.java:725)
    at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
    at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:784)
    at org.elasticsearch.http.netty.pipelining.HttpPipeliningHandler.handleDownstream(HttpPipeliningHandler.java:87)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
    at org.elasticsearch.http.netty.NettyHttpChannel.sendResponse(NettyHttpChannel.java:147)
    at org.elasticsearch.http.HttpServer$ResourceHandlingHttpChannel.sendResponse(HttpServer.java:352)
    at org.elasticsearch.rest.action.support.RestResponseListener.processResponse(RestResponseListener.java:43)
    at org.elasticsearch.rest.action.support.RestActionListener.onResponse(RestActionListener.java:49)
    at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:89)
    at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:85)
    at org.elasticsearch.action.support.TransportAction$ResponseFilterChain.proceed(TransportAction.java:212)
    at org.elasticsearch.shield.action.ShieldActionFilter.apply(ShieldActionFilter.java:150)
    at org.elasticsearch.action.support.TransportAction$ResponseFilterChain.proceed(TransportAction.java:210)
    at org.elasticsearch.action.support.TransportAction$FilteredActionListener.onResponse(TransportAction.java:237)
    at org.elasticsearch.action.support.TransportAction$FilteredActionListener.onResponse(TransportAction.java:223)
    at org.elasticsearch.action.search.SearchQueryThenFetchAsyncAction$2.doRun(SearchQueryThenFetchAsyncAction.java:138)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

manaschoksi commented 6 years ago

Some more debugging insight: we have isolated the problem to the Jest client. Queries that take longer than 3 seconds are timed out by the Jest client inside GraphAware.

Is there any way to configure the Jest client through the Neo4j config?
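For reference, a minimal sketch of where such a timeout lives on the Jest side, assuming the standard io.searchbox JestClientFactory / HttpClientConfig API (not the module's actual wiring): the hardcoded 3-second value would correspond to the readTimeout passed to this builder, and making it configurable means feeding the builder a value taken from the Neo4j config instead.

```java
import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;

public class JestTimeoutSketch {

    // Build a Jest client with explicit timeouts instead of the library defaults.
    public static JestClient buildClient(String esUri, int readTimeoutMillis) {
        JestClientFactory factory = new JestClientFactory();
        factory.setHttpClientConfig(new HttpClientConfig.Builder(esUri)
                .multiThreaded(true)
                .connTimeout(3_000)             // connection timeout in ms
                .readTimeout(readTimeoutMillis) // read timeout in ms; this is what trips on slow searches
                .build());
        return factory.getObject();
    }

    public static void main(String[] args) {
        // Example: allow searches up to 10 seconds before the client gives up.
        JestClient client = buildClient("http://localhost:9200", 10_000);
        client.shutdownClient();
    }
}
```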

ikwattro commented 6 years ago

Thanks for the report, @manaschoksi. While I fundamentally advise querying Elasticsearch directly, this is definitely a setting that can ship in one of the next versions.

itgold commented 6 years ago

Hello @ikwattro, can I open a pull request with a fix that moves the timeouts into configuration?

ikwattro commented 6 years ago

@itgold It would be a pleasure :)

Make sure to follow the flow in https://github.com/graphaware/neo4j-to-elasticsearch/blob/master/src/main/java/com/graphaware/module/es/ElasticSearchConfiguration.java and https://github.com/graphaware/neo4j-to-elasticsearch/blob/master/src/main/java/com/graphaware/module/es/ElasticSearchModuleBootstrapper.java. Once you open the PR, you will be asked to electronically sign a document.

Thanks!
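As a rough, hypothetical sketch of that flow (illustrative names only, not the actual classes in ElasticSearchConfiguration.java or ElasticSearchModuleBootstrapper.java): the bootstrapper reads an optional key from neo4j.conf, falls back to the current default, and hands the value to the module configuration that the search layer consumes.

```java
// Hypothetical sketch of the configuration flow described above; class and key names
// are illustrative, not the real GraphAware classes or property keys.
public final class EsTimeoutConfigSketch {

    static final String READ_TIMEOUT_KEY = "readTimeout";   // hypothetical neo4j.conf key suffix
    static final int DEFAULT_READ_TIMEOUT_MS = 3_000;       // the previously hardcoded value

    private final int readTimeoutMs;

    private EsTimeoutConfigSketch(int readTimeoutMs) {
        this.readTimeoutMs = readTimeoutMs;
    }

    // Bootstrapper side: parse the raw string from the module config, fall back to the default.
    static EsTimeoutConfigSketch fromProperties(java.util.Map<String, String> config) {
        String raw = config.get(READ_TIMEOUT_KEY);
        int timeout = (raw == null) ? DEFAULT_READ_TIMEOUT_MS : Integer.parseInt(raw);
        return new EsTimeoutConfigSketch(timeout);
    }

    // Module side: expose the value so the Jest client builder can use it.
    int getReadTimeoutMs() {
        return readTimeoutMs;
    }
}
```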

itgold commented 6 years ago

Thank you, @ikwattro. I'm on it!

itgold commented 6 years ago

Done. Thank you!

manaschoksi commented 6 years ago

Thank you @itgold @ikwattro!! That was fast.

ikwattro commented 6 years ago

@manaschoksi @itgold I released a new version with your contribution included. Can you please validate that it fixes your issue? https://products.graphaware.com/download/neo4j-to-elasticsearch/graphaware-neo4j-to-elasticsearch-3.4.0.52.10.jar

manaschoksi commented 6 years ago

@ikwattro Thanks for the release. Our Neo4j is not yet on 3.4; we will validate once we are there. For now, we have upgraded to version 3.2.6.51.7, which raises the hardcoded Jest timeout from 3s to 10s.

Once we feel confident with the fix, let's mark this issue as resolved.

ikwattro commented 6 years ago

@manaschoksi Great! I recommend upgrading to at least 3.3.3.52.8 (well, ideally to the latest) as this version brings a major performance improvement for the replication part (13x faster). Commit reference: https://github.com/graphaware/neo4j-to-elasticsearch/commit/f3b81cc7ae1f2edd61a8866cb85e1af4961726ed

Recent versions also bring the ability to drive replication with Cypher query logic.

Cheers

ikwattro commented 6 years ago

Hi @manaschoksi

Did you have a chance to upgrade to 3.4?

manaschoksi commented 6 years ago

Hello @ikwattro

Sorry for the delayed response, but we are finally moving to Neo4j 3.4. I just tested this, and it seems the timeout config gets overwritten when the mapping config is set in the bootstrapper.

I have raised the following pull request: https://github.com/graphaware/neo4j-to-elasticsearch/pull/140

It would be great if you could review it and help us release this fix. I have also just signed the CLA.
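To illustrate the kind of overwrite described above (hypothetical names, not the code in the linked PR): a configuration object rebuilt for the mapping step can silently fall back to the default timeout unless the already-configured value is carried forward.

```java
// Hypothetical illustration of the overwrite described above (not the actual fix in the PR).
class MappingConfigOverwriteSketch {

    static final int DEFAULT_READ_TIMEOUT_MS = 3_000;

    static class Config {
        final int readTimeoutMs;
        final String mapping;

        Config(int readTimeoutMs, String mapping) {
            this.readTimeoutMs = readTimeoutMs;
            this.mapping = mapping;
        }

        // Buggy pattern: building a new config for the mapping resets the timeout to the default.
        Config withMappingLosingTimeout(String mapping) {
            return new Config(DEFAULT_READ_TIMEOUT_MS, mapping);
        }

        // Fixed pattern: carry the already-configured timeout forward.
        Config withMapping(String mapping) {
            return new Config(this.readTimeoutMs, mapping);
        }
    }
}
```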

manaschoksi commented 6 years ago

With the help of @ikwattro, we were able to get this tested and resolved.