graphaware / neo4j-to-elasticsearch

GraphAware Framework Module for Integrating Neo4j with Elasticsearch

Issue to replicate Neo4j to ES (queue full) #161

Closed nsanglar closed 3 years ago

nsanglar commented 4 years ago

Hello!

We are trying to perform the initial indexing to ES, and our Neo4j instance is about 60 GB. Our Neo4j instance sizing:

Our ES instance sizing:

The plugin config (the strange syntax comes from the fact that these are k8s env variables):

NEO4J_com_graphaware_module_UIDM_1=com.graphaware.module.uuid.UuidBootstrappe
NEO4J_com_graphaware_module_UIDM_relationship=com.graphaware.runtime.policy.all.IncludeAllBusinessRelationships
NEO4J_com_graphaware_runtime_enabled=true
NEO4J_com_graphaware_module_ES_2=com.graphaware.module.es.ElasticSearchModuleBootstrapper
NEO4J_com_graphaware_module_ES_port=9200
NEO4J_com_graphaware_module_ES_asyncIndexation=true
NEO4J_com_graphaware_module_ES_bulk=true
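For readers unfamiliar with this syntax, here is a minimal sketch of how such variable names translate back to `neo4j.conf` property names. It assumes the deployment follows the official Neo4j Docker image convention (the `NEO4J_` prefix is dropped, a single underscore becomes a dot, and a double underscore becomes a literal underscore); the thread does not confirm which image is used. The helper name `env_to_property` is hypothetical.

```python
def env_to_property(env_name: str, prefix: str = "NEO4J_") -> str:
    """Translate a Docker-style env variable name to a neo4j.conf key.

    Assumed convention (official Neo4j Docker image):
      - the NEO4J_ prefix is stripped
      - "__" stands for a literal underscore
      - "_"  stands for a dot
    """
    if not env_name.startswith(prefix):
        raise ValueError(f"{env_name!r} does not start with {prefix!r}")
    body = env_name[len(prefix):]
    placeholder = "\0"  # temporary marker that cannot occur in a variable name
    return (
        body.replace("__", placeholder)  # protect escaped underscores first
        .replace("_", ".")               # remaining underscores are dots
        .replace(placeholder, "_")       # restore the literal underscores
    )


if __name__ == "__main__":
    for name in (
        "NEO4J_com_graphaware_runtime_enabled",
        "NEO4J_com_graphaware_module_ES_port",
        "NEO4J_com_graphaware_module_ES_asyncIndexation",
    ):
        print(name, "->", env_to_property(name))
```

Under that assumption, `NEO4J_com_graphaware_module_ES_port=9200` corresponds to `com.graphaware.module.ES.port=9200` in `neo4j.conf`.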

It starts fine, but eventually the work queue fills up (more than 10000 tasks) and ultimately we get messages such as:

Could not write task ES-1475185448636 to queue as it is too full. We're losing tasks now.

On the ES side, metrics do not show any performance issue, but Neo4j becomes really slow (we cannot run queries anymore).

My guess is that Neo4j needs to push too many operations to Elasticsearch and the plugin cannot keep up. A 60 GB database is not really big, so I guess that something is misconfigured.
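To illustrate that guess, here is a minimal, deterministic sketch of a bounded work queue whose producer outpaces its consumer. It is not the plugin's actual implementation; the function name `simulate`, the per-tick rates, and the drop-when-full behaviour are assumptions chosen only to mirror the "queue too full, losing tasks" symptom with a capacity of 10000.

```python
from collections import deque


def simulate(capacity: int, produced_per_tick: int,
             consumed_per_tick: int, ticks: int) -> tuple[int, int]:
    """Return (tasks left in queue, tasks dropped) after `ticks` rounds.

    Each tick, the producer (standing in for Neo4j transactions) tries to
    enqueue `produced_per_tick` tasks; tasks that do not fit are dropped.
    Then the consumer (standing in for the ES writer) removes up to
    `consumed_per_tick` tasks.
    """
    queue: deque = deque()
    dropped = 0
    for tick in range(ticks):
        for i in range(produced_per_tick):
            if len(queue) >= capacity:
                dropped += 1          # queue is full: the task is lost
            else:
                queue.append((tick, i))
        for _ in range(min(consumed_per_tick, len(queue))):
            queue.popleft()           # consumer drains what it can
    return len(queue), dropped


if __name__ == "__main__":
    # Producer three times faster than the consumer: tasks start being lost.
    print(simulate(capacity=10_000, produced_per_tick=3_000,
                   consumed_per_tick=1_000, ticks=10))
    # Balanced rates: the queue never fills and nothing is dropped.
    print(simulate(capacity=10_000, produced_per_tick=1_000,
                   consumed_per_tick=1_000, ticks=10))
```

In this model the only remedies are a faster consumer or a slower producer (a larger queue just delays the first drop), which is why metrics on the ES side alone may not reveal the bottleneck.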

Do you have any suggestions about this?

nsanglar commented 4 years ago

After going through the issues, I found https://github.com/graphaware/neo4j-to-elasticsearch/issues/75. For us, going through Kafka would make sense as we are using it already. @ikwattro is there still a plan to open source the functionality you are referring to in the linked issue?