graphaware / neo4j-to-elasticsearch

GraphAware Framework Module for Integrating Neo4j with Elasticsearch

Don't sync data from neo4j to elasticsearch #149

Closed nguyenhuuloc304 closed 5 years ago

nguyenhuuloc304 commented 5 years ago

Hi,

I integrated Neo4j with Elasticsearch following the instructions, but it did not work.

I am running Neo4j Enterprise 3.5.0 and Elasticsearch 6.5.4.

The Neo4j plugins directory contains: graphaware-neo4j-to-elasticsearch-3.5.0.53.11.jar, graphaware-server-enterprise-all-3.5.0.53.jar, graphaware-uuid-3.5.0.53.17.jar

This is my additional Neo4j config:

# This setting should only be set once for registering the framework and all the used submodules
dbms.unmanaged_extension_classes=com.graphaware.server=/graphaware

com.graphaware.runtime.enabled=true

#UIDM becomes the module ID:
com.graphaware.module.UIDM.1=com.graphaware.module.uuid.UuidBootstrapper

#optional, default is "uuid". (only if using the UUID module)
com.graphaware.module.UIDM.uuidProperty=uuid

#optional, default is all nodes:
com.graphaware.module.UIDM.node=hasLabel('Label1') || hasLabel('Label2')

#optional, default is uuidIndex
com.graphaware.module.UIDM.uuidIndex=uuidIndex

#prevents the whole db from being assigned new uuids if the uuid module is set up together with neo4j2es
com.graphaware.module.UIDM.initializeUntil=0

#ES becomes the module ID:
com.graphaware.module.ES.2=com.graphaware.module.es.ElasticSearchModuleBootstrapper

#URI of Elasticsearch
com.graphaware.module.ES.uri=localhost

#Port of Elasticsearch
com.graphaware.module.ES.port=9200

#optional, protocol of Elasticsearch connection, defaults to http
com.graphaware.module.ES.protocol=http

#optional, Elasticsearch index name, default is neo4j-index
com.graphaware.module.ES.index=neo4j-index

#optional, node property key of a property that is used as unique identifier of the node. Must be the same as com.graphaware.module.UIDM.uuidProperty (only if using UUID module), defaults to uuid
#use "ID()" to use native Neo4j IDs as Elasticsearch IDs (not recommended)
com.graphaware.module.ES.keyProperty=uuid

#optional, whether to retry if a replication fails, defaults to false
com.graphaware.module.ES.retryOnError=false

#optional, size of the in-memory queue that queues up operations to be synchronised to Elasticsearch, defaults to 10000
com.graphaware.module.ES.queueSize=10000

#optional, size of the batch size to use during re-initialization, defaults to 1000
com.graphaware.module.ES.reindexBatchSize=2000

#optional, specify which nodes to index in Elasticsearch, defaults to all nodes
com.graphaware.module.ES.node=hasLabel('Person')

#optional, specify which node properties to index in Elasticsearch, defaults to all properties
com.graphaware.module.ES.node.property=key != 'born'

#optional, specify whether to send updates to Elasticsearch in bulk, defaults to true (highly recommended)
com.graphaware.module.ES.bulk=true

#optional, read explanation below, defaults to 0
com.graphaware.module.ES.initializeUntil=0

When I restarted Neo4j, I could see that it created two indices, neo4j-index-node and neo4j-index-relationship, but neither contains any documents:

health status index                    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   neo4j-index-node         FybpGU-LRASt2jiWMt9Hnw   5   1          0            0      1.1kb          1.1kb
yellow open   neo4j-index-relationship lV39OeyrR_CKpogajZON3w   5   1          0            0      1.1kb          1.1kb

When I create a new node in Neo4j, and even after restarting Neo4j again, there is still no document in Elasticsearch.

I am just getting started with Elasticsearch. Did I miss a step? Can anyone help me get this working?

ikwattro commented 5 years ago

Hi,

A couple of things:

com.graphaware.module.UIDM.node=hasLabel('Label1') || hasLabel('Label2')

That line specifies that only nodes with the Label1 or Label2 label get a uuid assigned, and the uuid is needed for replication. I suppose you do not have nodes with such labels.

com.graphaware.module.ES.node=hasLabel('Person')

Same here: do you have Person nodes?

nguyenhuuloc304 commented 5 years ago

Hi @ikwattro,

Yes, I have Person nodes, and I was working with existing data. I tried setting com.graphaware.module.ES.initializeUntil=2000000000000, and then it worked. I'm not sure I fully understand the meaning of this value, though. Anyway, thank you for your support; I will close this issue. I think the root cause is not in neo4j-to-elasticsearch.

ikwattro commented 5 years ago

@nguyenhuuloc304 This is what it means:

The Elasticsearch Integration configuration is described in the inline comments above. The only property that needs a little more explanation is com.graphaware.module.ES.initializeUntil:

Every GraphAware Framework Module has methods (initialize() and reinitialize()) that provide a mechanism to get the world into a state equivalent to a situation in which the module has been running since the database was empty. These methods kick in in one of the following scenarios:

The database is not empty when the module has been registered for the first time (GraphAware Framework used on an existing database).
The configuration of the module has changed since the last time it was run.
Some failure occurred that causes the Framework to think it should fix things.

We've decided that we should not shoot the whole database at Elasticsearch in one of these scenarios automatically, because it could well be quite large. Therefore, in order to trigger (re-)indexing, i.e. sending every node that should be indexed to Elasticsearch upon Neo4j restart, you have to manually intervene.

The way you intervene is to set com.graphaware.module.ES.initializeUntil to a number slightly higher than the value a Java call to System.currentTimeMillis() would return when the module is starting. This way, the database will be (re-)indexed once, not with every following restart. In other words, re-indexing will happen iff System.currentTimeMillis() < com.graphaware.module.ES.initializeUntil. If you're not sure what all of this means or don't know how to find the right number to set this value to, you're probably best off leaving it alone or getting in touch for some (paid) support.
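
To make the rule above concrete, here is a minimal Java sketch of how such a value could be picked and what the comparison does. The class and method names (InitializeUntilSketch, shouldReindex, suggestInitializeUntil) are hypothetical and not part of the GraphAware Framework, and the 10-minute margin is an assumption that should cover the time until Neo4j has restarted. The sketch only mirrors the comparison quoted above, using the standard System.currentTimeMillis() call.

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative helper only; these names are hypothetical, not GraphAware API.
final class InitializeUntilSketch {

    // The rule quoted above: re-index iff the current time is below initializeUntil.
    static boolean shouldReindex(long nowMillis, long initializeUntilMillis) {
        return nowMillis < initializeUntilMillis;
    }

    // A value "slightly higher" than now; the margin is an assumption to tune.
    static long suggestInitializeUntil(long nowMillis, Duration margin) {
        return nowMillis + margin.toMillis();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long until = suggestInitializeUntil(now, Duration.ofMinutes(10));

        // Line to paste into the Neo4j config before restarting.
        System.out.println("com.graphaware.module.ES.initializeUntil=" + until);

        // Decode the value used earlier in this thread into a readable date.
        System.out.println(Instant.ofEpochMilli(2_000_000_000_000L));
    }
}
```

The second line printed is 2033-05-18T03:33:20Z. Under the rule quoted above, a value of 2000000000000 therefore keeps the comparison true on every restart until 2033, which would explain why it triggered indexing here. A value only a few minutes ahead of the restart time gives the "once only" behaviour the explanation describes.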