graphaware / neo4j-to-elasticsearch

GraphAware Framework Module for Integrating Neo4j with Elasticsearch

ERROR Error while creating or updating node java.lang.NullPointerException ? #124

Closed mpiivonen closed 6 years ago

mpiivonen commented 6 years ago

Hi,

I'm fairly new to neo4j-to-elasticsearch, and our current neo4j database is quite complex and contains a lot of relationships between nodes. I've managed to replicate some of the nodes with the naive com.graphaware.module.ES.node=hasLabel('Folder') setting, and that worked nicely.

Anyway, we need to map a lot of different nodes and relationships, so I've been investigating the JSON mapper. Because of the nature of our neo4j database (running versions 3.3.4 and 3.3.5) and Elasticsearch version 6.x, we can't use the default mapping, since Elasticsearch would end up with more than one type (open issue #122).

I'm using the following modules with neo4j 3.3.4:

graphaware-neo4j-to-elasticsearch-3.3.3.52.8.jar
graphaware-server-community-all-3.3.3.52.jar
graphaware-uuid-3.3.3.52.17.jar

So my approach was to do the needed mapping between nodes and relationships with a mapping file, without the default mapping (because of issue #122), but now I'm getting an error when starting neo4j.

mapping.json is in the same directory as neo4j.conf. I'm running OS X, but I assume that shouldn't matter since I faced the same issue with an Ubuntu container.

I'm also getting a warning from the uuid module, which seems to have some problems with re-indexing nodes. This hasn't prevented the naive node mapping to Elasticsearch before, but the warning is:

2018-06-17 16:16:20.357+0000 WARN An exception occurred while executing transaction Another org.neo4j.kernel.impl.core.NodeProxy with UUID d22ba767-5399-49b2-afab-c8a1adbec21f already exists (#17404)!
com.graphaware.runtime.module.DeliberateTransactionRollbackException: Another org.neo4j.kernel.impl.core.NodeProxy with UUID d22ba767-5399-49b2-afab-c8a1adbec21f already exists (#17404)!

The only solution I have found for the error above is to reinstall neo4j and the modules, but I would like a solution that doesn't bring this problem back whenever we need to restart neo4j.

The full error trace for re-indexing the nodes is:

2018-06-17 16:16:33.183+0000 INFO Loading metadata for module ES
2018-06-17 16:16:33.214+0000 ERROR Could not deserialize metadata for module ID ES
2018-06-17 16:16:33.214+0000 INFO Module ES seems to have corrupted metadata.
2018-06-17 16:16:33.214+0000 INFO Module ES seems to have corrupted metadata, will try to re-initialize...
2018-06-17 16:16:33.214+0000 INFO InitializeUntil set to 3697605600000 and it is 1529252193214. Will re-initialize.
2018-06-17 16:16:33.214+0000 INFO InitializeUntil set to 3697605600000 and it is 1529252193214. Will re-index the entire database...
2018-06-17 16:16:33.215+0000 INFO Creating fresh metadata for module ES.
2018-06-17 16:16:33.215+0000 INFO Module ES has not changed configuration since last run, already initialized.
2018-06-17 16:16:33.229+0000 INFO Module metadata loaded.
2018-06-17 16:16:33.229+0000 INFO Starting transaction-driven modules...
2018-06-17 16:16:33.229+0000 INFO Starting Elasticsearch Writer...
2018-06-17 16:16:33.231+0000 INFO Creating Jest Client...
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2018-06-17 16:16:33.261+0000 INFO Created Jest Client.
2018-06-17 16:16:33.276+0000 INFO Started Elasticsearch Writer.
2018-06-17 16:16:33.279+0000 INFO Re-indexing nodes...
2018-06-17 16:16:33.424+0000 ERROR Error while creating or updating node
java.lang.NullPointerException
    at com.graphaware.module.es.mapping.json.GraphDocumentMapper.getDocumentRepresentation(GraphDocumentMapper.java:95)
    at com.graphaware.module.es.mapping.json.GraphDocumentMapper.getDocumentRepresentation(GraphDocumentMapper.java:90)
    at com.graphaware.module.es.mapping.json.DocumentMappingRepresentation.createOrUpdateNode(DocumentMappingRepresentation.java:76)
    at com.graphaware.module.es.mapping.JsonFileMapping.createNode(JsonFileMapping.java:72)
    at com.graphaware.module.es.mapping.Mapping.getActions(Mapping.java:52)
    at com.graphaware.module.es.ElasticSearchWriter.processOperations(ElasticSearchWriter.java:116)
    at com.graphaware.module.es.ElasticSearchModule.lambda$reindexNodes$2(ElasticSearchModule.java:181)
    at com.graphaware.tx.executor.batch.IterableInputBatchTransactionExecutor.lambda$processQueue$1(IterableInputBatchTransactionExecutor.java:116)
    at com.graphaware.tx.executor.single.SimpleTransactionExecutor.doExecuteInTransaction(SimpleTransactionExecutor.java:69)
    at com.graphaware.tx.executor.single.SimpleTransactionExecutor.executeInTransaction(SimpleTransactionExecutor.java:58)
    at com.graphaware.tx.executor.batch.IterableInputBatchTransactionExecutor.processQueue(IterableInputBatchTransactionExecutor.java:104)
    at com.graphaware.tx.executor.batch.IterableInputBatchTransactionExecutor.doExecute(IterableInputBatchTransactionExecutor.java:77)
    at com.graphaware.tx.executor.batch.DisposableBatchTransactionExecutor.execute(DisposableBatchTransactionExecutor.java:35)
    at com.graphaware.module.es.ElasticSearchModule.reindexNodes(ElasticSearchModule.java:186)
    at com.graphaware.module.es.ElasticSearchModule.lambda$reindex$0(ElasticSearchModule.java:143)
    at com.graphaware.module.es.ElasticSearchModule.reindex(ElasticSearchModule.java:163)
    at com.graphaware.module.es.ElasticSearchModule.start(ElasticSearchModule.java:91)
    at com.graphaware.runtime.manager.ProductionTxDrivenModuleManager.start(ProductionTxDrivenModuleManager.java:49)
    at com.graphaware.runtime.manager.BaseTxDrivenModuleManager.startModules(BaseTxDrivenModuleManager.java:113)
    at com.graphaware.runtime.TxDrivenRuntime.startModules(TxDrivenRuntime.java:147)
    at com.graphaware.runtime.ProductionRuntime.startModules(ProductionRuntime.java:70)
    at com.graphaware.runtime.BaseGraphAwareRuntime.start(BaseGraphAwareRuntime.java:134)
    at com.graphaware.runtime.bootstrap.RuntimeKernelExtension.lambda$start$9(RuntimeKernelExtension.java:117)
    at java.lang.Thread.run(Thread.java:748)

My current mapping file looks like this:

{
  "node_mappings": [
    {
      "condition": "hasLabel('Revision')",
      "type": "revisions",
      "properties": {
        "name": "getProperty('name')",
        "project": "getProperty('project')",
        "type": "getProperty('type')"
      }
    },
    {
      "condition": "hasLabel('Folder')",
      "type": "folders",
      "properties": {
        "name": "getProperty('name')",
        "project": "getProperty('project')"
      }
    },
    {
      "condition": "hasLabel('File')",
      "type": "folders",
      "properties": {
        "project": "getProperty('project')"
      }
    }
  ],
  "relationship_mappings": [
    {
      "condition": "hasType('CONTENTOF')",
      "type": "content_of",
      "properties": {
        "id": "getProperty('id')"
      }
    },
    {
      "condition": "hasType('VERSIONOF')",
      "type": "version_of",
      "properties": {
        "id": "getProperty('id')"
      }
    },
    {
      "condition": "hasType('LATEST_VERSION')",
      "type": "latest_version",
      "properties": {
        "id": "getProperty('id')"
      }
    }
  ]
}

I think the JSON might be formatted incorrectly, or maybe it should contain a default mapping, since in the neo4j debug.log I can see a lot of lines containing this error:

ERROR [c.g.m.e.m.j.GraphDocumentMapper] Invalid condition expression {}

I'm aware that the Elasticsearch log reports low disk space, but with the naive approach it still creates indexes as expected:

free: 34.2gb[14.7%], replicas will not be assigned to this node
[2018-06-17T18:47:32,074][INFO ][o.e.c.r.a.DiskThresholdMonitor] [C_AuasF] rerouting shards: [one or more nodes has gone under the high or low watermark]
[2018-06-17T18:56:59,856][INFO ][o.e.c.m.MetaDataCreateIndexService] [C_AuasF] [neo4j-folder-node] creating index, cause [api], templates [], shards [5]/[1], mappings []
[2018-06-17T18:57:00,631][INFO ][o.e.c.m.MetaDataMappingService] [C_AuasF] [neo4j-folder-node/EfGWLYlHQ4KWjTYvtLLtMg] create_mapping [Folder]
[2018-06-17T18:57:00,681][INFO ][o.e.c.m.MetaDataMappingService] [C_AuasF] [neo4j-folder-node/EfGWLYlHQ4KWjTYvtLLtMg] update_mapping [Folder]

I would really appreciate advice on the correct approach for indexing complex mappings and relationships into Elasticsearch. I read that in some cases the user needs to write the correct mapping on the Elasticsearch side as well, but I hope that won't be needed in my case.

Thanks.

mpiivonen commented 6 years ago

I can verify that after adding the defaults property back, it started to index all nodes as the default was set, but at the Elasticsearch endpoint I ran into the "more than one property" issue. Once I set defaults to an empty object, I started to get these errors in neo4j.log:

2018-06-17 21:01:34.905+0000 ERROR Unable to build index name
2018-06-17 21:01:34.905+0000 ERROR Error while creating or updating node Unable to build index name
java.lang.RuntimeException: Unable to build index name

I debugged a bit more and added unique index values for each condition. Now I'm facing issues with key properties:

ERROR Error while creating or updating node Error while creating json. Missing keyProperty
com.graphaware.module.es.mapping.json.DocumentRepresentationException: Error while creating json. Missing keyProperty

so I'm trying to add those for each condition.

I tried adding "keyProperty": "uuid" after type for each condition, which gave an error about an unrecognized property (or similar). After that I tried moving keyProperty into the properties object, which again gave the previous error:

ERROR Error while creating or updating node Error while creating json. Missing keyProperty

I excluded some nodes to make things easier at first. I checked that my Files and Folders both have UUIDs, which are set as the keyProperty, and for relationships I set keyProperty to "id", but I still get the missing key property error with this mapping.json:

{
  "defaults": {},
  "node_mappings": [
    {
      "condition": "hasLabel('Folder')",
      "index": "folders",
      "type": "folders",
      "properties": {
        "keyProperty": "uuid",
        "name": "getProperty('name')",
        "project": "getProperty('project')"
      }
    },
    {
      "condition": "hasLabel('File')",
      "index": "files",
      "type": "files",
      "properties": {
        "keyProperty": "uuid",
        "project": "getProperty('project')"
      }
    }
  ],
  "relationship_mappings": [
    {
      "condition": "hasType('CONTENTOF')",
      "index": "content_of",
      "type": "content_of",
      "properties": {
        "keyProperty": "id",
        "id": "getProperty('id')"
      }
    },
    {
      "condition": "hasType('VERSIONOF')",
      "index": "version_of",
      "type": "version_of",
      "properties": {
        "keyProperty": "id",
        "id": "getProperty('id')"
      }
    },
    {
      "condition": "hasType('LATEST_VERSION')",
      "index": "latest_version",
      "type": "latest_version",
      "properties": {
        "keyProperty": "id",
        "id": "getProperty('id')"
      }
    }
  ]
}

mpiivonen commented 6 years ago

I think I managed to solve my issue by changing keyProperty to key_property within the defaults object, and now it seems all the Files and Folders have been indexed into Elasticsearch. I'll keep this ticket open since I'll be continuing tomorrow with more complex relationships within neo4j, so I'll probably run into more issues related to this.

mpiivonen commented 6 years ago

In the end I got everything working once I used the defaults object

"defaults": {
  "key_property": "uuid",
  "relationships_index": "relationship-index"
}

and configured other mappings in the way we actually wanted to map them.
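For anyone hitting the same errors, the pitfalls from this thread can be checked before restarting neo4j. The following is a minimal Python sketch, not part of the module: `check_mapping` is a hypothetical helper, its rules come only from the errors observed above (not from the module's documented schema), and the `nodes_index` key is an assumption by analogy with `relationships_index`.

```python
import json
from typing import List


def check_mapping(mapping: dict) -> List[str]:
    """Return warnings for the mapping.json pitfalls reported in this thread."""
    problems = []

    defaults = mapping.get("defaults")
    if defaults is None:
        # Thread: a file without "defaults" failed with a NullPointerException.
        problems.append("missing 'defaults' object")
        defaults = {}
    if "key_property" not in defaults:
        # Thread: "Missing keyProperty" until defaults.key_property was set.
        problems.append("'defaults' has no 'key_property'")

    # Assumption: "nodes_index" mirrors "relationships_index"; only the
    # latter appears in this thread.
    sections = (
        ("node_mappings", "nodes_index"),
        ("relationship_mappings", "relationships_index"),
    )
    for section, default_index in sections:
        for i, entry in enumerate(mapping.get(section, [])):
            where = f"{section}[{i}]"
            if "index" not in entry and default_index not in defaults:
                # Thread: "Unable to build index name" with empty defaults.
                problems.append(f"{where}: no 'index' and no defaults.{default_index}")
            if "keyProperty" in entry or "keyProperty" in entry.get("properties", {}):
                # Thread: camelCase keyProperty was rejected or had no effect.
                problems.append(f"{where}: uses 'keyProperty'; set defaults.key_property instead")
    return problems


# Example input modelled on the working defaults object from this thread.
example = json.loads("""
{
  "defaults": {"key_property": "uuid", "relationships_index": "relationship-index"},
  "node_mappings": [
    {"condition": "hasLabel('Folder')", "index": "folders", "type": "folders",
     "properties": {"name": "getProperty('name')"}}
  ],
  "relationship_mappings": [
    {"condition": "hasType('CONTENTOF')", "type": "content_of",
     "properties": {"id": "getProperty('id')"}}
  ]
}
""")
print(check_mapping(example))
```

To check a real file, load it with `json.load` and pass the result to `check_mapping`. A clean result only means none of the problems seen in this thread were detected, not that the module will accept the file.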

Closing the issue, but hopefully it helps someone else with similar problems in the future.

ikwattro commented 6 years ago

Thanks @mpiivonen ! Will add some notes to the doc