neo4j-contrib / neo4j_doc_manager

Doc manager for Neo4j
Apache License 2.0

bulk upsert fails if oplog.timestamp is deleted #62

Open CIB opened 8 years ago

CIB commented 8 years ago

The official documentation says that deleting the timestamp file (oplog.timestamp) should be possible. But if I start Neo4j Doc Manager once, stop it, delete the timestamp file, and start it again, I get the following error:

mongo-connector -m $MONGODB -t $NEO4JDB -d $NEO4JDOCMANAGER

 2016-07-08 07:30:53,976 [CRITICAL] mongo_connector.oplog_manager:625 - Exception during collection dump
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/site-packages/mongo_connector/util.py", line 32, in wrapped
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.4/site-packages/mongo_connector/doc_managers/neo4j_doc_manager.py", line 89, in bulk_upsert
    tx.commit()
  File "/usr/local/lib/python3.4/site-packages/py2neo/cypher/core.py", line 333, in commit
    return self.post(self.__commit or self.__begin_commit)
  File "/usr/local/lib/python3.4/site-packages/py2neo/cypher/core.py", line 288, in post
    raise self.error_class.hydrate(error)
py2neo.cypher.error.schema.ConstraintViolation: Node 0 already exists with label Test and property "_id"=[577ccc5e39a414a3d7d17171]

johnymontana commented 8 years ago

Thanks for pointing this out @CIB. Neo4j Doc Manager creates a uniqueness constraint on the _id property (the value of the ObjectID for each document), so this error is thrown because the bulk upsert is trying to create nodes that already exist. Currently bulk_upsert uses CREATE Cypher statements, but I suppose we could try changing those to MERGE and SET statements to avoid these constraint violation errors. I will try some performance tests with this to see if it makes sense. In the meantime, you could delete the data in Neo4j before restarting the doc manager to avoid this error.
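
To illustrate the MERGE and SET idea, here is a minimal sketch of how an idempotent upsert statement could be built. It is not the doc manager's actual code: the helper name build_merge_upsert is hypothetical, and the $param placeholder syntax is an assumption about the Neo4j version in use (older releases used {param} instead). With CREATE, a second dump of the same document collides with the uniqueness constraint on _id. With MERGE keyed on _id, the existing node is matched and only its properties are updated.

```python
def build_merge_upsert(label, doc_id, properties):
    """Build a (statement, parameters) pair for an idempotent node upsert.

    Hypothetical helper for illustration only; not part of neo4j_doc_manager.
    """
    # Labels cannot be passed as Cypher parameters, so the label is
    # interpolated into the statement. Reject backticks to keep the
    # quoted label well-formed.
    if "`" in label:
        raise ValueError("label must not contain a backtick")

    # _id is the MERGE key, so it is left out of the property map.
    props = {k: v for k, v in properties.items() if k != "_id"}

    # MERGE matches an existing node with this _id or creates one;
    # SET += then copies the remaining properties onto it.
    statement = "MERGE (n:`%s` {_id: $doc_id}) SET n += $props" % label
    return statement, {"doc_id": doc_id, "props": props}


statement, params = build_merge_upsert(
    "Test",
    "577ccc5e39a414a3d7d17171",
    {"_id": "577ccc5e39a414a3d7d17171", "name": "example"},
)
print(statement)
print(params)
```

Running the resulting statement twice for the same _id should leave a single node rather than raise a ConstraintViolation. Whether MERGE is fast enough for a full collection dump is the open question the performance tests would need to answer.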

simonthme commented 6 years ago

Hello, I'm facing the same problem. Any news on this? Thanks.