Open aleph2 opened 9 years ago
@aleph2 does this still happen? I'm having weird issues where documents seem to be disappearing.
I fixed it with a code modification. It seems this only happened when MongoDB was configured with sharding.
On 23 July 2015 at 07:16, harshjari wrote:
> @aleph2 does this still happen? im having weird issues where documents seem to be disappearing.
Noticed this issue on our production servers after a client mentioned a missing document after updating. We traced the issue back to the indexer performing the delete and update concurrently as stated above. Currently running v2.0.0 of the river on ES 1.0.0.
@aleph2 Did you create a temporary patch for the river in order to fix this? If so, do you mind sharing until this is fixed? Thanks!
edit: I'm assuming I can simply flush the BulkProcessor after the delete in order to resolve this?
We simply set concurrent_bulk_requests: 1 on the river, since the river performs both deletes and inserts and we want the operation log to be replayed in the order the operations occurred. I would have expected this to be the default, since concurrent requests also have the potential to insert the wrong version of a document. The following code makes the issue easy to reproduce, although it's a race and doesn't occur on every run.
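    // Run in the mongo shell. Repeatedly updates two contacts so that the
    // river has many delete/insert pairs to replay.
    var c1, c2;
    for (var i = 0; i < 100; i++) {
        // Re-read both documents on every pass.
        c1 = db.contacts.findOne(ObjectId('...'));
        c2 = db.contacts.findOne(ObjectId('...'));
        // Tag each name with the iteration number so stale writes are visible.
        c1.firstName = 'Test1-' + i;
        c2.firstName = 'Test2-' + i;
        db.contacts.save(c1);
        db.contacts.save(c2);
    }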
If you run this, sometimes a contact ends up missing from the index. Other times the end result isn't the expected Test1-99 but an older name such as Test1-89, or one contact is missing and the other has the wrong firstName.
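To illustrate why limiting the river to one bulk request at a time helps, here is a rough sketch in plain JavaScript. It is a toy model, not the river's code: `replay`, `latencies`, and `maxInFlight` are hypothetical names, the index is just a Map, and the latency numbers are made up so that the first batch finishes after the second.

```javascript
// Toy model (assumption, not river code): replay oplog batches against an
// in-memory index with at most `maxInFlight` bulk requests outstanding.
// latencies[i] is the simulated round-trip time of batch i.
function replay(batches, latencies, maxInFlight) {
  const index = new Map();
  const inFlight = []; // entries: { doneAt, ops }
  let now = 0;

  // Complete the in-flight batch that finishes first and apply its ops.
  const completeEarliest = () => {
    inFlight.sort((a, b) => a.doneAt - b.doneAt);
    const done = inFlight.shift();
    now = Math.max(now, done.doneAt);
    for (const op of done.ops) {
      if (op.type === 'delete') index.delete(op.id);
      else index.set(op.id, op.firstName);
    }
  };

  batches.forEach((ops, i) => {
    // Wait for a free slot before sending the next batch.
    while (inFlight.length >= maxInFlight) completeEarliest();
    inFlight.push({ doneAt: now + latencies[i], ops });
  });
  while (inFlight.length > 0) completeEarliest();
  return index;
}

// Each update is replayed as delete + insert; the batch boundary falls
// between the delete and the insert for c2.
const batches = [
  [
    { type: 'delete', id: 'c1' },
    { type: 'insert', id: 'c1', firstName: 'Test1-98' },
    { type: 'delete', id: 'c2' },
  ],
  [
    { type: 'insert', id: 'c2', firstName: 'Test2-98' },
    { type: 'delete', id: 'c1' },
    { type: 'insert', id: 'c1', firstName: 'Test1-99' },
  ],
];
const latencies = [30, 5]; // made-up values: batch 0 is slower than batch 1

const serial = replay(batches, latencies, 1);
const concurrent = replay(batches, latencies, 2);

console.log(serial.get('c1'), serial.get('c2'));         // Test1-99 Test2-98
console.log(concurrent.get('c1'), concurrent.has('c2')); // Test1-98 false
```

With one request in flight the batches are applied in oplog order. With two, the slower first batch lands last, which leaves c1 with a stale name and c2 deleted, the same symptoms described above.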
The river translates a MongoDB "Update" operation into a "Delete" followed by an "Insert". However, the "Delete" and "Insert" operations are executed concurrently, so the "Insert" is sometimes executed before the "Delete". When that happens, the document is lost from the index.
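The effect of that reordering can be shown with a minimal sketch, assuming the index behaves like a simple key-value map. `applyOps` and the operation objects are hypothetical and only model the behaviour described in this report.

```javascript
// Toy model (assumption): the index is a Map from document id to body.
function applyOps(ops) {
  const index = new Map();
  index.set('c1', { firstName: 'Test1-0' }); // document already indexed
  for (const op of ops) {
    if (op.type === 'delete') index.delete(op.id);
    else index.set(op.id, op.doc);
  }
  return index;
}

// One MongoDB update, replayed as a delete plus an insert.
const del = { type: 'delete', id: 'c1' };
const ins = { type: 'insert', id: 'c1', doc: { firstName: 'Test1-1' } };

const inOrder = applyOps([del, ins]);   // delete first, then insert
const reordered = applyOps([ins, del]); // insert first, delete lands last

console.log(inOrder.get('c1').firstName); // Test1-1
console.log(reordered.has('c1'));         // false
```

In the reordered case the late delete removes the freshly inserted document, so nothing is left in the index for that id.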