I have been experiencing an issue: occasionally, the index that the river was streaming data to would lose all of its documents - going from ~700k to 0 documents.
If I wipe the index and the river and recreate them, the initial sync works as expected - but once the number of documents in MongoDB and ElasticSearch becomes equal, the index loses all its documents again.
After losing all the documents following an initial sync, the river continues to stream new documents to ES, going up to 100, sometimes a few thousand, and then after a period of time resets to 0, repeating this cycle.
MongoDB setup:
3-shard cluster, each shard a replica set of 3 - each node on its own VM
Also a note: I have two duplicate deployments, staging and production, with one difference - the staging environment has sharding enabled on the profiles collection that is being streamed by the river. It is the sharded staging environment that is experiencing this issue.
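For context, this is roughly how sharding was enabled on the staging collection; the mongos hostname, database name, and shard key below are placeholders, not the actual values from my deployment:

```python
from pymongo import MongoClient

# Connect to the mongos router of the staging cluster (hostname assumed).
client = MongoClient("mongodb://staging-mongos:27017")

# Enable sharding on the database, then shard the profiles collection.
# The database name and shard key are placeholders.
client.admin.command("enableSharding", "mydb")
client.admin.command("shardCollection", "mydb.profiles", key={"_id": 1})
```

The production deployment is identical except that this step was never run.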
I did some digging into this, and it seems to be caused by map reduce jobs. I was able to reproduce it easily by checking the number of documents in ES, hitting our front-end view that triggers a map reduce, and then checking the number of documents in ES again - the count always dropped back down to 0. The river doesn't seem to differentiate between a drop of the collection being monitored and a drop of other collections.
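A rough sketch of the reproduction, using pymongo and plain HTTP against ES; the hostnames, database, collection, and index names are placeholders for my setup:

```python
import json
import time
import urllib.request

from bson.code import Code
from pymongo import MongoClient

ES_COUNT_URL = "http://localhost:9200/profiles/_count"  # river target index (name assumed)

def es_doc_count():
    """Return the document count reported by Elasticsearch."""
    with urllib.request.urlopen(ES_COUNT_URL) as resp:
        return json.loads(resp.read())["count"]

mongo = MongoClient("mongodb://staging-mongos:27017")  # mongos router (hostname assumed)
db = mongo["mydb"]                                     # database name assumed

print("before map reduce:", es_doc_count())

# Run a map reduce on the *unrelated* reviews collection with an output
# collection: MongoDB builds the result in a temporary collection and then
# renames it, which the river picks up as a DROP_COLLECTION.
db.command(
    "mapReduce",
    "reviews",
    map=Code("function () { emit(this.profile_id, 1); }"),
    reduce=Code("function (key, values) { return Array.sum(values); }"),
    out="reviews_by_profile",
)

time.sleep(10)  # give the river a moment to process the oplog entries
print("after map reduce:", es_doc_count())  # drops back to 0 for me
```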
As a temporary solution, I have set the drop_collection option to false - this has resolved the issue.
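For reference, this is roughly how I re-created the river with the option disabled; the river, database, collection, and index names are placeholders, and placing drop_collection under mongodb.options reflects my reading of the river's configuration:

```python
import json
import urllib.request

# River definition with drop_collection disabled; other settings
# (servers, credentials, etc.) are omitted here.
river_config = {
    "type": "mongodb",
    "mongodb": {
        "db": "mydb",                     # database name assumed
        "collection": "profiles",
        "options": {
            "drop_collection": False      # do not wipe the index on DROP_COLLECTION
        },
    },
    "index": {
        "name": "profiles",               # target index name assumed
        "type": "profile",
    },
}

request = urllib.request.Request(
    "http://localhost:9200/_river/mongodb_profiles/_meta",  # river name assumed
    data=json.dumps(river_config).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
urllib.request.urlopen(request)
```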
You can see in the following ES logs a map reduce query being run on the unrelated reviews collection. A temporary collection is created and then renamed, triggering a DROP_COLLECTION operation on the profiles collection.