richardwilly98 / elasticsearch-river-mongodb

MongoDB River Plugin for ElasticSearch

River stops syncing with data, and the status keeps showing as Running #559

Open syllogismos opened 8 years ago

syllogismos commented 8 years ago
[2015-09-18 05:07:47,928][ERROR][org.elasticsearch.river.mongodb.OplogSlurper] Exception while looping in cursor
com.mongodb.MongoException: version changed during initial query ( ns : DbName.Collection, received : 460|0||5501e23a01907f6d2913ea91, wanted : 461|0||5501e23a01907f6d2913ea91, send ) ( ns : DbName.Collection, received : 460|0||5501e23a01907f6d2913ea91, wanted : 461|0||5501e23a01907f6d2913ea91, send )
        at com.mongodb.QueryResultIterator.throwOnQueryFailure(QueryResultIterator.java:246)
        at com.mongodb.QueryResultIterator.init(QueryResultIterator.java:224)
        at com.mongodb.QueryResultIterator.initFromQueryResponse(QueryResultIterator.java:184)
        at com.mongodb.QueryResultIterator.<init>(QueryResultIterator.java:62)
        at com.mongodb.DBCollectionImpl.find(DBCollectionImpl.java:86)
        at com.mongodb.DBCollectionImpl.find(DBCollectionImpl.java:66)
        at com.mongodb.DBCursor._check(DBCursor.java:498)
        at com.mongodb.DBCursor._hasNext(DBCursor.java:621)
        at com.mongodb.DBCursor.hasNext(DBCursor.java:657)
        at org.elasticsearch.river.mongodb.OplogSlurper.addQueryToStream(OplogSlurper.java:515)
        at org.elasticsearch.river.mongodb.OplogSlurper.addQueryToStream(OplogSlurper.java:508)
        at org.elasticsearch.river.mongodb.OplogSlurper.processOplogEntry(OplogSlurper.java:268)
        at org.elasticsearch.river.mongodb.OplogSlurper.run(OplogSlurper.java:109)
        at java.lang.Thread.run(Thread.java:745)

I have a river on a collection of around 20 million documents. The river completes the initial import normally and keeps running for a while, say a few days, and then stops syncing data. I can see the above log in the Elasticsearch logs. What might be the reason for this exception?

On top of that, the river's status keeps showing as Running.

syllogismos commented 8 years ago

I'm assuming the above error happens because the oplog is too small and writes to Mongo are coming in too fast, so the replication (or the river) can't keep up. The oplog is a capped collection: when there are too many writes, older entries are overwritten by newer ones, and if the river is still reading the old entries it stops suddenly.

More details are in the mongo-connector wiki: https://github.com/mongodb-labs/mongo-connector/wiki/Resyncing%20the%20Connector

This is what I think is happening, and the solution is to increase the oplog size on the Mongo instances.
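
To sanity-check that theory, here is a minimal sketch (mine, not from the river; host and port are assumptions) that uses the same legacy mongo-java-driver API seen in the stack trace to estimate how many hours of writes the oplog currently holds:

// Minimal sketch; assumes a replica-set member at localhost:27017.
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;
import org.bson.types.BSONTimestamp;

public class OplogWindow {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("localhost", 27017); // assumption
        try {
            DBCollection oplog = client.getDB("local").getCollection("oplog.rs");
            // The oplog is capped and kept in insertion order, so a $natural
            // sort yields its oldest and newest entries.
            DBObject oldest = oplog.find().sort(new BasicDBObject("$natural", 1)).limit(1).next();
            DBObject newest = oplog.find().sort(new BasicDBObject("$natural", -1)).limit(1).next();
            int first = ((BSONTimestamp) oldest.get("ts")).getTime(); // seconds since epoch
            int last = ((BSONTimestamp) newest.get("ts")).getTime();
            System.out.printf("oplog window: ~%.1f hours%n", (last - first) / 3600.0);
        } finally {
            client.close();
        }
    }
}

If that window is shorter than the longest lag the river can accumulate, the entries it still needs are overwritten before it reads them and the tailing cursor dies.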

I will update the Mongo settings, check whether that solves my problem, and post the result here.

hash-include commented 8 years ago

Hi syllogismos,

I am facing the same issue. Did you find a solution? Did increasing the oplog size help? If so, what were the original and new oplog sizes?

Exception:

[2016-03-30 15:13:23,259][ERROR][org.elasticsearch.river.mongodb.OplogSlurper] Exception while looping in cursor
com.mongodb.MongoException: getMore executor error: CappedPositionLost: CollectionScan died due to position in capped collection being deleted. Last seen record id: RecordId(6267863405340655630)
        at com.mongodb.QueryResultIterator.throwOnQueryFailure(QueryResultIterator.java:246)
        at com.mongodb.QueryResultIterator.init(QueryResultIterator.java:224)
        at com.mongodb.QueryResultIterator.initFromQueryResponse(QueryResultIterator.java:184)
        at com.mongodb.QueryResultIterator.getMore(QueryResultIterator.java:149)
        at com.mongodb.QueryResultIterator.hasNext(QueryResultIterator.java:135)
        at com.mongodb.DBCursor._hasNext(DBCursor.java:626)
        at com.mongodb.DBCursor.hasNext(DBCursor.java:657)
        at org.elasticsearch.river.mongodb.OplogSlurper.run(OplogSlurper.java:98)
        at java.lang.Thread.run(Thread.java:745)

Thanks, Sri Harsha

syllogismos commented 8 years ago

Increasing the oplog size does help. Originally I think it was around 8 GB, and we increased it to 30 GB.
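
For completeness: on the MongoDB versions current when this thread was written (3.0/3.2), growing the oplog required the restart-based resize procedure, but on MongoDB 3.6+ it can be done live with the replSetResizeOplog admin command. A hedged sketch using the same legacy Java driver (connection details are assumptions; run it against each replica-set member):

// Hedged sketch: replSetResizeOplog is only available on MongoDB 3.6+.
import com.mongodb.BasicDBObject;
import com.mongodb.CommandResult;
import com.mongodb.MongoClient;

public class ResizeOplog {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("localhost", 27017); // assumption
        try {
            // Grow the oplog to 30 GB; the size argument is in megabytes.
            CommandResult result = client.getDB("admin").command(
                    new BasicDBObject("replSetResizeOplog", 1).append("size", 30 * 1024));
            System.out.println(result);
        } finally {
            client.close();
        }
    }
}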

But we found the river to be very unstable and unreliable. At times it even destabilized the Elasticsearch cluster, periodically maxing out CPU, and the river statuses were not reported correctly. So we stopped depending on this plugin.

Instead, we forked mongo-connector to fit our needs.

The upstream project is https://github.com/mongodb-labs/mongo-connector and our fork is https://github.com/akgoel-mo/mongo-connector