richardwilly98 / elasticsearch-river-mongodb

MongoDB River Plugin for ElasticSearch

Map Reduce triggering drop #543

Open · ecoutu opened this issue 9 years ago

ecoutu commented 9 years ago

I have been experiencing an issue:

MongoDB setup:

ElasticSearch:

River config:

      "crawldbRiver": {
        "type": "mongodb",
        "mongodb": {
          "servers": [
            { "host": "xxx", "port": 27017 }
          ],
          "options": {
            "include_fields": ["......."],
            "drop_collection": true
          },
          "credentials": [
            {"db": "admin", "user": "xxx", "password": "xxx"}
          ],
          "db": "xxx",
          "collection": "profiles"
        },
        "index": {
          "name": "profiles",
          "type": "profile"
        }
      },
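For reference, the river plugin picks this configuration up from the _river index, so the value of the "crawldbRiver" key above (the type, mongodb and index objects) is essentially what gets indexed as the river's _meta document. Below is a minimal sketch of doing that with the ES 1.x Java TransportClient; the host, port and the trimmed-down JSON body are placeholders rather than the exact production config:

    import org.elasticsearch.client.Client;
    import org.elasticsearch.client.transport.TransportClient;
    import org.elasticsearch.common.transport.InetSocketTransportAddress;

    public class RegisterRiver {
        public static void main(String[] args) {
            // Placeholder cluster address; point this at a real ES 1.x node.
            Client client = new TransportClient()
                    .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));

            // Trimmed-down _meta body: the "type", "mongodb" and "index" objects
            // from the config above (credentials and include_fields omitted here).
            String meta = "{"
                    + "\"type\": \"mongodb\","
                    + "\"mongodb\": {"
                    + "\"servers\": [{\"host\": \"xxx\", \"port\": 27017}],"
                    + "\"options\": {\"drop_collection\": true},"
                    + "\"db\": \"xxx\", \"collection\": \"profiles\""
                    + "},"
                    + "\"index\": {\"name\": \"profiles\", \"type\": \"profile\"}"
                    + "}";

            // Rivers are configured by indexing a _meta document into the _river index.
            client.prepareIndex("_river", "crawldbRiver", "_meta")
                    .setSource(meta)
                    .execute()
                    .actionGet();

            client.close();
        }
    }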

Also worth noting: I have two otherwise identical deployments for staging and production. The one difference is that in staging, sharding is enabled on the profiles collection being streamed by the river, and it is that sharded staging environment that is experiencing the issue.

I did some digging into this, and it seems to be caused by map-reduce jobs. I was able to reproduce it easily: check the number of documents in ES, hit our front-end view that triggers a map reduce, then check the number of documents in ES again; it always dropped back down to 0. The river does not seem to differentiate between a drop of the collection being monitored and a drop of some other collection (a sketch of the kind of check that would be needed is below).
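To be concrete about what that check could look like: before treating a "drop" command from the oplog as a drop of the monitored collection, the river could compare the collection named in the command document with the configured collection. The sketch below is purely illustrative, with hypothetical class and method names rather than the plugin's actual code, using the legacy MongoDB driver's DBObject type:

    import com.mongodb.DBObject;

    // Hypothetical helper, not the plugin's real API: decide whether a "drop"
    // command seen in the oplog actually targets the collection this river watches.
    final class DropCommandFilter {

        private final String monitoredCollection; // e.g. "profiles"

        DropCommandFilter(String monitoredCollection) {
            this.monitoredCollection = monitoredCollection;
        }

        /**
         * @param commandObject the "o" field of a "c" (command) oplog entry,
         *                      e.g. { "drop" : "tmp.mrs.reviews_1435859379_2817" }
         * @return true only if the dropped collection is the monitored one
         */
        boolean isDropOfMonitoredCollection(DBObject commandObject) {
            Object dropped = commandObject.get("drop");
            if (!(dropped instanceof String)) {
                return false; // not a drop command at all
            }
            // Map-reduce temp collections (tmp.mr.*, tmp.mrs.*) never match the
            // monitored name, so they would no longer wipe the index.
            return monitoredCollection.equals(dropped);
        }
    }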

As a temporary solution, I have set the drop_collection option to false - this has resolved the issue.

You can see in the following ES logs a map-reduce query being run on the unrelated reviews collection. A temporary collection is created and then renamed, and the drop of that temporary collection is handled by the river as a DROP_COLLECTION operation on the profiles collection.

[2015-07-02 17:49:39,600][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] namespace: crawldb.$cmd - operation: COMMAND
[2015-07-02 17:49:39,600][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] MongoDB object deserialized: { "create" : "tmp.mr.reviews_2147" , "temp" : true}
[2015-07-02 17:49:39,600][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] collection: profiles
[2015-07-02 17:49:39,600][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] oplog entry - namespace [crawldb.$cmd], operation [COMMAND]
[2015-07-02 17:49:39,600][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] oplog processing item { "ts" : { "$ts" : 1435859379 , "$inc" : 1} , "h" : 6055431814338317965 , "v" : 2 , "op" : "c" , "ns" : "crawldb.$cmd" , "o" : { "create" : "tmp.mr.reviews_2147" , "temp" : true}}
[2015-07-02 17:49:39,600][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] addToStream - operation [COMMAND], currentTimestamp [Timestamp.BSON(ts={ "$ts" : 1435859379 , "$inc" : 1})], data [{ }], collection [profiles]
[2015-07-02 17:49:39,601][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] namespace: admin.$cmd - operation: COMMAND
[2015-07-02 17:49:39,601][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] processAdminCommandOplogEntry - [{ "ts" : { "$ts" : 1435859379 , "$inc" : 2} , "h" : -4965257526687656775 , "v" : 2 , "op" : "c" , "ns" : "admin.$cmd" , "o" : { "renameCollection" : "crawldb.tmp.mr.reviews_2147" , "to" : "crawldb.tmp.mrs.reviews_1435859379_2817" , "stayTemp" : true}}]
[2015-07-02 17:49:39,601][TRACE][org.elasticsearch.river.mongodb.Indexer] Operation: COMMAND - index: profiles - type: profile - routing: null - parent: null
[2015-07-02 17:49:39,608][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] namespace: crawldb.$cmd - operation: DROP_COLLECTION
[2015-07-02 17:49:39,608][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] MongoDB object deserialized: { "drop" : "tmp.mrs.reviews_1435859379_2817"}
[2015-07-02 17:49:39,608][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] collection: profiles
[2015-07-02 17:49:39,608][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] oplog entry - namespace [crawldb.$cmd], operation [DROP_COLLECTION]
[2015-07-02 17:49:39,608][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] oplog processing item { "ts" : { "$ts" : 1435859379 , "$inc" : 3} , "h" : 4575942115879798158 , "v" : 2 , "op" : "c" , "ns" : "crawldb.$cmd" , "o" : { "drop" : "tmp.mrs.reviews_1435859379_2817"}}
[2015-07-02 17:49:39,608][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] addToStream - operation [DROP_COLLECTION], currentTimestamp [Timestamp.BSON(ts={ "$ts" : 1435859379 , "$inc" : 3})], data [{ }], collection [profiles]
[2015-07-02 17:49:39,608][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] namespace: crawldb.profiles - operation: UPDATE
[2015-07-02 17:49:39,609][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] MongoDB object deserialized is 675 characters long
[2015-07-02 17:49:39,609][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] collection: profiles
[2015-07-02 17:49:39,609][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] oplog entry - namespace [crawldb.profiles], operation [UPDATE]
[2015-07-02 17:49:39,609][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] Updated item: { "_id" : { "$oid" : "5567676308731842ca887513"}}
[2015-07-02 17:49:39,609][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] addQueryToStream - operation [UPDATE], currentTimestamp [Timestamp.BSON(ts={ "$ts" : 1435859379 , "$inc" : 4})], update [{ "_id" : { "$oid" : "5567676308731842ca887513"}}]
[2015-07-02 17:49:39,609][TRACE][org.elasticsearch.river.mongodb.Indexer] updateBulkRequest for id: [], operation: [DROP_COLLECTION]
[2015-07-02 17:49:39,609][TRACE][org.elasticsearch.river.mongodb.Indexer] Operation: DROP_COLLECTION - index: profiles - type: profile - routing: null - parent: null
[2015-07-02 17:49:39,616][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] addToStream - operation [UPDATE], currentTimestamp [Timestamp.BSON(ts={ "$ts" : 1435859379 , "$inc" : 4})], data [{ "_id" : { "$oid" : "5567676308731842ca887513"} ,xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx}], collection [profiles]
[2015-07-02 17:49:39,617][TRACE][org.elasticsearch.river.mongodb.Indexer] updateBulkRequest for id: [5567676308731842ca887513], operation: [UPDATE]
[2015-07-02 17:49:39,617][TRACE][org.elasticsearch.river.mongodb.Indexer] Operation: UPDATE - index: profiles - type: profile - routing: null - parent: null
[2015-07-02 17:49:39,617][TRACE][org.elasticsearch.river.mongodb.Indexer] Update operation - id: 5567676308731842ca887513 - contains attachment: false
[2015-07-02 17:49:39,617][TRACE][org.elasticsearch.river.mongodb.Indexer] bulkDeleteRequest - objectId: 5567676308731842ca887513 - index: profiles - type: profile - routing: null - parent: null
[2015-07-02 17:49:39,617][TRACE][org.elasticsearch.river.mongodb.MongoDBRiverBulkProcessor] deleteBulkRequest - id: 5567676308731842ca887513 - index: profiles - type: profile - routing: null - parent: null
[2015-07-02 17:49:39,620][TRACE][org.elasticsearch.river.mongodb.MongoDBRiverBulkProcessor] bulkQueueSize [50] - queue [0] - availability [1]
[2015-07-02 17:49:39,620][TRACE][org.elasticsearch.river.mongodb.MongoDBRiverBulkProcessor] beforeBulk - new bulk [34119] of items [1]
[2015-07-02 17:49:39,620][TRACE][org.elasticsearch.river.mongodb.MongoDBRiverBulkProcessor] About to flush bulk request index[profiles] - type[profile]
[2015-07-02 17:49:39,620][TRACE][org.elasticsearch.river.mongodb.MongoDBRiverBulkProcessor] dropRecreateMapping index[profiles] - type[profile]
[2015-07-02 17:49:39,620][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] namespace: crawldb.profiles - operation: UPDATE
[2015-07-02 17:49:39,621][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] MongoDB object deserialized is 649 characters long
[2015-07-02 17:49:39,621][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] collection: profiles
[2015-07-02 17:49:39,621][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] oplog entry - namespace [crawldb.profiles], operation [UPDATE]
[2015-07-02 17:49:39,621][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] Updated item: { "_id" : { "$oid" : "556fb17a1df9fe6258775c75"}}
[2015-07-02 17:49:39,621][TRACE][org.elasticsearch.river.mongodb.OplogSlurper] addQueryToStream - operation [UPDATE], currentTimestamp [Timestamp.BSON(ts={ "$ts" : 1435859379 , "$inc" : 1})], update [{ "_id" : { "$oid" : "556fb17a1df9fe6258775c75"}}]
[2015-07-02 17:49:39,621][TRACE][org.elasticsearch.river.mongodb.MongoDBRiverBulkProcessor] mappings contains type profile: true
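For reference, the map reduce does not have to touch profiles at all to trigger this: any map reduce on the unrelated reviews collection that writes its output to a named collection is enough, because the server materializes the result through the tmp.mr.* create/rename/drop commands visible in the trace above. Below is a rough reproduction sketch using the legacy MongoDB Java driver; the connection details and the map/reduce functions are placeholders, not our actual front-end query:

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.DBObject;
    import com.mongodb.MapReduceCommand;
    import com.mongodb.MapReduceOutput;
    import com.mongodb.MongoClient;

    public class TriggerMapReduce {
        public static void main(String[] args) {
            // Placeholder connection details.
            MongoClient mongo = new MongoClient("localhost", 27017);
            DB db = mongo.getDB("crawldb");
            DBCollection reviews = db.getCollection("reviews");

            // Trivial map/reduce over a field assumed here for illustration ("rating").
            String map = "function() { emit(this.rating, 1); }";
            String reduce = "function(key, values) { return Array.sum(values); }";

            // Writing the result to a named output collection (rather than inline)
            // makes the server create a temporary collection, rename it, and drop
            // the leftover copy: the sequence the river then misreads as a drop
            // of the profiles collection.
            MapReduceCommand cmd = new MapReduceCommand(
                    reviews, map, reduce, "reviews_by_rating",
                    MapReduceCommand.OutputType.REPLACE, new BasicDBObject());

            MapReduceOutput out = reviews.mapReduce(cmd);

            int results = 0;
            for (DBObject doc : out.results()) {
                results++;
            }
            System.out.println("map-reduce produced " + results + " result documents");

            mongo.close();
        }
    }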