richardwilly98 / elasticsearch-river-mongodb

MongoDB River Plugin for ElasticSearch
1.12k stars 215 forks source link

some documents are not correctly updated #211

Open jererc opened 10 years ago

jererc commented 10 years ago

Hello

We use ES v0.90.7, mongo v2.4.9, and river-mongodb v1.7.3.

I just noticed some documents from mongo are not correctly updated. Here is a log showing the operations for 2 different mongo documents which "deleted" and "modified" fields have been updated in mongo. The first one (_id 52f264addede831ca7c6a476) is correctly updated in ES while the second one (_id 52f264addede831ca7c6a477) is not.

The ES mapping contains "deleted" : {"type" : "boolean"} and "modified" : {"type" : "date", "format" : "dateOptionalTime"}

Here are the river settings: { "_index" : "_river", "_type" : "media", "_id" : "_meta", "_version" : 1, "exists" : true, "_source" : {"mongodb": {"db": "cloudgate-dev", "options": {"secondary_read_preference": true, "include_fields": ["deleted", "user", "created", "modified", "meta", "assets", "status"]}, "collection": "media", "servers": [{"host": "10.201.12.47", "port": 27017}, {"host": "10.201.12.48", "port": 27017}, {"host": "10.201.12.49", "port": 27017}]}, "type": "mongodb", "index": {"bulk": {"concurrent_requests": 1}, "type": "media", "name": "media"}} }

The log (sorry if it's too long):

[2014-02-05 17:20:44,770][DEBUG][org.elasticsearch.river.mongodb.Slurper] collection: media [2014-02-05 17:20:44,770][DEBUG][org.elasticsearch.river.mongodb.Slurper] oplog entry - namespace [cloudgate-dev.media], operation [UPDATE] [2014-02-05 17:20:44,770][DEBUG][org.elasticsearch.river.mongodb.Slurper] oplog processing item { "ts" : { "$ts" : 1391617244 , "$inc" : 1} , "h" : -9194859396347773319 , "v" : 2 , "op" : "u" , "ns" : "cloudgate-dev.media" , "o2" : { "_id" : { "$oid" : "52f264addede831ca7c6a476"}} , "o" : { "$set" : { "deleted" : true , "modified" : { "$date" : "2014-02-05T16:20:44.766Z"}}}} [2014-02-05 17:20:44,771][DEBUG][org.elasticsearch.river.mongodb.Slurper] Updated item: { "_id" : { "$oid" : "52f264addede831ca7c6a476"}} [2014-02-05 17:20:44,771][DEBUG][org.elasticsearch.river.mongodb.Slurper] addQueryToStream - operation [UPDATE], currentTimestamp [TS time:Wed Feb 05 17:20:44 CET 2014 inc:1], update [{ "_id" : { "$oid" : "52f264addede831ca7c6a476"}}] [2014-02-05 17:20:44,772][DEBUG][org.elasticsearch.river.mongodb.Slurper] addToStream - operation [UPDATE], currentTimestamp [TS time:Wed Feb 05 17:20:44 CET 2014 inc:1], data [{ "_id" : { "$oid" : "52f264addede831ca7c6a476"} , "assets" : [ { "audio_bitrate" : 63777 , "audio_codec" : "MPEG-2 Audio layer 3" , "audio_nbr_channel" : 1 , "audio_samplerate" : 22050 , "checksum" : "a3f8216bcd983fa08367e9f45f29e24b" , "container" : "Flash Video" , "created" : { "$date" : "2014-02-05T16:20:08.973Z"} , "done" : true , "duration" : 17 , "file_extension" : "flv" , "file_size" : 830966 , "global_bitrate" : 368978 , "name" : "source" , "ready" : true , "status" : "ready" , "task_id" : "52f264ad79c7a520f12b918c" , "type" : "vod" , "video_aspect" : 1.3333333333333333 , "video_bitrate" : 297209 , "video_codec" : "Sorenson H263" , "video_fps" : 29.97 , "video_fps_mode" : "CFR" , "video_height" : 240 , "video_interlaced" : false , "video_rotation" : 0.0 , "video_width" : 320} , { "audio_bitrate" : 37981 , "audio_codec" : "AAC LC" , "container" : "MPEG-4" , "created" : { "$date" : "2014-02-05T16:20:21.159Z"} , "done" : true , "duration" : 18 , "file_size" : 1610383 , "global_bitrate" : 712913 , "name" : "mp4_h264_aac_hq" , "ready" : true , "status" : "ready" , "task_id" : "52f264ad5d5a182e493c4683" , "type" : "vod" , "video_bitrate" : 700000 , "video_codec" : "AVC" , "video_height" : 240 , "video_width" : 320} , { "container" : "JPEG" , "created" : { "$date" : "2014-02-05T16:20:13.569Z"} , "done" : true , "file_size" : 36353 , "name" : "jpeg_thumbnail_source" , "ready" : true , "status" : "ready" , "task_id" : "52f264ad79c7a5210422cd68" , "type" : "thumbnail" , "video_codec" : "MJPEG (Motion JPEG)" , "video_height" : 480 , "video_width" : 640} , { "container" : "JPEG" , "created" : { "$date" : "2014-02-05T16:20:11.147Z"} , "done" : true , "file_size" : 15040 , "name" : "jpeg_thumbnail_large" , "ready" : true , "status" : "ready" , "task_id" : "52f264ad5d5a182e344a92c4" , "type" : "thumbnail" , "video_codec" : "MJPEG (Motion JPEG)" , "video_height" : 240 , "video_width" : 320} , { "container" : "JPEG" , "created" : { "$date" : "2014-02-05T16:20:11.104Z"} , "done" : true , "file_size" : 6806 , "name" : "jpeg_thumbnail_medium" , "ready" : true , "status" : "ready" , "task_id" : "52f264ad5d5a182e1e874567" , "type" : "thumbnail" , "video_codec" : "MJPEG (Motion JPEG)" , "video_height" : 120 , "video_width" : 160} , { "container" : "JPEG" , "created" : { "$date" : "2014-02-05T16:20:10.311Z"} , "done" : true , "file_size" : 2753 , "name" : "jpeg_thumbnail_small" , "ready" : true , "status" : "ready" , "task_id" : "52f264ad5d5a182dd1807d51" , "type" : "thumbnail" , "video_codec" : "MJPEG (Motion JPEG)" , "video_height" : 60 , "video_width" : 80} , { "audio_bitrate" : 37981 , "audio_codec" : "AAC LC" , "container" : "MPEG-4" , "created" : { "$date" : "2014-02-05T16:20:16.250Z"} , "done" : true , "duration" : 18 , "file_size" : 921018 , "global_bitrate" : 407733 , "name" : "mp4_h264_aac" , "ready" : true , "status" : "ready" , "task_id" : "52f264ad5d5a182e493c4684" , "type" : "vod" , "video_bitrate" : 360000 , "video_codec" : "AVC" , "video_height" : 240 , "video_width" : 320} , { "audio_bitrate" : 31914 , "audio_codec" : "AAC LC" , "container" : "MPEG-4" , "created" : { "$date" : "2014-02-05T16:20:19.804Z"} , "done" : true , "duration" : 18 , "file_size" : 502725 , "global_bitrate" : 222555 , "name" : "mp4_h264_aac_ld" , "ready" : true , "status" : "ready" , "task_id" : "52f264ad79c7a5210422cd69" , "type" : "vod" , "video_bitrate" : 180000 , "video_codec" : "AVC" , "video_height" : 240 , "video_width" : 320}] , "created" : { "$date" : "2014-02-05T16:19:57.469Z"} , "deleted" : true , "meta" : { "title" : "Chat-justin-bieber 3.flv"} , "modified" : { "$date" : "2014-02-05T16:20:44.766Z"} , "status" : "ready" , "user" : { "$oid" : "50aa108ddede83201c00000a"}}], collection [media] [2014-02-05 17:20:44,772][DEBUG][org.elasticsearch.river.mongodb.Indexer] updateBulkRequest for id: [52f264addede831ca7c6a476], operation: [UPDATE] [2014-02-05 17:20:44,773][DEBUG][org.elasticsearch.river.mongodb.Indexer] Operation: UPDATE - index: media - type: media - routing: null - parent: null [2014-02-05 17:20:44,773][DEBUG][org.elasticsearch.river.mongodb.Indexer] Update operation - id: 52f264addede831ca7c6a476 - contains attachment: false [2014-02-05 17:20:44,773][DEBUG][org.elasticsearch.river.mongodb.Slurper] collection: media [2014-02-05 17:20:44,773][DEBUG][org.elasticsearch.river.mongodb.Slurper] oplog entry - namespace [cloudgate-dev.media], operation [UPDATE] [2014-02-05 17:20:44,773][DEBUG][org.elasticsearch.river.mongodb.Slurper] oplog processing item { "ts" : { "$ts" : 1391617244 , "$inc" : 2} , "h" : 8308485158567442109 , "v" : 2 , "op" : "u" , "ns" : "cloudgate-dev.media" , "o2" : { "_id" : { "$oid" : "52f264addede831ca7c6a477"}} , "o" : { "$set" : { "deleted" : true , "modified" : { "$date" : "2014-02-05T16:20:44.766Z"}}}} [2014-02-05 17:20:44,773][DEBUG][org.elasticsearch.river.mongodb.Slurper] Updated item: { "_id" : { "$oid" : "52f264addede831ca7c6a477"}} [2014-02-05 17:20:44,773][DEBUG][org.elasticsearch.river.mongodb.Slurper] addQueryToStream - operation [UPDATE], currentTimestamp [TS time:Wed Feb 05 17:20:44 CET 2014 inc:2], update [{ "_id" : { "$oid" : "52f264addede831ca7c6a477"}}] [2014-02-05 17:20:44,775][DEBUG][org.elasticsearch.river.mongodb.Slurper] addToStream - operation [UPDATE], currentTimestamp [TS time:Wed Feb 05 17:20:44 CET 2014 inc:2], data [{ "_id" : { "$oid" : "52f264addede831ca7c6a477"} , "assets" : [ { "audio_bitrate" : 63777 , "audio_codec" : "MPEG-2 Audio layer 3" , "audio_nbr_channel" : 1 , "audio_samplerate" : 22050 , "checksum" : "a3f8216bcd983fa08367e9f45f29e24b" , "container" : "Flash Video" , "created" : { "$date" : "2014-02-05T16:20:08.960Z"} , "done" : true , "duration" : 17 , "file_extension" : "flv" , "file_size" : 830966 , "global_bitrate" : 368978 , "name" : "source" , "ready" : true , "status" : "ready" , "task_id" : "52f264ad79c7a520c3fd0a7f" , "type" : "vod" , "video_aspect" : 1.3333333333333333 , "video_bitrate" : 297209 , "video_codec" : "Sorenson H263" , "video_fps" : 29.97 , "video_fps_mode" : "CFR" , "video_height" : 240 , "video_interlaced" : false , "video_rotation" : 0.0 , "video_width" : 320} , { "audio_bitrate" : 37981 , "audio_codec" : "AAC LC" , "container" : "MPEG-4" , "created" : { "$date" : "2014-02-05T16:20:16.433Z"} , "done" : true , "duration" : 18 , "file_size" : 1610383 , "global_bitrate" : 712913 , "name" : "mp4_h264_aac_hq" , "ready" : true , "status" : "ready" , "task_id" : "52f264ad5d5a182dd1807d52" , "type" : "vod" , "video_bitrate" : 700000 , "video_codec" : "AVC" , "video_height" : 240 , "video_width" : 320} , { "container" : "JPEG" , "created" : { "$date" : "2014-02-05T16:20:13.605Z"} , "done" : true , "file_size" : 36353 , "name" : "jpeg_thumbnail_source" , "ready" : true , "status" : "ready" , "task_id" : "52f264ad79c7a521123faf03" , "type" : "thumbnail" , "video_codec" : "MJPEG (Motion JPEG)" , "video_height" : 480 , "video_width" : 640} , { "container" : "JPEG" , "created" : { "$date" : "2014-02-05T16:20:12.782Z"} , "done" : true , "file_size" : 15040 , "name" : "jpeg_thumbnail_large" , "ready" : true , "status" : "ready" , "task_id" : "52f264ad79c7a520f12b918e" , "type" : "thumbnail" , "video_codec" : "MJPEG (Motion JPEG)" , "video_height" : 240 , "video_width" : 320} , { "container" : "JPEG" , "created" : { "$date" : "2014-02-05T16:20:14.368Z"} , "done" : true , "file_size" : 6806 , "name" : "jpeg_thumbnail_medium" , "ready" : true , "status" : "ready" , "task_id" : "52f264ae5d5a182e1e874568" , "type" : "thumbnail" , "video_codec" : "MJPEG (Motion JPEG)" , "video_height" : 120 , "video_width" : 160} , { "container" : "JPEG" , "created" : { "$date" : "2014-02-05T16:20:14.496Z"} , "done" : true , "file_size" : 2753 , "name" : "jpeg_thumbnail_small" , "ready" : true , "status" : "ready" , "task_id" : "52f264ae79c7a520c3fd0a80" , "type" : "thumbnail" , "video_codec" : "MJPEG (Motion JPEG)" , "video_height" : 60 , "video_width" : 80} , { "audio_bitrate" : 37981 , "audio_codec" : "AAC LC" , "container" : "MPEG-4" , "created" : { "$date" : "2014-02-05T16:20:15.874Z"} , "done" : true , "duration" : 18 , "file_size" : 921018 , "global_bitrate" : 407733 , "name" : "mp4_h264_aac" , "ready" : true , "status" : "ready" , "task_id" : "52f264ae5d5a182e493c4686" , "type" : "vod" , "video_bitrate" : 360000 , "video_codec" : "AVC" , "video_height" : 240 , "video_width" : 320} , { "audio_bitrate" : 31914 , "audio_codec" : "AAC LC" , "container" : "MPEG-4" , "created" : { "$date" : "2014-02-05T16:20:22.606Z"} , "done" : true , "duration" : 18 , "file_size" : 502725 , "global_bitrate" : 222555 , "name" : "mp4_h264_aac_ld" , "ready" : true , "status" : "ready" , "task_id" : "52f264ae5d5a182e1e874569" , "type" : "vod" , "video_bitrate" : 180000 , "video_codec" : "AVC" , "video_height" : 240 , "video_width" : 320}] , "created" : { "$date" : "2014-02-05T16:19:57.873Z"} , "deleted" : false , "meta" : { "title" : "Chat-justin-bieber 4.flv"} , "modified" : { "$date" : "2014-02-05T16:20:22.607Z"} , "status" : "ready" , "user" : { "$oid" : "50aa108ddede83201c00000a"}}], collection [media] [2014-02-05 17:20:44,775][DEBUG][org.elasticsearch.river.mongodb.Slurper] collection: media [2014-02-05 17:20:44,775][DEBUG][org.elasticsearch.river.mongodb.Slurper] oplog entry - namespace [cloudgate-dev.media], operation [UPDATE] [2014-02-05 17:20:44,775][DEBUG][org.elasticsearch.river.mongodb.Indexer] updateBulkRequest for id: [52f264addede831ca7c6a477], operation: [UPDATE] [2014-02-05 17:20:44,775][DEBUG][org.elasticsearch.river.mongodb.Indexer] Operation: UPDATE - index: media - type: media - routing: null - parent: null [2014-02-05 17:20:44,775][DEBUG][org.elasticsearch.river.mongodb.Indexer] Update operation - id: 52f264addede831ca7c6a477 - contains attachment: false

Best regards, Jerome

richardwilly98 commented 10 years ago

@jererc can you please provide the document attributes values in MongoDB (especially modified and delete before the update and after)? I will write a unit test.

jererc commented 10 years ago

Hello

the mongodb documents fields and values and the operation from the log are:

EDIT: I forgot to mention the values have been correctly updated in mongodb

Best regards, Jerome

richardwilly98 commented 10 years ago

@jererc I cannot reproduce the issue. Can you please take a look at the test just committed and let me know if it matches your use case.

jererc commented 10 years ago

The problem is the issue is kind of random. We reproduced it 2 times by updating a bunch of mongo documents ({ "$set" : { "deleted" : true , "modified" : { "$date" : }}}) but it does not happen every time... I have been trying to reproduce it in a test but I have no clue what could possibly trigger this.

alistair commented 10 years ago

We have been seeing similar issues to this within our application. We are speculating here but does elasticsearch (and this river) support having multiple updates to the same document within a batch?

to expand. We have oplog.rs records of

> db.oplog.rs.find( { "o2._id" : "e8e1d6b7-14fa-47dd-a397-33ca428cc5f5" })
{
  ts: Timestamp(1394590962, 3),
  h: NumberLong("-5053893563459561533"),
  v: 2,
  op: "u",
  ns: "waggle.OrganizationProfile",
  o2: {
    _id: "e8e1d6b7-14fa-47dd-a397-33ca428cc5f5"
  },
  o: {
    "$set": {
      Campaigns: [ ],
      Candidate: null,
      CandidateType: 0,
      FirstJoined: ISODate("2014-03-12T02:21:50.640Z"),
      JoinCampaign: null,
      LastJoined: ISODate("2014-03-12T02:21:50.640Z"),
      NoCall: false,
      "Profile.PersonalDetail": {
        FirstName: null,
        LastName: null,
        Email: "me12345678@findly.com",
        Location: {
          Address: null,
          City: null,
          State: null,
          Country: null,
          PostCode: null,
          Coordinates: [
            0,
            0
          ]
        },
        Phone: null,
        HighestEducationLevel: null
      },
      "Profile.PhotoUrl140": "manual_140x140.jpg",
      "Profile.PhotoUrl50": "manual_50x50.jpg",
      "Profile.PhotoUrl80": "manual_80x80.jpg",
      "Profile.ProfessionalDetail": {
        Education: [ ],
        WorkHistories: [
          {
            Employer: null,
            JobTitle: null,
            Description: null,
            StartDate: null,
            EndDate: null,
            IsCurrent: true
          }
        ],
        Interests: [
          {
            _id: 10,
            Title: "Internships"
          },
          {
            _id: 19,
            Title: "Jobs for Recent Grads"
          }
        ]
      },
      "Profile.SocialDetail": {
        LinkedIn: null,
        GooglePlus: null,
        Facebook: null
      },
      ReferringUrl: null,
      Source: "",
      Version: 2
    }
  }
}
{
  ts: Timestamp(1394590985, 2),
  h: NumberLong("4000612255181295220"),
  v: 2,
  op: "u",
  ns: "waggle.OrganizationProfile",
  o2: {
    _id: "e8e1d6b7-14fa-47dd-a397-33ca428cc5f5"
  },
  o: {
    "$set": {
      Campaigns: [ ],
      Candidate: null,
      CandidateType: 0,
      FirstJoined: ISODate("2014-03-12T02:21:50.640Z"),
      JoinCampaign: null,
      LastJoined: ISODate("2014-03-12T02:21:50.640Z"),
      NoCall: false,
      "Profile.PersonalDetail": {
        FirstName: "Earl",
        LastName: "dasdas",
        Email: "me12345678@findly.com",
        Location: {
          Address: null,
          City: "Auckland",
          State: "Auckland",
          Country: "NZ",
          PostCode: null,
          Coordinates: [
            174.76333150000005,
            -36.8484597
          ]
        },
        Phone: "88899",
        HighestEducationLevel: 1
      },
      "Profile.PhotoUrl140": manual_140x140.jpg",
      "Profile.PhotoUrl50": "manual_50x50.jpg",
      "Profile.PhotoUrl80": "manual_80x80.jpg",
      "Profile.ProfessionalDetail": {
        Education: [ ],
        WorkHistories: [
          {
            Employer: "qwerty",
            JobTitle: "Senior Software Developer",
            Description: null,
            StartDate: null,
            EndDate: null,
            IsCurrent: true
          }
        ],
        Interests: [
          {
            _id: 9,
            Title: "Information Technology"
          }
        ]
      },
      "Profile.SocialDetail": {
        LinkedIn: null,
        GooglePlus: null,
        Facebook: null
      },
      ReferringUrl: null,
      Source: "",
      Version: 3
    }
  }
}

As you hopefully can see the oplog times of these is close together. We are also forcing reads to be from a secondary which might be contributing to the issue?

After processing these records we are able to see that the document in elasticsearch has Version:2 instead of Version:3. (plus is missing all the other data).

benmccann commented 9 years ago

Is your MongoDB installation sharded?