richardwilly98 / elasticsearch-river-mongodb

MongoDB River Plugin for ElasticSearch

Impossible to import collection with binary _id #590

Open nitmir opened 7 years ago

nitmir commented 7 years ago


I installed this river following the wiki. Here is my config:

{
  "index": {
    "name": "testdb",
    "type": "torrents"
  },
  "mongodb": {
    "db": "testdb",
    "servers": [
      {
        "port": 27017,
        "host": ""
      }
    ],
    "credentials": [
      {
        "db": "admin",
        "password": "password",
        "user": "username"
      }
    ],
    "collection": "torrents_data",
    "options": {
      "exclude_fields": [...],
      "secondary_read_preference": true
    }
  },
  "type": "mongodb"
}

Here are some logs:

[2017-02-13 12:24:40,272][INFO ][river.mongodb            ] [Nomad] Creating MongoClient for [[]]
[2017-02-13 12:24:41,793][INFO ][river.mongodb            ] [Nomad] [mongodb][testdb] MongoDB version - 3.2.11
[2017-02-13 12:24:41,923][INFO ][river.mongodb            ] [Nomad] [mongodb][testdb] MongoDBRiver is beginning initial import of btdht-crawler.torrents_data
[2017-02-13 12:24:42,649][DEBUG][action.bulk              ] [Nomad] [testdb][2] failed to execute bulk item (index) index {[testdb][torrents][[B@4c438a69], source[{"seeds_peers":0,"file_nb":1,"added":1.486630897982914E9,"_id":"AMAiYk0SsXkBnCD9lxr55m6m/F0=","complete":0,"created":1486630897,"name":"Setup Terraria GOG Version.exe","peers":0,"categories":["software"],"seeds":0,"last_scrape":1486630899,"size":137288792}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [_id]
    at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(
    at org.elasticsearch.index.mapper.internal.IdFieldMapper.parse(
    at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(
    at org.elasticsearch.index.mapper.object.ObjectMapper.parse(
    at org.elasticsearch.index.mapper.DocumentMapper.parse(
    at org.elasticsearch.index.mapper.DocumentMapper.parse(
    at org.elasticsearch.index.shard.IndexShard.prepareIndex(
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(
    at java.util.concurrent.ThreadPoolExecutor.runWorker(
    at java.util.concurrent.ThreadPoolExecutor$
Caused by: org.elasticsearch.index.mapper.MapperParsingException: Provided id [[B@4c438a69] does not match the content one [AMAiYk0SsXkBnCD9lxr55m6m/F0=]
    at org.elasticsearch.index.mapper.internal.IdFieldMapper.parseCreateField(
    at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(
    ... 14 more

This ends with an IMPORT_FAILED status.

Here is the MongoDB document:

rs1:PRIMARY> db.torrents_data.find({_id: BinData(0,"AMAiYk0SsXkBnCD9lxr55m6m/F0=")})
{ "_id" : BinData(0,"AMAiYk0SsXkBnCD9lxr55m6m/F0="), "files" : null, "added" : 1486630897.982914, "name" : "Setup Terraria GOG Version.exe", "created" : 1486630897, "file_nb" : 1, "size" : 137288792, "peers" : 0, "seeds" : 0, "last_scrape" : 1486630899, "complete" : 0, "seeds_peers" : 0, "categories" : [ "software" ] }

So I am unable to index my MongoDB collection: for every document, I get the error shown in the logs above. My guess is that this happens because my _id values are binary data (non-ASCII, 20 bytes each), but I am not sure.

Does anyone know how to solve this?
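To make the guess above easier to check, here is a small sketch in Python (not part of the river, which is written in Java). It takes the two ids quoted in the exception and confirms that the content `_id` decodes to 20 non-ASCII bytes, while the provided id `[B@4c438a69` has the shape of Java's default string form of a byte array rather than the Base64 text. The variable names and the `is_ascii` helper are mine, for illustration only.

```python
import base64

# Both values are copied from the MapperParsingException in the logs above.
provided_id = "[B@4c438a69"
content_id = "AMAiYk0SsXkBnCD9lxr55m6m/F0="

# Decode the Base64 text back to the raw bytes stored in MongoDB.
raw_id = base64.b64decode(content_id)


def is_ascii(data: bytes) -> bool:
    """Return True if every byte in data is a 7-bit ASCII character."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


print(len(raw_id))                # number of bytes in the _id
print(is_ascii(raw_id))           # whether the _id is plain ASCII
print(provided_id == content_id)  # whether the two ids in the error agree
```

If the two ids can never agree for a binary `_id`, that would explain why every document in the collection fails in the same way.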

nitmir commented 7 years ago

I have tested with a cloned collection where the _id values are hex-encoded strings, and all the documents were indexed successfully. I think this confirms that there is an issue with binary _id values.
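For anyone who wants to try the same workaround, here is a minimal sketch of the per-document conversion, assuming the binary `_id` arrives in Python as a `bytes` value. The `with_hex_id` helper is hypothetical and only shows the transformation; reading from the source collection and writing to the clone is left out.

```python
import base64


def with_hex_id(doc: dict) -> dict:
    """Return a shallow copy of doc whose binary _id is replaced by a hex string."""
    converted = dict(doc)
    converted["_id"] = bytes(doc["_id"]).hex()
    return converted


# Sample document built from the values shown in the issue.
doc = {
    "_id": base64.b64decode("AMAiYk0SsXkBnCD9lxr55m6m/F0="),
    "name": "Setup Terraria GOG Version.exe",
}

hex_doc = with_hex_id(doc)
print(hex_doc["_id"])  # 40 hexadecimal characters for a 20-byte _id
```

The original document is left untouched, and `bytes.fromhex` turns the hex string back into the binary `_id` if it is needed later.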