richardwilly98 / elasticsearch-river-mongodb

MongoDB River Plugin for ElasticSearch

Impossible to import collection with binary _id #590

Open nitmir opened 7 years ago

nitmir commented 7 years ago

Hi

I have installed this river following the wiki; here is my config:

{
  "index": {
    "name": "testdb",
    "type": "torrents"
  },
  "mongodb": {
    "db": "testdb",
    "servers": [
      {
        "port": 27017,
        "host": "127.0.0.1"
      }
    ],
    "credentials": [
      {
        "db": "admin",
        "password": "password",
        "user": "username"
      }
    ],
    "collection": "torrents_data",
    "options": {
      "exclude_fields": [
        "files"
      ],
      "secondary_read_preference": true
    }
  },
  "type": "mongodb"
}

Here are some logs:

[2017-02-13 12:24:40,272][INFO ][river.mongodb            ] [Nomad] Creating MongoClient for [[127.0.0.1:27017]]
[2017-02-13 12:24:41,793][INFO ][river.mongodb            ] [Nomad] [mongodb][testdb] MongoDB version - 3.2.11
[2017-02-13 12:24:41,923][INFO ][river.mongodb            ] [Nomad] [mongodb][testdb] MongoDBRiver is beginning initial import of btdht-crawler.torrents_data
[2017-02-13 12:24:42,649][DEBUG][action.bulk              ] [Nomad] [testdb][2] failed to execute bulk item (index) index {[testdb][torrents][[B@4c438a69], source[{"seeds_peers":0,"file_nb":1,"added":1.486630897982914E9,"_id":"AMAiYk0SsXkBnCD9lxr55m6m/F0=","complete":0,"created":1486630897,"name":"Setup Terraria 1.3.0.3 GOG Version.exe","peers":0,"categories":["software"],"seeds":0,"last_scrape":1486630899,"size":137288792}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [_id]
    at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:411)
    at org.elasticsearch.index.mapper.internal.IdFieldMapper.parse(IdFieldMapper.java:295)
    at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:706)
    at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:497)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:544)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:493)
    at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:493)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:409)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:148)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.performOnPrimary(TransportShardReplicationOperationAction.java:574)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1.doRun(TransportShardReplicationOperationAction.java:440)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.mapper.MapperParsingException: Provided id [[B@4c438a69] does not match the content one [AMAiYk0SsXkBnCD9lxr55m6m/F0=]
    at org.elasticsearch.index.mapper.internal.IdFieldMapper.parseCreateField(IdFieldMapper.java:310)
    at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:401)
    ... 14 more

The river ends up with an IMPORT_FAILED status.

Here is the MongoDB document:

rs1:PRIMARY> db.torrents_data.find({_id: BinData(0,"AMAiYk0SsXkBnCD9lxr55m6m/F0=")})
{ "_id" : BinData(0,"AMAiYk0SsXkBnCD9lxr55m6m/F0="), "files" : null, "added" : 1486630897.982914, "name" : "Setup Terraria 1.3.0.3 GOG Version.exe", "created" : 1486630897, "file_nb" : 1, "size" : 137288792, "peers" : 0, "seeds" : 0, "last_scrape" : 1486630899, "complete" : 0, "seeds_peers" : 0, "categories" : [ "software" ] }

So I am unable to index my MongoDB collection: for every document, I get the error shown in the logs above. I am guessing this may be because my _id values are binary data (non-ASCII, 20-byte binary values), but I am not sure.
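To illustrate what I mean by binary: the mongo shell displays the _id as BinData(0, "&lt;base64&gt;"), and decoding that base64 payload gives 20 raw bytes that are not plain ASCII. A quick Python check on the _id from the document above:

```python
import base64

# The mongo shell shows the _id as BinData(0, "<base64>");
# decode the base64 payload to inspect the raw bytes.
encoded = "AMAiYk0SsXkBnCD9lxr55m6m/F0="
raw = base64.b64decode(encoded)

print(len(raw))                    # 20 raw bytes
print(all(b < 128 for b in raw))   # False: contains non-ASCII bytes
```

This also matches the exception message: `[B@4c438a69` is Java's default `toString()` for a byte array, so it seems the river passes the raw byte array as the document id while the source contains the base64 string, and the two can never match.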

Does anyone know how to solve this?

nitmir commented 7 years ago

I have tested with a cloned collection where the _id values are hex-encoded, and all the documents are indexed successfully, so I think this confirms that there is an issue with binary _id values.
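For anyone hitting the same problem, the re-encoding I used for the cloned collection is simply base64-to-hex on each _id. A minimal Python sketch of the conversion (the copy loop itself would be done with your MongoDB driver of choice):

```python
import base64

def hex_id(bindata_b64):
    """Re-encode a BinData base64 payload as a hex string usable as _id."""
    return base64.b64decode(bindata_b64).hex()

# The binary _id from the document above becomes a 40-character hex string
# (20 bytes * 2 hex digits), which Elasticsearch indexes without trouble.
print(hex_id("AMAiYk0SsXkBnCD9lxr55m6m/F0="))
```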