yougov / mongo-connector

MongoDB data stream pipeline tools by YouGov (adopted from MongoDB)
Apache License 2.0

MongoDB ObjectIds have dashes when indexed in Solr #440

Closed. prasadh13 closed this issue 8 years ago.

prasadh13 commented 8 years ago

When the documents are indexed in Solr, the ids are populated with values like `da7885a3-8a83-4ed0-a593-7148514978db`, while the MongoDB id is `"_id" : ObjectId("56e80dac973675355716f6be")`.
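The two values have different shapes: the dashed one is formatted like a random (version 4) UUID, while an ObjectId is 12 bytes written as 24 hex characters. A minimal Python sketch for telling them apart is below; `describe_id` is a hypothetical helper, not part of mongo-connector or Solr, and it only inspects the string's format.

```python
import re
import uuid

# String form of a MongoDB ObjectId: 12 bytes rendered as 24 hex characters.
OBJECTID_RE = re.compile(r"[0-9a-fA-F]{24}")


def describe_id(value):
    """Classify an id string by format (hypothetical diagnostic helper).

    Returns ("objectid", creation_seconds), ("uuid", uuid_version), or
    ("other", None). Only the shape of the string is inspected.
    """
    if OBJECTID_RE.fullmatch(value):
        # The first 4 bytes (8 hex chars) of an ObjectId are a big-endian
        # Unix timestamp in seconds.
        return "objectid", int(value[:8], 16)
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return "other", None
    return "uuid", parsed.version


print(describe_id("56e80dac973675355716f6be"))              # ('objectid', 1458048428)
print(describe_id("da7885a3-8a83-4ed0-a593-7148514978db"))  # ('uuid', 4)
```

The ObjectId check runs first because `uuid.UUID` would reject a 24-character string anyway; the timestamp decode is just a quick sanity check that a value really is an ObjectId.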

Can someone explain what is going on here?

Thanks!

aherlihy commented 8 years ago

Hello,

Sorry for the delay! What version of Solr are you using? Also, what version of mongo-connector?

Thanks!

prasadh13 commented 8 years ago

Hello,

Here are the versions:

- MongoDB: 3.0.3
- mongo-connector: 2.2
- Solr: 5.5.0

aherlihy commented 8 years ago

Hi @prasadh13, I've tried to reproduce the issue but haven't had any luck: all the _id fields in the Solr documents look the same as the ObjectIds in MongoDB. What version of Python are you using? How are you running mongo-connector? Do you have a configuration file?

prasadh13 commented 8 years ago

I am running it as a Python service. Here is my config file:

```json
{
    "comment": "Configuration options starting with '__' are disabled",
    "__comment": "To enable them, remove the preceding '__'",

    "mainAddress": "172.31.45.59:27017",
    "oplogFile": "/home/ubuntu/mongo-connector/oplog.timestamp",
    "noDump": false,
    "batchSize": -1,
    "verbosity": 3,
    "continueOnError": false,

    "logging": {
        "type": "file",
        "filename": "/var/log/mongo-connector/mongo-connector.log",
        "__format": "%(asctime)s [%(levelname)s] %(name)s:%(lineno)d - %(message)s",
        "__rotationWhen": "D",
        "__rotationInterval": 1,
        "__rotationBackups": 10,

        "__type": "syslog",
        "__host": "localhost:514"
    },
    "docManagers": [
        {
            "docManager": "solr_doc_manager",
            "targetURL": "http://localhost:8983/solr/sambuq",
            "__bulkSize": 1000,
            "__uniqueKey": "id",
            "__autoCommitInterval": 0
        }
    ]
}
```
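Per the config's own comment, any option whose key starts with `__` is disabled, so in this file `__uniqueKey`, `__bulkSize`, and `__autoCommitInterval` are not applied to the Solr doc manager. A small Python sketch that lists which options are actually active is below. It assumes only the `__` convention stated in the file; `split_options` is a hypothetical helper, not mongo-connector's own config parser, and the embedded JSON is a trimmed subset of the config above.

```python
import json

# Trimmed subset of the config file from this thread.
CONFIG_TEXT = """
{
    "mainAddress": "172.31.45.59:27017",
    "noDump": false,
    "batchSize": -1,
    "verbosity": 3,
    "docManagers": [
        {
            "docManager": "solr_doc_manager",
            "targetURL": "http://localhost:8983/solr/sambuq",
            "__bulkSize": 1000,
            "__uniqueKey": "id",
            "__autoCommitInterval": 0
        }
    ]
}
"""


def split_options(options):
    """Split a dict of options into (enabled, disabled).

    Follows the convention stated in the config's comment: keys that
    start with '__' are disabled.
    """
    enabled = {k: v for k, v in options.items() if not k.startswith("__")}
    disabled = {k: v for k, v in options.items() if k.startswith("__")}
    return enabled, disabled


config = json.loads(CONFIG_TEXT)
for doc_manager in config["docManagers"]:
    enabled, disabled = split_options(doc_manager)
    print("enabled:", sorted(enabled))    # ['docManager', 'targetURL']
    print("disabled:", sorted(disabled))  # ['__autoCommitInterval', '__bulkSize', '__uniqueKey']
```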

It's running on Python 2.7.6.

Locally, the documents were indexed with their original ids, but when I pushed the same code to the server, the wrong ids were indexed. On checking, I found a MongoDB version difference: I am running 3.2.1 locally. Do you think that is the issue?

aherlihy commented 8 years ago

I just tested with 3.2.1 and your config file (without the logging section), and it seems to copy the ids over correctly. I'm not sure what you mean by "pushed the same code to your server"; could you be more specific about exactly what you did?

Also, could you provide your schema.xml file?

Thank you!

prasadh13 commented 8 years ago

On my local machine, I set up my Solr instance and indexed the documents. After querying Solr, I return the full (complex) documents by matching the ObjectIds returned by Solr against the ObjectIds in my existing MongoDB database, using Mongoose. This works fine: I get all the documents I need.

I then moved the same code to the server, but there the id field, and only the id field, is not indexed correctly. So I cannot populate the complex documents because there are no ids to match, and Mongoose gives me a cast error because it cannot convert the returned ids to ObjectIds.
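The cast error follows from the id format: a 24-character hex string is the usual string form of an ObjectId, and a dashed UUID is not. A defensive sketch in Python is below, assuming the ids come back from Solr as plain strings. `partition_ids` and `build_filter` are hypothetical helpers, and the filter is only built, not sent, so the sketch runs with the standard library alone. With PyMongo you would wrap each string in `bson.ObjectId` before querying so it matches the stored `_id` type.

```python
import re

# Usual string form of a MongoDB ObjectId: 24 hex characters.
OBJECTID_RE = re.compile(r"[0-9a-fA-F]{24}")


def partition_ids(solr_ids):
    """Split ids returned by Solr into ObjectId-shaped strings and the rest.

    Hypothetical helper: values that are not 24 hex characters (such as a
    dashed UUID) are set aside instead of being passed to the query.
    """
    castable, rejected = [], []
    for value in solr_ids:
        if OBJECTID_RE.fullmatch(value):
            castable.append(value)
        else:
            rejected.append(value)
    return castable, rejected


def build_filter(castable):
    # Filter document for a lookup by _id. With PyMongo, convert each
    # string with bson.ObjectId first; plain strings will not match.
    return {"_id": {"$in": castable}}


ids = ["56e80dac973675355716f6be", "da7885a3-8a83-4ed0-a593-7148514978db"]
castable, rejected = partition_ids(ids)
print(build_filter(castable))  # {'_id': {'$in': ['56e80dac973675355716f6be']}}
print(rejected)                # ['da7885a3-8a83-4ed0-a593-7148514978db']
```

Rejecting malformed ids up front makes the underlying indexing problem visible as an explicit list rather than as a cast error partway through the lookup.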

prasadh13 commented 8 years ago

My schema.xml file:

<?xml version="1.0" encoding="UTF-8" ?>

id false
prasadh13 commented 8 years ago

Sorry for that!

My schema.xml contains simple string fields.


This works completely fine on my local machine, but not on the remote server.

prasadh13 commented 8 years ago

I updated mongo-connector to v2.3, and it seems to be working fine now. Closing this issue.

@aherlihy thanks for your time!