yougov / mongo-connector

MongoDB data stream pipeline tools by YouGov (adopted from MongoDB)
Apache License 2.0

Facing issues while updating documents from MongoDB to Solr #299

Closed: gsuresh92 closed this issue 9 years ago

gsuresh92 commented 9 years ago

1) I am using the mongo-connector Solr doc manager to dump data from MongoDB to Solr, and the initial dump works fine.
2) When I update a record in MongoDB using db.check.save() on an existing "_id", I get the error KeyError: '_id'.
3) In solr_doc_manager.py (line no. 196, "update_spec['_id'] = doc['_id']") we are not taking the "--unique-key" option into account. I feel that it should be "update_spec['_id'] = doc[self.unique_key]", because update_spec will definitely have '_id' (it is sourced from Mongo), while doc will have its id under the key given by the --unique-key option. So it should always be "doc[self.unique_key]".

mongo-connector command used:

mongo-connector -m localhost:27017 -t http://localhost:8983/solr/check -d solr_doc_manager --unique-key=id --auto-commit-interval=0 -n a.check --oplog-ts check

Exception in thread Thread-2:
Traceback (most recent call last):
  File "c:\Python27\lib\threading.py", line 551, in __bootstrap_inner
    self.run()
  File "c:\Python27\lib\site-packages\mongo_connector\util.py", line 85, in wrapped
    func(*args, **kwargs)
  File "c:\Python27\lib\site-packages\mongo_connector\oplog_manager.py", line 263, in run
    ns, timestamp)
  File "c:\Python27\lib\site-packages\mongo_connector\util.py", line 32, in wrapped
    return f(*args, **kwargs)
  File "c:\Python27\lib\site-packages\mongo_connector\doc_managers\solr_doc_manager.py", line 249, in update
    updated = self.apply_update(doc, update_spec)
  File "c:\Python27\lib\site-packages\mongo_connector\doc_managers\solr_doc_manager.py", line 197, in apply_update
    update_spec['_id'] = doc['_id']
KeyError: '_id'
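For readers who want to see the failure in isolation, here is a minimal sketch of the last frame of the traceback. It does not use mongo-connector at all: apply_update_as_reported is a hypothetical stand-in for the single assignment quoted above, and the two dictionaries are assumed shapes for the document read back from Solr (with --unique-key=id) and the replacement document coming from MongoDB.

```python
# Hypothetical stand-in for the one line quoted in the traceback; this is
# not the real SolrDocManager.apply_update, only the failing assignment.
def apply_update_as_reported(doc, update_spec):
    update_spec['_id'] = doc['_id']
    return update_spec


# Assumed shape of the document read back from Solr when --unique-key=id:
# the Mongo "_id" is stored under "id", so there is no "_id" key.
solr_doc = {'id': '42', 'name': 'old'}

# Assumed shape of the replacement document produced by db.check.save().
replacement = {'_id': '42', 'name': 'new'}


def reproduce():
    """Return the missing key name if the assignment raises KeyError."""
    try:
        apply_update_as_reported(solr_doc, dict(replacement))
    except KeyError as exc:
        return exc.args[0]
    return None


print(reproduce())  # prints _id
```

The lookup doc['_id'] is what raises, since the Solr-side document only carries the id under the configured unique key.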

llvtt commented 9 years ago

@gsuresh92 Thanks for filing this issue! I think you're right about this problem, but actually both instances of _id in that line should be self.unique_key. Would you like to make a pull request to fix this? It's a simple fix, and I'd like to give you credit for finding and fixing it. If it takes too much time, just let me know and I'll make a patch.

Thanks again.

gsuresh92 commented 9 years ago

Sure, @llvtt. I will send a pull request. Just give me a day so that I can cross-check it in multiple ways before committing. I think the change should be made only on the doc dictionary: in my case I passed "--unique-key=id", so "_id" in Mongo is stored as "id" in Solr, as per the documentation. If I change the left-hand side to "update_spec[self.unique_key]", the Mongo document won't have any "id" key and it will raise an error again. Let me know your views on this.
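The two variants discussed in this thread can be compared side by side with plain dictionaries. This is only a sketch under stated assumptions: both function names are hypothetical, the dictionaries are assumed document shapes, and nothing here shows which variant the eventual commit used or how the rest of the Solr doc manager treats the result.

```python
UNIQUE_KEY = 'id'  # the value passed as --unique-key in the command above


def use_unique_key_on_doc_only(doc, update_spec, unique_key=UNIQUE_KEY):
    """Variant from the original report: only the lookup on doc changes."""
    update_spec['_id'] = doc[unique_key]
    return update_spec


def use_unique_key_on_both_sides(doc, update_spec, unique_key=UNIQUE_KEY):
    """Variant from the maintainer's reply: both '_id' uses are replaced."""
    update_spec[unique_key] = doc[unique_key]
    return update_spec


# Assumed shapes: the Solr-side document keyed by "id", and the replacement
# document from MongoDB keyed by "_id".
solr_doc = {'id': '42', 'name': 'old'}
replacement = {'_id': '42', 'name': 'new'}

doc_only = use_unique_key_on_doc_only(solr_doc, dict(replacement))
both_sides = use_unique_key_on_both_sides(solr_doc, dict(replacement))

print(sorted(doc_only))    # ['_id', 'name']
print(sorted(both_sides))  # ['_id', 'id', 'name']
```

In this isolated form neither assignment raises, because both read the id from doc under the unique key. The variants differ in the keys of the returned document: the second one carries both "_id" and "id", and whether later steps in the doc manager accept that is not something this sketch can show.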

llvtt commented 9 years ago

Resolved in 107a0e24e5c6415effcf6b880e81d2cf19bba7c2. Thanks for filing this issue!