Open sameerkattel opened 5 years ago
Assuming you are not doing any filtering, the most likely cause is failures in indexing. These should be logged by monstache.
Yes, there is no filtering. I am running monstache in a Docker container and I don't see any errors in the Docker logs.
Are you using direct reads to sync all the collections with the indexes? This usually works for me to copy all the data.
Yes, I am using direct reads to sync all the data in one collection. It used to work. For some time the counts were in sync; only lately did they start to differ. Because of that I tried syncing the data from the beginning, but that did not help.
That's a mystery to me. It should be printing all bulk line items with errors using this callback: https://github.com/rwynn/monstache/blob/master/monstache.go#L379
Another possibility is that some data in MongoDB cannot be serialized to JSON for sending. However, that error should also be returned and eventually printed: https://github.com/rwynn/monstache/blob/master/monstache.go#L2717
Is it also listening for change events on this collection? You will need that if you are changing the collection while you are reading it in a direct-read.
The replay option will not work as a full sync unless your oplog is very large. Usually, this would require you to increase its size via configuration. Since the oplog is a capped collection, the old data eventually gets dropped.
That is why a direct-read is better for a full sync. Usually, you have monstache also listening for new changes while the direct reads are being performed.
I am just wildly guessing that it's some serialization issue, but the logs do not support it. And yes, I am doing a direct-read for the full sync while listening for change events.
MongoDB version: 4.3; Elasticsearch version: 6.8.1
I see a similar mismatch in countDocuments() from a list of mongo collections and the number of documents which actually get indexed in ES.
I have the same issue and nothing in the error logs. I used direct-read and listened to the change stream for only one collection. I'm using MongoDB 4.2 and the collection holds 22129296 documents; when I sync the data to Elasticsearch 7.6.2, the index count is 22129286. I have tried again 5 times, but the Elasticsearch index is always short by 10.
I have found the problem. In MongoDB, I have 10 documents whose _id field is of type ObjectId, unlike the other ids (type String), so I think the direct-read from MongoDB cannot find ids of type ObjectId.
Hi @dachuylinux, is it possible that these 10 documents have string ids that look like an ObjectId in hex? When monstache sends to Elasticsearch it needs to send a string, so an ObjectId is converted to a string using its hex representation. Is it possible that these 10 document ids share the same value as another id (type ObjectID) in the collection once it is converted to hex?
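To illustrate the collision described above, here is a minimal Python sketch. It does not use monstache or a real MongoDB driver: FakeObjectId is a hypothetical stand-in for an ObjectId that renders as its 24-character hex string, and find_id_collisions is an illustrative helper, not part of any library. It only shows how two distinct MongoDB _id values can map to a single Elasticsearch document id once both are turned into strings.

```python
from collections import defaultdict


class FakeObjectId:
    """Hypothetical stand-in for an ObjectId; str() gives its hex form."""

    def __init__(self, hex_value):
        self.hex_value = hex_value

    def __str__(self):
        return self.hex_value


def find_id_collisions(ids):
    """Group raw _id values by their string form and keep groups larger than one."""
    by_string = defaultdict(list)
    for raw in ids:
        by_string[str(raw)].append(raw)
    return {key: raws for key, raws in by_string.items() if len(raws) > 1}


# Three distinct MongoDB documents: one ObjectId, one string with the same hex, one other string.
ids = [
    FakeObjectId("5e8f8f8f8f8f8f8f8f8f8f8f"),
    "5e8f8f8f8f8f8f8f8f8f8f8f",
    "user-42",
]

collisions = find_id_collisions(ids)
distinct_es_ids = {str(raw) for raw in ids}
print(len(ids), "MongoDB documents map to", len(distinct_es_ids), "Elasticsearch ids")
```

If the collection really does mix id types this way, each colliding pair would overwrite one document in the index, which would explain a small, constant shortfall in the Elasticsearch count.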
We are using monstache to sync MongoDB to Elasticsearch. There was a difference in document counts between MongoDB and Elasticsearch. Seeing this, we ran the full sync from MongoDB to Elasticsearch again from the beginning. Now that the sync has completed, the difference still remains. What can cause differences in document counts between MongoDB and Elasticsearch? The MongoDB count is 222632036; the Elasticsearch count is 222629754.
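One way to narrow down a mismatch like this is to compare the ids on both sides instead of only the counts. The sketch below is a rough illustration in Python under stated assumptions: missing_ids is a hypothetical helper, and the two lists are made-up sample data standing in for ids exported from MongoDB and from the Elasticsearch index. For a collection of this size, holding every id in memory may not be practical, so the comparison would likely have to be done in sorted batches.

```python
def missing_ids(mongo_ids, es_ids):
    """Return MongoDB ids (as strings) that have no matching Elasticsearch id."""
    es_seen = set(es_ids)
    mongo_as_strings = {str(raw) for raw in mongo_ids}
    return sorted(mongo_as_strings - es_seen)


# Made-up sample data: document "c" never reached the index.
mongo_sample = ["a", "b", "c", "d"]
es_sample = ["a", "b", "d"]

absent = missing_ids(mongo_sample, es_sample)
print("missing from Elasticsearch:", absent)
```

Inspecting a handful of the missing documents directly in MongoDB (their _id type, field types, and size) is usually more informative than repeating the full sync.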