Open sachinnagesh opened 3 years ago
Hi @sachinnagesh the configuration you have looks good to me given what you are trying to accomplish.
Is there any pattern to the docs that are missing? About how many docs are missing?
Any information in the `monstache.stats.yyyy-mm-dd` collection about errors?
You may want to try monstache v6.7.1, which includes MongoDB driver upgrades, just in case that is relevant.
@rwynn Thank you for your response.
Is there any pattern to the docs that are missing? About how many docs are missing? => I tried several times, clearing the ES index data and the monstache database entries; each time it copied somewhere between 8 and 12 lakh (0.8M to 1.2M) records and then stopped copying.
Any information in the `monstache.stats.yyyy-mm-dd` collection about errors? => The failed document count is there.
{"index":{"_index":"monstache.stats.2020-11-11"}}
{"Host":"c21ff46c2f1a","Pid":253,"Stats":{"Flushed":3130,"Committed":3573,"Indexed":400546,"Created":0,"Updated":0,"Deleted":58,"Succeeded":398066,"Failed":2538,"Workers":[{"Queued":0,"LastDuration":261000000},{"Queued":0,"LastDuration":72000000}]},"Timestamp":"2020-11-11T09:06:06"}
{"took":54,"errors":false,"items":[{"index":{"_index":"monstache.stats.2020-11-11","_type":"_doc","_id":"TW-OtnUBf9Z0FIwLKpe6","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":241,"_primary_term":1,"status":201}}]}
{"index":{"_index":"monstache.stats.2020-11-11"}}
{"Host":"c21ff46c2f1a","Pid":253,"Stats":{"Flushed":3139,"Committed":3585,"Indexed":400605,"Created":0,"Updated":0,"Deleted":58,"Succeeded":398117,"Failed":2546,"Workers":[{"Queued":0,"LastDuration":368000000},{"Queued":0,"LastDuration":535000000}]},"Timestamp":"2020-11-11T09:06:38"}
{"took":39,"errors":false,"items":[{"index":{"_index":"monstache.stats.2020-11-11","_type":"_doc","_id":"aCuOtnUB7gYr86GnnyDq","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":195,"_primary_term":1,"status":201}}]}
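As an aside, the `Failed` counter can be pulled out of stats documents like the ones above. A minimal sketch, assuming the stats bodies are available as NDJSON lines (bulk action lines such as `{"index":...}` carry no `Stats` object and are skipped):

```python
import json

def failed_counts(ndjson_lines):
    """Yield (timestamp, failed) for every monstache stats body line.

    Lines without a "Stats" object (e.g. bulk index action lines)
    are skipped.
    """
    for line in ndjson_lines:
        doc = json.loads(line)
        stats = doc.get("Stats")
        if stats is not None:
            yield doc["Timestamp"], stats["Failed"]

# Trimmed-down examples of the stats documents shown above.
lines = [
    '{"index":{"_index":"monstache.stats.2020-11-11"}}',
    '{"Host":"c21ff46c2f1a","Stats":{"Indexed":400546,"Succeeded":398066,'
    '"Failed":2538},"Timestamp":"2020-11-11T09:06:06"}',
    '{"Host":"c21ff46c2f1a","Stats":{"Indexed":400605,"Succeeded":398117,'
    '"Failed":2546},"Timestamp":"2020-11-11T09:06:38"}',
]
for ts, failed in failed_counts(lines):
    print(ts, failed)
# → 2020-11-11T09:06:06 2538
# → 2020-11-11T09:06:38 2546
```

A steadily growing `Failed` count like the one in the excerpt (2538 → 2546 between two samples) suggests documents are being rejected by Elasticsearch on an ongoing basis, which would be worth correlating with the ES logs.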
@rwynn One thing I forgot to mention: I am giving it very limited resources, 1 CPU core and 1 GB of memory, since I am fine with the sync taking time and I don't want to put load on the MongoDB instance. I am also running one more monstache instance, for a different purpose and with a different `cluster-name`, which is syncing data to the same ES cluster from the same MongoDB cluster.
@rwynn I gave it 4 CPU cores and 4 GB of RAM, and now it copies more records: it copied 1.7M records and then stopped again. I am deploying monstache in HA mode by running two containers with the same `cluster-name`, so only one node is active at a time. I have a doubt: while a view is being copied from Mongo to ES, if the current active node goes into the paused state and the other node becomes active, will the new node start copying that view from the very first record again, or will it resume from the last record copied by the old active node?
@sachinnagesh you might try without HA mode, since then the code path will be simpler and there is less chance of deadlock. In HA mode the 2nd process will repeat the full copy of the collection unless stateful reads are enabled and the 1st process has already marked the namespace as complete. There is no concept of resuming a direct read in monstache, only resuming the change stream.
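Putting that advice together, a single-node setup with stateful direct reads might look like the sketch below. This is an assumption about the setup, not the poster's actual config: the namespace `mydb.myview` is a placeholder, and `direct-read-stateful` is the option that records completed namespaces in the `monstache.directreads` collection mentioned in this thread.

```toml
# Sketch: single-instance (no HA) monstache config for the initial copy.
# "mydb.myview" is a placeholder namespace.
direct-read-namespaces = ["mydb.myview"]
change-stream-namespaces = ["mydb.myview"]

# Record completed direct reads in the monstache.directreads collection,
# so a restart does not repeat namespaces already copied in full.
direct-read-stateful = true

# Resume the change stream (not the direct read) from the saved timestamp.
resume = true

# cluster-name is intentionally omitted: setting it is what enables HA mode.
```

Note that `resume` only applies to the change stream; as stated above, an interrupted direct read starts over from the beginning.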
@rwynn Deploying a single node without HA worked for me. Could we somehow achieve resume for direct reads in monstache? It would be a very nice feature to have when the data size is very large. Currently, to get HA, we either need to deploy two separate instances (one for the direct read without HA and another for the change stream with HA), or, for the very first deployment, run monstache without HA and then, once the full direct read is done, redeploy it with HA enabled.
First of all, thanks @rwynn for this amazing library to sync data from MongoDB. I am using
I am facing an issue where my whole view is not getting completely copied to the ES index. The view has around 3M records. On the first deployment I want to completely copy the view from Mongo to ES and then also sync any real-time operations from the Mongo view to the ES index. When I deploy monstache it partially copies the view and marks it as complete in the `directreads` collection under the `monstache` database. My config looks something like this; I am passing the Mongo and ES config as environment variables. A similar configuration works on other environments where the data is around 2 lakh (200k) records.
The only error I am getting in the logs is one that I also see in other environments where everything works fine.