erdincocak opened this issue 5 years ago
Please post your Monstache config, Monstache version, and MongoDB and Elasticsearch versions.
Also the result of adding -print-config to the command, and the output of Monstache.
I don't use a config file; actually I could not find one, so I run monstache from the command line with all settings. Monstache version: 6.0.10. Elasticsearch version: 7.2.0. MongoDB version: 4.
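For reference, monstache doesn't ship a config file; you create one yourself and pass it with -f. A minimal TOML sketch matching the settings in this thread (assuming 6.x option names):

```toml
# config.toml -- run with: monstache -f config.toml
mongo-url = "mongodb+srv://DB"
elasticsearch-urls = ["http://localhost:9200"]
direct-read-namespaces = ["portaldb.candidate"]
```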
Result of -print-config:
INFO 2019/07/05 16:35:25 { "EnableTemplate": false, "EnvDelimiter": ",", "MongoURL": "mongodb+srv://DB", "MongoConfigURL": "", "MongoOpLogDatabaseName": "", "MongoOpLogCollectionName": "", "GtmSettings": { "ChannelSize": 512, "BufferSize": 32, "BufferDuration": "75ms" }, "AWSConnect": { "AccessKey": "", "SecretKey": "", "Region": "" }, "Logs": { "Info": "", "Warn": "", "Error": "", "Trace": "", "Stats": "" }, "GraylogAddr": "", "ElasticUrls": [ "http://localhost:9200" ], "ElasticUser": "", "ElasticPassword": "", "ElasticPemFile": "", "ElasticValidatePemFile": true, "ElasticVersion": "", "ElasticHealth0": 15, "ElasticHealth1": 5, "ResumeName": "default", "NsRegex": "", "NsDropRegex": "", "NsExcludeRegex": "", "NsDropExcludeRegex": "", "ClusterName": "", "Print": true, "Version": false, "Pprof": false, "EnableOplog": false, "DisableChangeEvents": false, "EnableEasyJSON": false, "Stats": false, "IndexStats": false, "StatsDuration": "", "StatsIndexFormat": "monstache.stats.2006-01-02", "Gzip": false, "Verbose": false, "Resume": false, "ResumeWriteUnsafe": false, "ResumeFromTimestamp": 0, "Replay": false, "DroppedDatabases": true, "DroppedCollections": true, "IndexFiles": false, "IndexAsUpdate": false, "FileHighlighting": false, "EnablePatches": false, "FailFast": false, "IndexOplogTime": false, "OplogTsFieldName": "oplog_ts", "OplogDateFieldName": "oplog_date", "OplogDateFieldFormat": "2006/01/02 15:04:05", "ExitAfterDirectReads": false, "MergePatchAttr": "json-merge-patches", "ElasticMaxConns": 4, "ElasticRetry": false, "ElasticMaxDocs": -1, "ElasticMaxBytes": 8388608, "ElasticMaxSeconds": 5, "ElasticClientTimeout": 0, "ElasticMajorVersion": 0, "ElasticMinorVersion": 0, "MaxFileSize": 0, "ConfigFile": "", "Script": null, "Filter": null, "Pipeline": null, "Mapping": null, "Relate": null, "FileNamespaces": null, "PatchNamespaces": null, "Workers": null, "Worker": "", "ChangeStreamNs": [ "" ], "DirectReadNs": [ "portaldb.candidate" ], "DirectReadSplitMax": 0, 
"DirectReadConcur": 0, "DirectReadNoTimeout": false, "MapperPluginPath": "", "EnableHTTPServer": false, "HTTPServerAddr": ":8080", "TimeMachineNamespaces": null, "TimeMachineIndexPrefix": "log", "TimeMachineIndexSuffix": "2006-01-02", "TimeMachineDirectReads": false, "PipeAllowDisk": false, "RoutingNamespaces": null, "DeleteStrategy": 0, "DeleteIndexPattern": "*", "ConfigDatabaseName": "monstache", "FileDownloaders": 0, "RelateThreads": 10, "RelateBuffer": 1000, "PostProcessors": 0, "PruneInvalidJSON": false, "Debug": false }
Output of monstache:
INFO 2019/07/05 16:37:53 Started monstache version 6.0.10
INFO 2019/07/05 16:37:53 Successfully connected to MongoDB version 4.0.10
INFO 2019/07/05 16:37:54 Successfully connected to Elasticsearch version 7.2.0
INFO 2019/07/05 16:37:54 Listening for events
INFO 2019/07/05 16:37:54 Watching changes on the deployment
INFO 2019/07/05 16:37:54 Direct reads completed
Thanks for the info. The output looks good: I see the direct reads completed message and no errors.
I'm assuming docs exist in portaldb.candidate and the MongoDB user has read permission on it.
You could add -stats and -verbose just for testing. That should show all requests.
Docs from direct reads should be going to the index named portaldb.candidate.
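One quick way to verify how many docs actually landed there, assuming Elasticsearch on localhost as in this setup:

```
# Count docs in the direct-read target index
curl -s "http://localhost:9200/portaldb.candidate/_count?pretty"

# Compare with the source collection in the mongo shell:
#   db.candidate.count()
```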
One thing to note: when I removed cluster-name from the command string, it directly indexed over 4K docs, then stopped again. Now docs are only indexed when changes occur on the collection; only the updated document gets indexed.
I added --stats --verbose
These kinds of messages appear right after every document is indexed:
{"took":8,"errors":false,"items":[{"index":{"_index":"portaldb.candidate","_type":"_doc","_id":"432efc44-269e-40a8-908e-1b295d049ef3","_version":6710180466290327553,"result":"updated","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":17,"_primary_term":3,"status":200}}]}
STATS 2019/07/05 17:08:17 {"Flushed":24,"Committed":3,"Indexed":3,"Created":0,"Updated":0,"Deleted":0,"Succeeded":3,"Failed":0,"Workers":[{"Queued":0,"LastDuration":8000000},{"Queued":0,"LastDuration":0},{"Queued":0,"LastDuration":0},{"Queued":0,"LastDuration":0}]}
You don't need cluster-name. That is for high availability and requires that the user has write access to the collection monstache.monstache.
I'm not sure I understand what you mean by stopped again. If it directly indexed 4k docs, then it seems like it is working. Direct reads only copy the collection once to Elasticsearch per run. In addition to the copy it should be listening to all changes on the cluster and syncing them (insert, modify, delete) until the process is stopped.
I said stopped because there are 280K docs inside. It indexed only 4K, and that happened just once. When I try again it does not index; it just continues to index changed docs.
Sometimes this error occurs in the log: ERROR 2019/07/05 17:31:59 Bulk response item: {"_index":"portaldb.candidate","_type":"_doc","_id":"3d706307-d1a8-4edb-aa6f-3679dc1631c3","status":409,"error":{"type":"version_conflict_engine_exception","reason":"[3d706307-d1a8-4edb-aa6f-3679dc1631c3]: version conflict, current version [6710186809957023750] is higher or equal to the one provided [6710186809957023748]","index":"portaldb.candidate"}}
You can ignore those errors because they mean that you already have a newer version of the doc in Elasticsearch.
https://www.elastic.co/blog/elasticsearch-versioning-support
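Those long version numbers are external versions that monstache appears to derive from the MongoDB oplog timestamp: Unix seconds in the high 32 bits, the timestamp ordinal in the low 32 (this encoding is an inference from the values in this thread, which decode to times matching the log). A sketch that decodes the "current version" from the conflict error above:

```go
package main

import (
	"fmt"
	"time"
)

// decodeVersion splits a monstache-style external version back into the
// MongoDB oplog timestamp parts: Unix seconds and the ordinal.
// (Assumption: version = seconds<<32 | ordinal, matching the values
// seen in the bulk responses in this thread.)
func decodeVersion(v int64) (sec int64, ord int64) {
	return v >> 32, v & 0xFFFFFFFF
}

func main() {
	// "current version" from the version_conflict_engine_exception above.
	sec, ord := decodeVersion(6710186809957023750)
	fmt.Println(time.Unix(sec, 0).UTC(), "ordinal", ord)
	// prints "2019-07-05 14:31:54 +0000 UTC ordinal 6"
}
```

Because the version is monotonic in oplog time, Elasticsearch rejecting a lower version just means a newer write already won, which is why those 409s are safe to ignore.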
As for the collection not entirely syncing: I'm not sure. There is another similar issue, but for 16 million docs. I cannot replicate the partial-copy issue. I just put 500K docs in a test collection and they all synced.
It's hard to say though, because everyone is on different versions of MongoDB. I currently have 4.0.10 in my VM.
Flushed":126,"Committed":7,"Indexed":9,"Created":0,"Updated":0,"Deleted":0,"Succeeded":6,"Failed":3
Flushed keeps increasing but the others don't, and I see no change in the doc count in Kibana. What does that mean? Is it related to the elasticsearch-max-bytes and elasticsearch-max-docs configuration?
Flushed will increase every 5s because that is the auto flush interval.
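For reference, that interval comes from the bulk flush settings, whose defaults are visible in the -print-config output above (ElasticMaxSeconds 5, ElasticMaxBytes 8388608, ElasticMaxDocs -1). A sketch of the corresponding TOML options, assuming 6.x option names:

```toml
# Bulk indexer flush triggers (values shown are the defaults above)
elasticsearch-max-seconds = 5        # flush at least every 5s
elasticsearch-max-bytes   = 8388608  # ...or when the buffer reaches 8MB
elasticsearch-max-docs    = -1       # -1 = no doc-count trigger
```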
My output looks like this...
INFO 2019/07/05 14:46:06 Started monstache version 6.0.10
INFO 2019/07/05 14:46:06 Successfully connected to MongoDB version 4.0.10
INFO 2019/07/05 14:46:06 Successfully connected to Elasticsearch version 7.0.0
INFO 2019/07/05 14:46:06 Listening for events
INFO 2019/07/05 14:46:06 Watching changes on the deployment
INFO 2019/07/05 14:46:20 Direct reads completed
STATS 2019/07/05 14:46:36 {"Flushed":5,"Committed":10,"Indexed":388953,"Created":0,"Updated":0,"Deleted":0,"Succeeded":388953,"Failed":0,"Workers":[{"Queued":0,"LastDuration":555000000},{"Queued":0,"LastDuration":534000000},{"Queued":0,"LastDuration":176000000},{"Queued":0,"LastDuration":1352000000}]}
Do you set cluster.name in elasticsearch.yml? Are there any other settings in elasticsearch.yml, like thread_pool.bulk_size etc.?
Nope, just running with this; both MongoDB and Elasticsearch are on localhost in the VM.
monstache -direct-read-namespace test.test -stats
elasticsearch.yml is all commented out. The default settings for 7.0.0.
Not sure what it could be. If you are comfortable with Golang, you can add a print statement or increment a counter at the following line in monstache:
https://github.com/rwynn/monstache/blob/rel6/monstache.go#L4182
With direct reads enabled that line should get hit for every doc in the collection in addition to any changes that you make to MongoDB.
If it is getting hit for every doc (~280k times in your case), then something is going wrong at the indexing step. If it isn't getting hit for every doc, then the problem is reading from MongoDB.
Hi,
I am using: monstache --elasticsearch-url "http://localhost:9200" --mongo-url 'mongodb+srv://XXXX' --cluster-name cluster1 --direct-read-namespace db.collectionname
Indexing works only when a document in the collection changes; I can't index the whole collection when starting monstache. I did not set any workers, by the way; I don't know if that is necessary.