chandra2037 opened this issue 4 years ago
Hi, what version of monstache are you using?
If you are not already doing so, I think you should use a different resume-name for each monstache process, as they should not share resume state.
Are you able to use direct reads to do a full sync to get a matching count?
Thank you for your response @rwynn
We were using 6.5.4, but recently upgraded to the latest 6.7.0 and are still having the same issue.
Sorry, I forgot to mention that we are indeed using a different resume-name for each instance; I can see two records in the monstache.monstache collection.
We are able to do a full document sync by running Monstache with the direct-read-namespaces config, but syncing from the oplog is crucial to our systems, so any help will be appreciated.
I think it would be difficult to diagnose this one without a script to reproduce it. Have you experimented with some batch inserts to MongoDB to see if that is causing a problem?
I have not yet created a script to reproduce the error, but in the meantime I am seeing the following messages in the logs:
ERROR 2020/11/19 06:21:40 elastic: bulk processor "monstache" failed: elastic: Error 400 (Bad Request): Validation Failed: 1: id is missing; [type=action_request_validation_exception]
TRACE 2020/11/19 06:21:39 HTTP/1.1 400 Bad Request
Content-Length: 227
Content-Type: application/json; charset=UTF-8
{"error":{"root_cause":[{"type":"action_request_validation_exception","reason":"Validation Failed: 1: id is missing;"}],"type":"action_request_validation_exception","reason":"Validation Failed: 1: id is missing;"},"status":400}
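A bulk response with per-item failures normally comes back as HTTP 200 with an "errors" flag and an "items" array, whereas this body has a top-level status of 400, which suggests Elasticsearch rejected the whole request during validation. As a minimal sketch (stdlib only), the body can be decoded to make that distinction in code; esErrorResponse, parseESError and isRequestValidationError are hypothetical names that only mirror the fields visible in the trace log above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// esCause holds one entry of the error.root_cause array.
type esCause struct {
	Type   string `json:"type"`
	Reason string `json:"reason"`
}

// esErrorResponse mirrors only the fields visible in the 400 body from the log.
type esErrorResponse struct {
	Error struct {
		RootCause []esCause `json:"root_cause"`
		Type      string    `json:"type"`
		Reason    string    `json:"reason"`
	} `json:"error"`
	Status int `json:"status"`
}

// parseESError decodes an Elasticsearch error body.
func parseESError(body []byte) (esErrorResponse, error) {
	var resp esErrorResponse
	err := json.Unmarshal(body, &resp)
	return resp, err
}

// isRequestValidationError reports whether Elasticsearch rejected the request
// as a whole during validation, rather than failing individual bulk items.
func isRequestValidationError(r esErrorResponse) bool {
	return r.Status == 400 && r.Error.Type == "action_request_validation_exception"
}

func main() {
	body := []byte(`{"error":{"root_cause":[{"type":"action_request_validation_exception","reason":"Validation Failed: 1: id is missing;"}],"type":"action_request_validation_exception","reason":"Validation Failed: 1: id is missing;"},"status":400}`)
	resp, err := parseESError(body)
	if err != nil {
		fmt.Println("could not decode error body:", err)
		return
	}
	for _, c := range resp.Error.RootCause {
		fmt.Printf("%s: %s\n", c.Type, c.Reason)
	}
	fmt.Println("whole request rejected:", isRequestValidationError(resp))
}
```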
When I restarted the Kubernetes Pod running the Monstache instance, these errors disappeared. While trying to find the root cause, I reviewed the documents and found nothing unusual about the data: the same documents are published without errors after the restart.
Can you please advise on why and when these errors occur?
Update on the issue:
Regarding the id is missing issue: it seems that there are some records in MongoDB whose id value is blank, and whenever Monstache tries to sync such a record to Elasticsearch (trace log below), the request fails.
{"index":{"_index":"testdb.testcollection","version":6898720532628242498,"version_type":"external"}}
{"deleted":false,"id":"","oplog_date":"2020-11-24T15:59:02Z","oplog_ts":{"T":1606233542,"I":66}}
Questions:
hi @chandra2037
A 400 should not be a retryable error for monstache, and the request should get dropped after Elasticsearch responds.
Thank you @rwynn
I will look into the filters or transforms.
But for some reason Monstache is retrying even on 400 errors (screenshot of the errors below).
This goes away only if I restart Monstache.
Yeah, it looks like the bulk request as a whole is failing, not the individual items. Do you know why the Elasticsearch ID would be empty in the line item? It should come from the string form of the _id field in MongoDB, which I didn't think could be empty. The assumption is that every MongoDB document has an _id, either user-generated or auto-generated.
As you mentioned, it does look like the golang client Monstache uses checks the response code for the entire bulk request, maps it to an error in this case, and never clears the bulk items after all the retries have been exhausted.
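To make the intended behavior concrete, here is a small sketch of a flush loop that gives up immediately on a non-retryable status and clears its pending items however the attempts end. This is not the API of the Elasticsearch client that Monstache uses: bulkItem, bulkBuffer, isRetryable and flush are hypothetical names, and treating only 429 and 5xx responses as retryable is an assumption for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// bulkItem is a hypothetical pending bulk action.
type bulkItem struct {
	ID   string
	Body string
}

// bulkBuffer is a hypothetical stand-in for a bulk processor's queue.
type bulkBuffer struct {
	items      []bulkItem
	maxRetries int
}

// isRetryable treats throttling (429) and server errors (5xx) as transient.
// Any other 4xx means the payload itself was rejected, so resending the same
// items cannot succeed.
func isRetryable(status int) bool {
	return status == http.StatusTooManyRequests || status >= 500
}

// flush sends the pending items until the request succeeds, fails with a
// non-retryable status, or retries are exhausted. The deferred reset clears
// the queue in every case, so one bad request is not resent forever.
// Backoff between attempts is omitted for brevity.
func (b *bulkBuffer) flush(send func([]bulkItem) int) error {
	defer func() { b.items = b.items[:0] }()
	for attempt := 0; attempt <= b.maxRetries; attempt++ {
		status := send(b.items)
		if status >= 200 && status < 300 {
			return nil
		}
		if !isRetryable(status) {
			return fmt.Errorf("bulk request rejected with status %d; dropping %d items", status, len(b.items))
		}
	}
	return errors.New("bulk request still failing after retries; dropping items")
}

func main() {
	b := &bulkBuffer{items: []bulkItem{{ID: "", Body: `{"deleted":false}`}}, maxRetries: 3}
	err := b.flush(func(items []bulkItem) int { return http.StatusBadRequest })
	fmt.Println(err, "| pending items:", len(b.items))
}
```

With this policy the 400 from the log would be reported once and the poisoned batch discarded, instead of blocking the queue until a restart.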
Maybe MongoDB allows an empty string to be the _id value of a document as long as it is unique for the collection? That one document might be causing this?
@chandra2037 FYI I just pushed a commit to check for empty _id and report an error instead of attempting to index/delete the document.
Maybe MongoDB allows an empty string to be the _id value of a document as long as it is unique for the collection? That one document might be causing this?
Good point, this might be the case.
@chandra2037 FYI I just pushed a commit to check for empty _id and report an error instead of attempting to index/delete the document.
Thank you @rwynn, we are using the Docker version of Monstache. Can you please advise on how to get this change?
Hi @chandra2037, can you try with version 6.7.2? The change is included.
Thank you @rwynn. Really appreciate your quick responses on this issue. I will try the new version.
We have two Monstache instances deployed in an EKS cluster. Each instance is deployed independently.
Instance 1 - Monstache configuration:
Instance 2 - Monstache configuration: same as the instance 1 config, except that instead of namespace-exclude-regex it configures the following:
namespace-regex = '^.*DB\.(classification|collection|publication).*$'
The idea is that instance 2 indexes the documents from the configured collections and instance 1 indexes the rest of the collections.
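One thing worth ruling out with this kind of split is whether the pattern covers exactly the collections intended. Instance 1's configuration isn't shown here, so the sketch below assumes it uses this same pattern as its namespace-exclude-regex, in which case the two instances split namespaces cleanly, but the pattern itself may match more or fewer collections than expected. It uses Go's regexp package; owner and the namespace names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// instance2Pattern is the namespace-regex from the instance 2 config.
// Assumption: instance 1 uses the same pattern as its namespace-exclude-regex.
var instance2Pattern = regexp.MustCompile(`^.*DB\.(classification|collection|publication).*$`)

// owner reports which instance would pick up a namespace under that assumption.
func owner(namespace string) string {
	if instance2Pattern.MatchString(namespace) {
		return "instance 2"
	}
	return "instance 1"
}

func main() {
	for _, ns := range []string{
		"testDB.collection",
		"testDB.collectionArchive", // the trailing .* also matches longer names
		"testdb.collection",        // matching is case-sensitive: "db" is not "DB"
		"testDB.orders",
	} {
		fmt.Printf("%-26s -> %s\n", ns, owner(ns))
	}
}
```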
Everything works fine for a while, but after that we see discrepancies between the MongoDB collection document count and the Elasticsearch index count.
Notes:
Can you please advise?
Thank you for your help