rwynn / monstache

a go daemon that syncs MongoDB to Elasticsearch in realtime. you know, for search.
https://rwynn.github.io/monstache-site/
MIT License

Oplog tailing does not handle data modifications done in a transaction #514

Open sameerkattel opened 3 years ago

sameerkattel commented 3 years ago

Data modifications done in a transaction are logged in the oplog with

"op" : "c", "ns" : "admin.$cmd",

As a result, modifications done in a transaction are not synced to ES.

yoitsro commented 3 years ago

Hey @rwynn! Any pointers on this?

sameerkattel commented 3 years ago

@yoitsro As far as I can tell, https://github.com/rwynn/gtm/blob/e02a1f9c1b79eb5f14ed26c86a23b920589d84c9/gtm.go#L910 does not parse entries with "op" : "c", "ns" : "admin.$cmd", which contain all of the transaction-related modifications.

cc: @rwynn
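
To illustrate the point: a committed transaction shows up in the oplog as a single command entry whose "o.applyOps" array carries the individual inserts, updates, and deletes. Below is a minimal sketch of unpacking such an entry. The `Op` type, the `flatten` function, and the sample `test.orders` namespace are hypothetical names for illustration only; this is not gtm's actual code or API. It also assumes the whole transaction fits in one oplog entry, which may not hold for larger transactions on newer MongoDB versions.

```go
package main

import "fmt"

// Op is a simplified, hypothetical view of one oplog entry (not gtm's type).
type Op struct {
	Operation string                 // "i", "u", "d", or "c"
	Namespace string                 // "db.collection", or "admin.$cmd" for commands
	Doc       map[string]interface{} // the "o" field of the entry
}

// flatten returns the data operations carried by an oplog entry.
// Plain entries are returned as-is. A command entry with an "applyOps"
// array (how a committed transaction is logged) is expanded into the
// operations nested inside it. Other commands yield nothing in this sketch.
func flatten(entry Op) []Op {
	if entry.Operation != "c" {
		return []Op{entry}
	}
	raw, ok := entry.Doc["applyOps"].([]interface{})
	if !ok {
		return nil // some other command, no nested operations
	}
	var out []Op
	for _, item := range raw {
		m, ok := item.(map[string]interface{})
		if !ok {
			continue // skip anything that is not a sub-document
		}
		op, _ := m["op"].(string)
		ns, _ := m["ns"].(string)
		doc, _ := m["o"].(map[string]interface{})
		// Recurse so that a nested applyOps is expanded as well.
		out = append(out, flatten(Op{Operation: op, Namespace: ns, Doc: doc})...)
	}
	return out
}

func main() {
	// Hand-written sample shaped like a transaction entry: one insert
	// followed by one update on the same (made-up) collection.
	txn := Op{
		Operation: "c",
		Namespace: "admin.$cmd",
		Doc: map[string]interface{}{
			"applyOps": []interface{}{
				map[string]interface{}{
					"op": "i", "ns": "test.orders",
					"o": map[string]interface{}{"_id": 1, "status": "new"},
				},
				map[string]interface{}{
					"op": "u", "ns": "test.orders",
					"o": map[string]interface{}{"$set": map[string]interface{}{"status": "paid"}},
				},
			},
		},
	}
	for _, op := range flatten(txn) {
		fmt.Println(op.Operation, op.Namespace)
	}
}
```

A tailer that only matches on "i", "u", and "d" at the top level would skip the "c" entry entirely, which matches the behaviour described in this issue.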

yoitsro commented 3 years ago

I wonder if there's a final set of events which Mongo fires off after a transaction has been committed successfully.

I would imagine that if a transaction fails, we don't want to be pushing that data into Elasticsearch.

rwynn commented 3 years ago

hi @yoitsro does this also affect change streams or only the direct oplog tailing?

sameerkattel commented 3 years ago

@yoitsro Entries are recorded in the oplog only once the transaction is committed.

@rwynn this only affects oplog tailing.

yoitsro commented 3 years ago

We use change streams.

Our use case basically ends up creating the document and then updating it as part of the same transaction.

It's strange though because in our automated tests, we don't see any problems, but only once it goes out to any of our test environments and people are using it do we see these problems. I'll try to increase the log levels in monstache to understand the situation some more, but not sure when that'll be.

sameerkattel commented 3 years ago

@yoitsro I'm wondering whether you have HA/replication with a multi-node setup. Change streams only deliver an event once the data has been committed to a majority of data-bearing nodes: https://docs.mongodb.com/manual/changeStreams/#event-notification

yoitsro commented 3 years ago

Yep, HA/replication is set up. The data does get persisted to Mongo and I would expect it to eventually show up in Elasticsearch, but it never does, which is the strangest thing about this.