rwynn / monstache

a go daemon that syncs MongoDB to Elasticsearch in realtime. you know, for search.
https://rwynn.github.io/monstache-site/
MIT License
1.28k stars 180 forks

How to slow down MongoDB events? #331

Open exentrich opened 4 years ago

exentrich commented 4 years ago

Hello! Huge thanks for such an amazing tool! Especially for supporting custom plugins, which gives endless possibilities for customization!

Everything works great, so now I'm looking for ways to optimize performance. Is it possible to slow down the event stream from MongoDB? I get many similar changes to the same document: for example, when a user edits the title field, my API generates an update event for every key press. Is it possible to ignore such intermediate events and pick up only the most recent one? Perhaps by configuring a time interval or something similar?

This feature is especially important for me because my Golang plugin does language detection. When such events happen thousands of times per second, my server sometimes struggles. I'm fine with a slight delay!

rwynn commented 4 years ago

Hi, I can think of only a couple of things you can do. First, if you frequently update the title field but only update, e.g., an updateTimestamp field when a save button is clicked, then you could take advantage of a filter on the change stream that only lets through updates touching updateTimestamp. This would limit the amount of data read from MongoDB.

Another idea would be to use a Process function in a plugin. In this case you would still be reading all change events, but in Process you could simply write each change to a Go channel with a buffer of, e.g., 1000 changes. Another goroutine, started in the init function of the plugin, could select on two channels in an infinite loop: the data channel just mentioned and a timer channel. If the select fires on the data channel, the goroutine upserts the change into a cache keyed by document, so a later change to the same document overwrites the earlier one. If the select fires on the timer, the goroutine flushes the cache to Elasticsearch using a bulk request. The cache is basically a window of data for the duration of the timer. This would only limit the amount of data sent to Elasticsearch though, not the amount read from MongoDB.