rwynn / monstache

a go daemon that syncs MongoDB to Elasticsearch in realtime. you know, for search.
https://rwynn.github.io/monstache-site/
MIT License
1.28k stars 181 forks

Index Lifecycle Management support #500

Open kennymalac opened 3 years ago

kennymalac commented 3 years ago

Hello, I had a question/request concerning Elasticsearch's lifecycle management support. https://github.com/AlibabaCloudDocs/elasticsearch/blob/master/intl.en-US/Best%20Practices/Elasticsearch%20applications/Index%20management/Use%20ILM%20to%20separate%20hot%20data%20from%20cold%20data.md

Let's say I create a collection in MongoDB with "-000001" at the end of its name and keep appending to it. At some point, Elasticsearch rolls the index over to -000002 because of the ILM policy. If I then update a document in MongoDB, but that document is in -000002, will monstache correctly update the document in Elasticsearch?

It appears not, since the update command here uses the index name and updates by document ID: https://github.com/rwynn/monstache/blob/rel6/monstache.go#L3036

If the mongodb collection is -000001, but the elasticsearch index is -000002, then this update will fail.

So, this appears to be problematic, as it would mean that my ingest of data into my mongo collections would have to be in sync with the ILM policy of my elasticsearch. Is there any way that this could be integrated better?

rwynn commented 3 years ago

hey @kennymalac I'm not sure there is any issue here. Monstache can be configured to write to the logical index (can be controlled via a mapping). By logical index I mean an index alias as stated in the documents you referenced (7 - Use the index alias to write data.)

[[mapping]]
namespace = "mongodb.mongocl"
index = "index_write_alias"

By writing to an index alias, the actual index backing the alias in Elasticsearch can change over time, independently of Monstache. Is there anything I'm missing?

rwynn commented 3 years ago

Also, monstache will do an upsert by default, so if you have an index alias that first points to index-01 and later points to index-02, monstache would not update the document back in index-01 after the rollover has taken place. It would insert a new document into index-02. So you would need to account for this if you are doing rolling indexes with an index alias for writes.
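To make the duplicate concrete, here is a toy in-memory sketch of that behavior. It is not monstache or Elasticsearch code: the `indexes` dict, the `write_alias` variable, the `upsert` helper, and the `updated_at` field are all hypothetical stand-ins, and it assumes the write simply goes to whichever physical index the alias targets at that moment.

```python
# Toy model only: plain dicts stand in for physical indexes.
indexes = {"index-01": {}, "index-02": {}}
write_alias = "index-01"  # physical index the write alias currently targets


def upsert(doc_id, doc):
    """Insert or replace by id in the index the write alias targets right now."""
    indexes[write_alias][doc_id] = doc


# First write lands in index-01.
upsert("a1", {"status": "new", "updated_at": 1})

# Rollover: the write alias now targets index-02.
write_alias = "index-02"

# The update is an upsert against index-02, so index-01 keeps its stale copy.
upsert("a1", {"status": "edited", "updated_at": 2})

copies = {name: docs["a1"] for name, docs in indexes.items() if "a1" in docs}
print(copies)
```

Both physical indexes end up holding a copy of `a1`, which is why reads across several indexes need to pick the newest one.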

E.g. if you had a logical read alias that spanned the last 3 physical indexes, you could get the latest version of a document by sorting by update timestamp descending and limiting your query results to 1, since all 3 physical indexes may contain versions of the document at various stages of its lifecycle.
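A minimal sketch of such a query body, built as a plain Python dict. The `updated_at` field name, the `abc123` id, and the helper name are assumptions for illustration; it also assumes the document keeps the same `_id` in every physical index and that each version carries a sortable update timestamp.

```python
import json


def latest_version_query(doc_id, timestamp_field):
    """Build a search body that returns only the newest copy of one document."""
    return {
        "size": 1,  # keep only the top hit
        "query": {"ids": {"values": [doc_id]}},  # match by _id across all indexes
        "sort": [{timestamp_field: {"order": "desc"}}],  # newest version first
    }


# "updated_at" is a hypothetical field; use whatever timestamp your documents carry.
body = latest_version_query("abc123", "updated_at")
print(json.dumps(body, indent=2))
```

The body would be sent to the `_search` endpoint of the read alias rather than of a single physical index, so that all versions are candidates for the sort.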