opencrvs / opencrvs-core

A global solution to civil registration
https://www.opencrvs.org

Run ElasticSearch reindex every time the migrations run #7240

Open naftis opened 3 days ago

naftis commented 3 days ago

1.6.0 is introducing a way to fully reindex ElasticSearch. Currently, the develop branch handles reindexing via MongoDB migrations. In other words, whenever the ES structure changes, a new MongoDB migration has to be created to trigger the reindex.

The problem with this is that a single version can end up with multiple reindex migrations when the reindex only needs to run once. Writing a reindex migration each time is also avoidable work.

Improvement proposal

Run the ElasticSearch reindexing every time run-migrations.sh runs. The reindexing is fairly fast and deployments are infrequent, so even if it sometimes runs without a real need, it won't noticeably affect deployments.

The risk is that reindexing gets slow as the databases grow. In my testing, ~1700 records take ~6 seconds. Assuming linear scaling, 1 million records would take about 3529 seconds, which is almost 1 hour. That is a long time, but quite manageable if it happens at most a few times a year.
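The extrapolation above can be reproduced with a few lines of TypeScript. This is only a sketch: `estimateReindexSeconds` is a hypothetical helper, not something in opencrvs-core, and it assumes throughput stays constant as the data set grows, which is likely optimistic.

```typescript
// Hypothetical helper: linear extrapolation of reindex duration from one
// measured sample. Assumes constant throughput regardless of data size.
function estimateReindexSeconds(
  sampleRecords: number,
  sampleSeconds: number,
  targetRecords: number
): number {
  if (sampleRecords <= 0) {
    throw new Error("sampleRecords must be positive");
  }
  return (sampleSeconds / sampleRecords) * targetRecords;
}

// Measurement from this issue: ~1700 records in ~6 seconds.
const estimatedSeconds = estimateReindexSeconds(1700, 6, 1000000);
console.log(
  `${Math.round(estimatedSeconds)} s, about ${(estimatedSeconds / 60).toFixed(1)} min`
);
// prints "3529 s, about 58.8 min"
```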

If that becomes a problem, the reindexing could be sharded to run in, for example, 10 parallel threads, as the records and ElasticSearch documents are self-contained.
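A minimal sketch of what such sharding could look like, not the actual implementation. `IndexBatch`, `shard` and `reindexInParallel` are hypothetical names, and the records are assumed to fit in memory, whereas a real reindex would more likely stream them from MongoDB. Since Node.js runs JavaScript on a single thread, "parallel" here means concurrent in-flight indexing requests rather than OS threads.

```typescript
// Hypothetical callback: indexes one batch of records into ElasticSearch.
type IndexBatch<T> = (batch: T[]) => Promise<void>;

// Split records into at most `shardCount` contiguous slices of similar size.
function shard<T>(records: T[], shardCount: number): T[][] {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new Error("shardCount must be a positive integer");
  }
  const size = Math.ceil(records.length / shardCount);
  const shards: T[][] = [];
  for (let i = 0; i < records.length; i += size) {
    shards.push(records.slice(i, i + size));
  }
  return shards;
}

// Index all shards concurrently. This is safe only because each record maps
// to a self-contained document, so shards never touch the same document.
async function reindexInParallel<T>(
  records: T[],
  shardCount: number,
  indexBatch: IndexBatch<T>
): Promise<number> {
  const shards = shard(records, shardCount);
  await Promise.all(shards.map((batch) => indexBatch(batch)));
  return shards.length; // number of shards actually processed
}
```

Note that `Promise.all` rejects as soon as one shard fails, so a real version would need to decide whether to retry that shard or abort the whole reindex.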

Dev tasks