opencrvs / opencrvs-core

A global solution to civil registration

Run ElasticSearch reindex every time the migrations run #7240

Open naftis opened 3 days ago

naftis commented 3 days ago

1.6.0 is introducing a way to fully reindex ElasticSearch. Currently, the develop branch handles reindexing via MongoDB migrations. In other words, if the ES structure changes, a new MongoDB migration needs to be created to trigger the reindex.

The problem with this is that the same version might end up with multiple reindex migrations, even though the reindex would only need to run once. Also, creating a reindex migration is extra work that could be avoided.

Improvement proposal

Run the ElasticSearch reindexing every time the migrations run. The reindexing is fairly fast, and deployments are fairly infrequent, so even if it sometimes runs without a real need, it won't noticeably affect the deployments.

The risk is that as the databases grow, the reindexing can get slow. In my testing, ~1700 records take ~6 seconds. Assuming the time scales linearly, 1 million records would take about 3529 seconds, which is almost 1 hour. That is a long while, but if it happens at most a few times a year, it is quite manageable.
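For reference, the extrapolation above can be reproduced with a few lines of TypeScript. This is only a back-of-the-envelope sketch: the function name `estimateReindexSeconds` is made up for illustration, and it assumes the reindex time grows linearly with the number of records, which may not hold on a real deployment.

```typescript
// Figures measured in the test run described above.
const measuredRecords = 1700;
const measuredSeconds = 6;

// Hypothetical helper: assumes reindex time scales linearly with record count.
function estimateReindexSeconds(records: number): number {
  return (records / measuredRecords) * measuredSeconds;
}

const seconds = estimateReindexSeconds(1_000_000);
console.log(`${Math.round(seconds)} s (~${Math.round(seconds / 60)} min)`);
// prints "3529 s (~59 min)"
```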

If that becomes a problem, the reindexing could be sharded to run in parallel, for example in 10 threads, as the records and ElasticSearch documents are self-contained.
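A rough sketch of what the sharding could look like, not the actual opencrvs-core implementation. The names `partitionIntoShards`, `reindexInParallel` and the `reindexRecord` callback are all hypothetical, and in Node.js the "threads" here are really concurrent async workers rather than OS threads. It relies on the assumption stated above that each record can be reindexed independently of the others.

```typescript
// Hypothetical callback that reindexes a single record into ElasticSearch.
type ReindexRecord<T> = (record: T) => Promise<void>;

// Distribute records round-robin into `shardCount` roughly equal shards.
function partitionIntoShards<T>(records: T[], shardCount: number): T[][] {
  const shards: T[][] = Array.from({ length: shardCount }, () => []);
  records.forEach((record, i) => shards[i % shardCount].push(record));
  return shards;
}

// Process all shards concurrently; records within a shard run sequentially.
// Resolves with the total number of records reindexed.
async function reindexInParallel<T>(
  records: T[],
  reindexRecord: ReindexRecord<T>,
  shardCount = 10
): Promise<number> {
  const shards = partitionIntoShards(records, shardCount);
  const counts = await Promise.all(
    shards.map(async (shard) => {
      for (const record of shard) {
        await reindexRecord(record);
      }
      return shard.length;
    })
  );
  return counts.reduce((total, count) => total + count, 0);
}
```

Whether 10 shards would actually give a 10x speedup depends on where the bottleneck is (MongoDB reads, ElasticSearch writes, or the Node.js process itself), so the shard count would need to be measured rather than assumed.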

Dev tasks