opencrvs / opencrvs-core

A global solution to civil registration
https://www.opencrvs.org

MongoDB sharding for refactoring Hearth for a huge country scale. e.g. Nigeria #2434

Closed rikukissa closed 10 months ago

rikukissa commented 2 years ago

Summary

Sharding is a complex, involved process and should be a last resort, used only after all other optimisation methods are exhausted. There are easy optimisations we haven't done yet, for instance adding MongoDB indices. Sharding affects the infrastructure and the database collections, and might even require changes to some of our MongoDB queries. As part of implementing sharding, we need to analyse which filtering parameters each collection is queried with, so that documents that are queried together always stay on the same shard. We also need to measure and monitor how data is distributed between shards. Once you shard OpenCRVS you cannot un-shard it, which increases complexity considerably. From a core perspective, even unsharded OpenCRVS releases would need to be maintained as if they could be configured to be sharded. It is a major commitment.
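To make the "measure and monitor the data distribution" point concrete, here is a minimal sketch of a skew check. It assumes per-shard document counts have already been collected somewhere else (for example, read off the output of mongosh's `db.collection.getShardDistribution()`); the function name and the shard names are hypothetical and nothing here talks to a database.

```python
from typing import Dict


def max_skew(doc_counts: Dict[str, int]) -> float:
    """Ratio of the fullest shard's document count to a perfectly even share.

    1.0 means the data is evenly spread; 2.0 means the fullest shard
    holds twice as many documents as it would under an even split.
    """
    total = sum(doc_counts.values())
    if total == 0:
        return 0.0  # no shards or no documents: nothing to measure
    return max(doc_counts.values()) * len(doc_counts) / total


# Hypothetical counts for one collection spread over three shards.
counts = {"shard0": 500, "shard1": 300, "shard2": 200}
skew = max_skew(counts)
```

A value that keeps drifting upwards over time would suggest the shard key is concentrating writes on one shard. What threshold should trigger an alert is a judgment call this sketch does not make.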

Infrastructure

Required nodes:

Minimum infra setup: 5 nodes

Shard keys per collection

Find an appropriate sharding key

Generally speaking, the shard key chosen for each collection should reflect the most common queries we make against it. If, for instance, documents are often fetched by their creator, the shard key should be the author id.
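One way to find the "most common queries" is to tally which fields appear in the filters of logged queries. The sketch below assumes the filters for a collection have already been captured as plain dictionaries (how they are captured is out of scope here). The function name and the field names `authorId` and `status` are hypothetical, not taken from the Hearth schema.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def rank_filter_fields(
    filters: Iterable[Mapping[str, object]]
) -> List[Tuple[str, int]]:
    """Count how often each top-level field is used as a query filter."""
    counts: Counter = Counter()
    for query_filter in filters:
        # Only top-level field names are counted; operators such as
        # $and / $or are skipped rather than unpacked.
        counts.update(f for f in query_filter if not f.startswith("$"))
    return counts.most_common()


# Hypothetical filters observed for a single collection.
observed = [
    {"authorId": "u1"},
    {"authorId": "u2", "status": "REGISTERED"},
    {"status": "DECLARED"},
    {"authorId": "u1"},
]
ranking = rank_filter_fields(observed)
```

The top-ranked field is only a candidate. A field that is queried often but takes few distinct values would still distribute data poorly, so the ranking needs to be read together with the distribution measurements.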

Indices for all shard keys must be created before collections are sharded.
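The ordering constraint (index first, then shard) can be sketched as a pair of MongoDB command documents. This is an illustration under assumptions, not the project's migration code: the helper name and the `hearth` / `Task` / `authorId` names are hypothetical, and the helper only builds the documents without connecting to anything.

```python
from typing import Any, Dict, List


def build_sharding_commands(
    database: str, collection: str, shard_key_field: str
) -> List[Dict[str, Any]]:
    """Return the command documents in the order they must be run."""
    key = {shard_key_field: 1}  # ascending (ranged) shard key
    # Step 1: run against the collection's own database.
    create_index = {
        "createIndexes": collection,
        "indexes": [{"key": key, "name": f"{shard_key_field}_1"}],
    }
    # Step 2: run against the admin database, through a mongos router.
    shard_collection = {
        "shardCollection": f"{database}.{collection}",
        "key": key,
    }
    return [create_index, shard_collection]


# Hypothetical database, collection and field names.
commands = build_sharding_commands("hearth", "Task", "authorId")
```

With a driver, each document could be passed to its command runner, for example PyMongo's `Database.command`, with the first command run on the application database and the second on `admin`. Depending on the MongoDB version, sharding may also need to be enabled on the database beforehand.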

Tasks

euanmillar commented 2 years ago

This is blocked until the Hearth replacement, which includes indices, is set up.

rikukissa commented 10 months ago

@euanmillar OK if I close this ticket? I don't think it's the right choice for us, now or in the future.