opencrvs / opencrvs-core

A global solution to civil registration
https://www.opencrvs.org

Data storage redesign for reliability & scalability #3704

Open euanmillar opened 2 years ago

euanmillar commented 2 years ago

This ticket is an epic to design and implement a standardised and scalable storage solution for our primary data. The changes will be fundamental to the product but are critical for any production use of the product.

More details coming later.


We have forked Hearth and maintain it here.

Hearth is no longer officially maintained by Jembi Health Systems because they have not been able to secure enough funding to do so.

Apart from Hearth, we know of no open-source, NoSQL-based FHIR server that is, in our opinion, scalable and suited to our needs. We do not recommend the use of HAPI-FHIR at scale because PostgreSQL is not designed to be scaled horizontally out of the box; in practice it is scaled vertically. This has cost implications for implementing governments.

Interesting reading that also applies to HAPI-FHIR: https://vneilley.medium.com/most-fhir-servers-are-unusable-in-production-8833cb1480b1

We have discovered these projects:

https://github.com/icanbwell/fhir-server
https://github.com/bluehalo/node-fhir-server-mongo
https://github.com/bluehalo/node-fhir-server-core
https://github.com/Chinlinlee/Burni

However, only 2 of these are actively maintained projects and only 1 of them is maintained by a recognisable entity.

We are not suggesting we use any of these for the following reasons:

1) We do not want to depend on a small private organisation or private individual that is not a core partner to maintain our database server. We would need to perform a thorough analysis of their longevity and formalise the relationship we would have with such an organisation. None of these organisations is comparable to MongoDB, Elastic, MinIO or InfluxData, whose size in the market we can depend on.
2) Only one of them shows a validated schema that can be sharded (horizontally scaled), and that schema supports only one FHIR document type: Patient.
3) All of them seem to follow a Hearth-like approach that applies an SQL-style schema to NoSQL storage, with one collection per resource type. We want to combine some FHIR document types into a single schema for faster performance, making use of the true power of NoSQL.

However, there are aspects of these projects which, along with Hearth, can inspire us as we plan our own approach for the best cost and performance in the civil registration context.

In OpenCRVS v1.0 we made our own fork of Jembi's Hearth, which we keep up to date with security upgrades.

At some point in the future, we plan to gradually phase out FHIR from our storage completely, starting with the registration collections (e.g. Observation, Encounter, etc.), and instead expose a FHIR API on top of the new storage.
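As a rough illustration of storing data in a non-FHIR shape while still exposing FHIR at the API boundary, here is a minimal sketch. The `StoredRegistration` shape, its field names and the `toFhirBundle` helper are all hypothetical and are not taken from the OpenCRVS codebase; only the FHIR resource names and basic fields are real.

```typescript
// Hypothetical internal storage shape (an assumption for illustration only).
interface StoredRegistration {
  id: string;
  child: { id: string; givenName: string; familyName: string; birthDate: string };
  observations: { id: string; code: string; value: string }[];
}

// Deliberately loose FHIR typings to keep the sketch self-contained.
interface FhirResource {
  resourceType: string;
  id: string;
  [key: string]: unknown;
}

interface FhirBundle {
  resourceType: "Bundle";
  type: "collection";
  entry: { resource: FhirResource }[];
}

// Build a FHIR Bundle on the fly from the internal record.
function toFhirBundle(reg: StoredRegistration): FhirBundle {
  const patient: FhirResource = {
    resourceType: "Patient",
    id: reg.child.id,
    name: [{ given: [reg.child.givenName], family: reg.child.familyName }],
    birthDate: reg.child.birthDate,
  };

  const observations = reg.observations.map(
    (o): FhirResource => ({
      resourceType: "Observation",
      id: o.id,
      status: "final",
      code: { text: o.code },
      valueString: o.value,
      subject: { reference: `Patient/${reg.child.id}` },
    })
  );

  return {
    resourceType: "Bundle",
    type: "collection",
    entry: [patient, ...observations].map((resource) => ({ resource })),
  };
}
```

With a mapping layer of this kind, the stored document can be optimised for registration workloads while API consumers continue to receive FHIR resources.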

Benefits are:

Long-term strategy:

What we plan to perform when gradually phasing out Hearth:

  1. We want to maintain all the existing FHIR API functionality, including request routing and URL query parameters.
  2. We should not have separate Composition, Patient, RelatedPerson, Task, Observation, DocumentReference and Encounter collections, but instead a single Registration collection that contains a simplified version of a fhir.Bundle of all of the above.
  3. We should maintain separate Location, Practitioner, PractitionerRole collections.
  4. The Registration collection should make use of Mongo indices so that it can be sharded and horizontally scaled. Maybe the Registration collection needs to exist in an entirely different database or volume?
  5. We want all the code to be TypeScript
  6. Hearth has a customisable "plugin" architecture so that it can be extended by other organisations using it as an open-source library. We have no intention of extending beyond the plugins we need, so we can deprecate this as long as we include the checkDuplicateTask functionality somehow.
  7. It should be protected by the same JWT auth as the rest of our app and thus be penetration tested. Only Location GET endpoints should be auth:false.
  8. We should maintain the existing FHIR validation functionality from Hearth
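
Items 2, 3, 4 and 7 above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names (`collectionFor`, `requiresAuth`), the `/Location` path layout relative to the FHIR base, and the hashed `_id` shard key are all hypothetical choices that this issue does not specify. FHIR's `DocumentReference` is used for the document resource type in item 2.

```typescript
// Resource types folded into the single Registration collection (item 2).
const MERGED_TYPES: readonly string[] = [
  "Composition",
  "Patient",
  "RelatedPerson",
  "Task",
  "Observation",
  "DocumentReference",
  "Encounter",
];

// Resource types that keep their own collection (item 3).
const STANDALONE_TYPES: readonly string[] = ["Location", "Practitioner", "PractitionerRole"];

// Assumed shard key for the Registration collection (item 4). In mongosh this
// would be applied with sh.shardCollection("<db>.Registration", { _id: "hashed" });
// the actual key should be chosen from real query patterns.
const REGISTRATION_SHARD_KEY = { _id: "hashed" } as const;

// Decide which collection stores a given FHIR resource type.
function collectionFor(resourceType: string): string {
  if (MERGED_TYPES.indexOf(resourceType) !== -1) return "Registration";
  if (STANDALONE_TYPES.indexOf(resourceType) !== -1) return resourceType;
  throw new Error(`Unsupported resource type: ${resourceType}`);
}

// Every route requires a valid JWT except read-only Location requests (item 7).
// Paths are assumed to be relative to the FHIR base URL.
function requiresAuth(method: string, path: string): boolean {
  const isLocationRead =
    method.toUpperCase() === "GET" && /^\/Location(\/|\?|$)/.test(path);
  return !isLocationRead;
}
```

Keeping the routing decision in one place means the existing FHIR URLs can stay unchanged while the storage layout behind them is replaced.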

rikukissa commented 2 months ago

Related https://github.com/opencrvs/opencrvs-core/issues/7120

rikukissa commented 2 months ago

Related: https://github.com/opencrvs/opencrvs-core/issues/6372. This needs to be taken into consideration when designing a new data model.