google / fhir-data-pipes

A collection of tools for extracting FHIR resources and analytics services on top of that data.
https://google.github.io/fhir-data-pipes/
Apache License 2.0

Create a realistic test env to show scaling capabilities of the pipeline #967

Open bashir2 opened 4 months ago

bashir2 commented 4 months ago

The main reason our pipelines are implemented with Apache Beam is to ensure they are horizontally scalable and can process large FHIR inputs in a short time. We have demonstrated this scalability with JSON input files (on a distributed file system), but a more realistic scenario is a FHIR server backed by a database with multiple replicas. This issue is to create and test the following two scenarios:

The data for the above cases can come from the Synthea-HIV module. The test environment should be easy and quick to deploy; i.e., we should save a DB snapshot that can be restored whenever needed. We will run the pipelines on Google Cloud's Dataflow service, and the DB should be on Cloud SQL (with enough read replicas enabled). So part of this issue is to create a test environment on GCP with a replicated HAPI server and DB replicas backing it.
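A minimal sketch of what provisioning such an environment with `gcloud` could look like, assuming Cloud SQL for PostgreSQL behind HAPI FHIR. All instance names, tiers, and regions below are hypothetical placeholders, not part of this repo:

```shell
# Sketch only: provision a Cloud SQL Postgres primary for HAPI FHIR,
# snapshot it after loading Synthea data, and add read replicas.

# Hypothetical primary instance backing the HAPI FHIR server.
gcloud sql instances create hapi-fhir-db \
  --database-version=POSTGRES_14 \
  --tier=db-custom-4-16384 \
  --region=us-central1

# After loading the Synthea-HIV dataset, take an on-demand backup
# so the test environment can be redeployed quickly when needed.
gcloud sql backups create --instance=hapi-fhir-db

# A read replica to let the pipeline spread its reads.
gcloud sql instances create hapi-fhir-db-replica-1 \
  --master-instance-name=hapi-fhir-db \
  --region=us-central1

# Alternatively, clone the populated instance into a fresh one on demand.
gcloud sql instances clone hapi-fhir-db hapi-fhir-db-test
```

Cloning or restoring a backup avoids re-running Synthea on every test run, which is what makes the environment "quick to deploy."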

This can also be used as a test bed for the Bulk Export API once we are done with its implementation (#533 is related).

jakubadamek commented 4 months ago

I will take a look