google / fhir-data-pipes

A collection of tools for extracting FHIR resources and analytics services on top of that data.
https://google.github.io/fhir-data-pipes/
Apache License 2.0

added views to the controller and a fast recreate option #929

Closed: bashir2 closed this 7 months ago

bashir2 commented 8 months ago

Description of what I changed

This is the last piece that fixes #916. Besides adding the incremental view update option to the controller, this also implements a fast view recreation option. It works by reading the whole DWH snapshot (the Parquet files), converting the records back to HAPI objects, and applying the new view logic to them. This can be an order of magnitude (or more) faster than refetching resources from the FHIR server. For a fairly large DWH (which originally took ~200 minutes to create by reading from the FHIR server), recreating the views took only ~13 to 18 minutes, depending on the size and presence of the old views. Of that time, less than 2 minutes was spent converting Avro to HAPI.
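To make the fast-recreate flow concrete, here is a minimal Python sketch of the idea under stated assumptions. Snapshot rows are plain dicts standing in for Parquet records; `row_to_resource` is a hypothetical stand-in for the Avro-to-HAPI conversion (it only nests dotted keys, a simplification); and a view is a hypothetical mapping from output column names to extractor functions. None of these names are the project's actual API; the real controller works with Java HAPI objects via Bunsen.

```python
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical stand-ins: a snapshot "row" is a flat dict (like a record read
# from the Parquet snapshot) and a "resource" is a nested dict (standing in
# for a HAPI object).
Row = Dict[str, Any]
Resource = Dict[str, Any]
# Hypothetical view: output column name -> function extracting that value.
View = Dict[str, Callable[[Resource], Any]]


def row_to_resource(row: Row) -> Resource:
    """Stand-in for Avro -> HAPI: rebuild nesting from dotted keys like "subject.reference"."""
    resource: Resource = {}
    for key, value in row.items():
        node = resource
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return resource


def recreate_view(snapshot: Iterable[Row], resource_type: str, view: View) -> List[Row]:
    """Rebuild one view purely from the DWH snapshot; the FHIR server is never contacted."""
    output: List[Row] = []
    for row in snapshot:
        # Skip other resource types before paying the conversion cost.
        if row.get("resourceType") != resource_type:
            continue
        resource = row_to_resource(row)
        # Apply the (new) view logic to the reconstructed resource.
        output.append({column: extract(resource) for column, extract in view.items()})
    return output


if __name__ == "__main__":
    snapshot = [
        {"resourceType": "Patient", "id": "p1", "gender": "female"},
        {"resourceType": "Observation", "id": "o1", "subject.reference": "Patient/p1"},
        {"resourceType": "Observation", "id": "o2", "subject.reference": "Patient/p2"},
    ]
    observation_view = {
        "id": lambda r: r["id"],
        "patient": lambda r: r["subject"]["reference"],
    }
    print(recreate_view(snapshot, "Observation", observation_view))
    # [{'id': 'o1', 'patient': 'Patient/p1'}, {'id': 'o2', 'patient': 'Patient/p2'}]
```

The point the sketch illustrates is that a view change only requires re-running the view logic over data already stored in the DWH, so the expensive step of fetching every resource from the FHIR server is skipped entirely.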

Note that we had never used the Avro-to-HAPI feature of Bunsen in our production code before; this first use uncovered some new bugs, which are also fixed in this PR.

E2E test

TESTED:

Ran the controller and tested the FULL and VIEWS modes. See above for performance testing.

Checklist: I completed these to help reviewers :)

chandrashekar-s commented 8 months ago

Thanks @bashir2 for the changes. This is a great milestone to have completed. I am looking forward to the adoption of this feature and to seeing its benefits.