google / fhir-data-pipes

A collection of tools for extracting FHIR resources and analytics services on top of that data.
https://google.github.io/fhir-data-pipes/
Apache License 2.0
154 stars, 86 forks

Question: How to run the `pipelines/controller` for sync-ing a FHIR Server to another FHIR Server? I saw that it is possible in the `pipelines/batch` #825

Closed: muhammad-levi closed this issue 5 months ago

muhammad-levi commented 1 year ago

https://github.com/google/fhir-data-pipes/wiki/Get-Started-with-FHIR-Data-Pipes-Pipelines#hapi-fhir-jdbc-to-a-fhir-server That is the example command for instructing the executable batch-bundled JAR to sync data from one FHIR server (via the FHIR Search API) to another FHIR server, right?

In the case of pipelines/controller, I cannot find, for example, a fhirSinkPath config property in the application config.
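(For context, here is a trimmed sketch of what the controller's application config at pipelines/controller/config/application.yaml looks like. The property names below are assumptions based on the repo's sample config, not a verbatim copy; the point is only that there is a source-server property but no sink counterpart to the batch pipeline's fhirSinkPath flag.)

```yaml
# Hedged, illustrative fragment of the controller's application.yaml.
# Property names are assumptions; consult the repo's sample config.
fhirdata:
  fhirServerUrl: "http://hapi-server:8080/fhir"   # source FHIR server
  dwhRootPrefix: "dwh/controller_DWH"             # Parquet DWH output root
  incrementalSchedule: "0 0 * * * *"              # cron for incremental runs
  # Note: no fhirSinkPath equivalent exists here; syncing to a second FHIR
  # server is the missing controller feature discussed in this issue.
```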

bashir2 commented 1 year ago

Sorry for the delay. Yes, the feature for syncing from one FHIR server to another has not been added to the controller. It is not too much work to add, but it is also not trivial. We have not prioritized it since our partners/users did not have an immediate use case, but we certainly can if needed.

muhammad-levi commented 11 months ago

I see, thank you. I was, and still am, experimenting with a federated architecture of FHIR servers, and I am currently trying to make use of the fhir-data-pipes capability of syncing from one FHIR server to another.

Specifically:

https://docs.google.com/presentation/d/1U0ypgBSO9JVe_7aJzM-mliEEVE5OL3AJTCRwQ9uWPCY/edit#slide=id.g297fea6dee9_0_2

muhammad-levi commented 11 months ago

https://github.com/GoogleCloudPlatform/functions-framework-java/issues/254

I am trying to run the pipelines/batch uber JAR in Google Cloud Functions, and I am wondering whether some modifications are needed somewhere before the uber JAR can be used there?

Also, is it good enough to use the uber JAR as a Google Cloud Function? I was trying to mimic pipelines/controller by combining Google Cloud Functions, Cloud Scheduler, and Cloud Pub/Sub... but that gives only scheduled runs, with no control panel (or rather, the control panel is the Google Cloud console)...

Sample usages of the uber JAR were:

PATH_TO_THE_UBER_JAR=/d/SourceCodes/google/fhir-data-pipes/pipelines/batch/target/batch-bundled.jar && \
java -jar ${PATH_TO_THE_UBER_JAR} \
    --fhirServerUrl=http://fhir-server-ministry-of-health:8091/fhir \
    --resourceList=Patient,Practitioner \
    --fhirSinkPath=http://fhir-server-central:8098/fhir

PATH_TO_THE_UBER_JAR=/d/SourceCodes/google/fhir-data-pipes/pipelines/batch/target/batch-bundled.jar && \
java -jar ${PATH_TO_THE_UBER_JAR} \
    --fhirServerUrl=http://fhir-server-bkkbn:8092/fhir \
    --resourceList=Patient,Practitioner \
    --fhirSinkPath=http://fhir-server-central:8098/fhir
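The two invocations above follow one pattern: many source servers feeding a single central sink. A sketch generalizing them to a list of sources is below; it reuses the URLs and flags from this thread, but prints the commands (a dry run) instead of executing them, since the jar path is machine-specific.

```shell
# Dry-run sketch: one sink, many sources (URLs taken from the commands above).
# The jar path is illustrative; adjust it to your checkout.
JAR=/d/SourceCodes/google/fhir-data-pipes/pipelines/batch/target/batch-bundled.jar
SINK=http://fhir-server-central:8098/fhir

CMDS=""
for SRC in \
  http://fhir-server-ministry-of-health:8091/fhir \
  http://fhir-server-bkkbn:8092/fhir
do
  # Collect each command instead of running it, so the sketch works without the jar.
  CMDS="${CMDS}java -jar ${JAR} --fhirServerUrl=${SRC} --resourceList=Patient,Practitioner --fhirSinkPath=${SINK}
"
done
printf '%s' "$CMDS"
```

A Cloud Scheduler job (or cron) could invoke such a script per source server, which is essentially what the controller automates internally.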
muhammad-levi commented 11 months ago

Maybe Cloud Run fits better... [image attached]

bashir2 commented 11 months ago

Thanks @muhammad-levi for the updates. For the federation scenario, it is probably best to add this to the controller and the incremental pipelines as well; otherwise you need to manage incremental updates yourself. A few notes about your comments/experiments:

muhammad-levi commented 11 months ago

Thank you @bashir2 for the kind explanation. I see; I blindly assumed the "incremental sync" behaviour was configurable in pipelines/batch via arguments or something. So if I understand your explanation correctly, "incremental sync" is one of the features of pipelines/controller?

Does pipelines/controller allow scheduled runs?

bashir2 commented 11 months ago

Basically, what the controller does is automate the process of running the pipelines (including the incremental ones). So you can manually run the "batch" pipeline in incremental mode using the --since feature and then run the Merger pipeline; the controller takes care of these details and of error handling.
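The manual incremental run described above can be sketched as follows. The --since, --fhirServerUrl, --resourceList, and --fhirSinkPath flags appear earlier in this thread; the timestamp value and jar path are illustrative assumptions, so the sketch prints the command (a dry run) rather than executing it.

```shell
# Dry-run sketch of a manual incremental run of the batch pipeline.
JAR=pipelines/batch/target/batch-bundled.jar
SINCE="2024-01-01T00:00:00Z"   # assumed instant: fetch only resources updated after this

CMD="java -jar ${JAR} \
  --fhirServerUrl=http://fhir-server-ministry-of-health:8091/fhir \
  --resourceList=Patient,Practitioner \
  --fhirSinkPath=http://fhir-server-central:8098/fhir \
  --since=${SINCE}"

# Print instead of executing; per bashir2's note, a real run would then be
# followed by the Merger pipeline, which the controller otherwise automates.
echo "$CMD"
```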

bashir2 commented 8 months ago

@mozzy11 and @muhammad-levi, I thought it best to continue the discussion re. this comment here in the issue (instead of the PR) for better visibility. Can you both please provide your input? I want to better understand your FHIR-to-FHIR sync scenario and whether you care about the Parquet files in the DWH or not.

BTW, I have also added this to the agenda for the dev. call tomorrow; it would be nice if you could attend too.

mozzy11 commented 8 months ago

Thanks @bashir2. In our case:

  1. We want to sync from facility-level point-of-care systems (i.e. OpenMRS, OpenELIS) to a centralized FHIR store; in this case we don't need Parquet files.
  2. We can then get data from the centralized FHIR store into a Parquet DWH.

In general, the FHIR sink and the Parquet DWH should both be optional.

muhammad-levi commented 8 months ago

@bashir2 Apologies for not noticing this earlier; I have created another diagram to depict in more depth the communication of fhir-data-pipes with the other components in the Federated FHIR Ecosystem: [image attached]

In that FHIR-to-FHIR sync scenario, fhir-data-pipes collects the resources from all participants in the ecosystem (might be relevant to https://github.com/google/fhir-data-pipes/issues/923) and puts them into the "central" HAPI FHIR, which has the MDM (Master Data Management) module enabled and good-enough MDM rules installed to reconcile the health data (FHIR). For that part of the process, we do not care about the Parquet files in the DWH.

Later, similarly to @mozzy11's scenario, fhir-data-pipes could possibly also ETL the data from the "central" HAPI FHIR into a Parquet DWH.