google / fhir-data-pipes

A collection of tools for extracting FHIR resources and analytics services on top of that data.
https://google.github.io/fhir-data-pipes/
Apache License 2.0

Support working with multiple FHIR servers #923

Open · luisyng opened this issue 6 months ago

luisyng commented 6 months ago

Hi guys. Thank you very much for your work on the project.

In our organization, we maintain several FHIR instances for different countries/projects, and we would like to have a single visualization platform (like Superset) where users with different roles can access only the data for certain project(s).

fhir-data-pipes could be a great tool for connecting to several servers. I've heard that you have already discussed this; maybe you could share some of the ideas or suggestions that could help us deal with multiple servers, perhaps in a Wiki page.

I guess something that should already work is to run one instance of the pipes per FHIR server, so that we have one DWH and one connection per server. But then I guess we would have to create independent datasets, charts and dashboards for each server, which wouldn't be optimal.

Another idea that comes to mind is to create a single PostgreSQL database as the data warehouse, with a column in each row that stores the project and is filled in when the pipes are executed.
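A minimal sketch of the single-warehouse idea, for illustration only: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and the table `observation_flat`, its columns, and the project names are hypothetical, not part of fhir-data-pipes. The point is that every row carries a `project` value stamped at load time, so one dataset or chart definition can filter or group on it instead of being duplicated per server.

```python
import sqlite3

# Hypothetical flattened Observation rows pulled from two FHIR servers.
# sqlite3 stands in for PostgreSQL; table and column names are illustrative.
SOURCES = {
    "project_a": [("obs-1", "Patient/1", 7.2)],
    "project_b": [("obs-1", "Patient/9", 5.4), ("obs-2", "Patient/9", 6.1)],
}


def load_warehouse(conn: sqlite3.Connection) -> None:
    """Create one shared table and stamp every row with its source project."""
    conn.execute(
        "CREATE TABLE observation_flat ("
        " project TEXT NOT NULL,"
        " obs_id TEXT NOT NULL,"
        " patient_ref TEXT NOT NULL,"
        " value REAL,"
        " PRIMARY KEY (project, obs_id))"  # resource ids can collide across servers
    )
    for project, rows in SOURCES.items():
        # The project value is filled in at load time, i.e. when the pipeline runs.
        conn.executemany(
            "INSERT INTO observation_flat VALUES (?, ?, ?, ?)",
            [(project, *row) for row in rows],
        )
    conn.commit()


def count_by_project(conn: sqlite3.Connection) -> dict:
    """A single query (or chart) can group or filter on the project column."""
    cur = conn.execute(
        "SELECT project, COUNT(*) FROM observation_flat "
        "GROUP BY project ORDER BY project"
    )
    return dict(cur.fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_warehouse(conn)
    print(count_by_project(conn))  # {'project_a': 1, 'project_b': 2}
```

Note the composite primary key: an id such as `obs-1` is only unique within one FHIR server, so the project has to be part of the key in a shared warehouse.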

Or maybe you could add some new features to the library that allow working with multiple servers in a more native way.

I'm just brainstorming, probably you have better ideas to get to that. :)

Thanks!

bashir2 commented 5 months ago

Thanks @luisyng for filing this feature request; this is just to document what we discussed in the dev call and emails: the obvious/current solution is to have one pipeline/controller per FHIR source, which generates completely separate DWHs. To implement a feature that reads from multiple FHIR sources in the same pipeline, we should first clearly define how we are going to differentiate between the generated data in downstream queries. One idea is to tag the resources, for example in their meta fields, as suggested in #886, but of course that is not the only way.
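One possible shape of the meta-tag idea, as a hedged sketch: FHIR resources do carry a `meta.tag` list of Codings, but the code system URL `http://example.org/fhir-source-server` and the helpers `tag_with_source` and `source_of` are hypothetical, not something fhir-data-pipes or #886 provides. Resources are treated as plain JSON dicts.

```python
from __future__ import annotations

import copy

# Hypothetical code system marking which FHIR server a resource came from.
SOURCE_TAG_SYSTEM = "http://example.org/fhir-source-server"


def tag_with_source(resource: dict, source_id: str) -> dict:
    """Return a copy of a FHIR resource (JSON dict) with a meta.tag naming its source."""
    tagged = copy.deepcopy(resource)  # leave the caller's resource untouched
    meta = tagged.setdefault("meta", {})
    # Drop any earlier source tag so re-ingestion neither duplicates nor conflicts;
    # tags from other code systems are kept as they are.
    meta["tag"] = [t for t in meta.get("tag", []) if t.get("system") != SOURCE_TAG_SYSTEM]
    meta["tag"].append({"system": SOURCE_TAG_SYSTEM, "code": source_id})
    return tagged


def source_of(resource: dict) -> str | None:
    """Read the source tag back, e.g. when flattening resources for the DWH."""
    for tag in resource.get("meta", {}).get("tag", []):
        if tag.get("system") == SOURCE_TAG_SYSTEM:
            return tag.get("code")
    return None


if __name__ == "__main__":
    patient = {"resourceType": "Patient", "id": "9", "meta": {"versionId": "1"}}
    tagged = tag_with_source(patient, "project_b")
    print(source_of(tagged))   # project_b
    print(source_of(patient))  # None (the original is unchanged)
```

Downstream, the flattening step could copy `source_of(resource)` into a column so queries can tell the servers apart.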

Also, there should probably be some access-control constraints when querying the single DWH (containing data from all FHIR servers), and those need to be clarified too.
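To make the access-control question concrete, here is a minimal sketch of deny-by-default, per-project filtering, assuming the single DWH has a `project` column. The role names and the `ROLE_PROJECTS` mapping are hypothetical; in a real deployment this would more likely be enforced by the database (for example PostgreSQL row-level security policies) or by the BI tool rather than in application code.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical role -> allowed projects mapping; in practice this would come
# from the database's or BI tool's own access-control configuration.
ROLE_PROJECTS = {
    "country_a_analyst": {"project_a"},
    "regional_lead": {"project_a", "project_b"},
}


def allowed_projects(roles: Iterable[str]) -> set[str]:
    """Union of the projects granted by a user's roles; unknown roles grant nothing."""
    allowed: set[str] = set()
    for role in roles:
        allowed |= ROLE_PROJECTS.get(role, set())
    return allowed


def visible_rows(rows: list[dict], roles: Iterable[str]) -> list[dict]:
    """Keep only rows whose project column is granted to the user."""
    allowed = allowed_projects(roles)
    return [row for row in rows if row["project"] in allowed]


if __name__ == "__main__":
    rows = [
        {"project": "project_a", "obs_id": "obs-1"},
        {"project": "project_b", "obs_id": "obs-1"},
        {"project": "project_c", "obs_id": "obs-7"},
    ]
    print(visible_rows(rows, ["country_a_analyst"]))
    # [{'project': 'project_a', 'obs_id': 'obs-1'}]
```

Deny-by-default matters here: a project that no role mentions (like `project_c` above) stays invisible until someone explicitly grants it.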