Open muhammad-levi opened 1 week ago
This feature is actually the long-standing issue #455, i.e., adding BigQuery as a sink option. It should not be too hard to add, and I think it would be a useful feature. The main reason we have not implemented it yet is that we have not heard much demand for it from our partners. If this is a useful feature for you and you can contribute to implementing it, I am willing to help.
Side note 1: We have actually done some work in #454 to make the resulting schema similar to the BigQuery schema of the GCP FHIR Store -> BigQuery flow.
Side note 2: You can import Parquet files into BigQuery; that's how the comparisons in #454 were done.
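For reference, importing the pipeline's Parquet output into BigQuery can be done with the `bq` CLI; the dataset, table, and GCS path below are placeholders, not the actual names used for the #454 comparisons:

```shell
# Load Parquet files into a BigQuery table. For Parquet, the schema is
# read from the files themselves, so no explicit schema is needed.
# 'my_dataset.patient' and the bucket path are placeholders.
bq load \
  --source_format=PARQUET \
  my_dataset.patient \
  "gs://my-bucket/dwh/Patient/*.parquet"
```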
@bashir2 I see. Initially I was thinking of using the JDBC driver for BigQuery and trying to create a sample JDBC URL config for BigQuery in `DatabaseConfiguration`:
https://github.com/google/fhir-data-pipes/blob/dc70755848b2ea83390a2699ed05ed6088875eec/pipelines/common/src/main/java/com/google/fhir/analytics/model/DatabaseConfiguration.java#L58-L62
and then making use of the `sinkDbConfigPath` config property:
https://github.com/google/fhir-data-pipes/blob/dc70755848b2ea83390a2699ed05ed6088875eec/pipelines/controller/config/application.yaml#L168-L173
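To make the idea concrete, such a sink config might look roughly like this. This is a hypothetical sketch: the field names are assumptions and may not match the actual `DatabaseConfiguration` schema, and the URL follows the Simba BigQuery JDBC driver format with application-default credentials:

```yaml
# Hypothetical BigQuery sink config via the BigQuery JDBC driver.
# Field names are illustrative, not the real DatabaseConfiguration fields.
jdbcUrl: "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=my-project;OAuthType=3"
databaseUser: ""      # BigQuery auth is handled via OAuth, not user/password
databasePassword: ""
```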
@muhammad-levi your JDBC-based idea can work, but since we use Beam for our pipeline, I would first consider BigQueryIO; it is usually better to rely on Beam IOs when possible. That said, there are reasons not to use them; for example, in some places we don't use ParquetIO for creating Parquet files (mostly because of Flink's memory overhead in single-machine mode).
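As a rough sketch of what a BigQueryIO-based sink could look like (this is not the actual pipeline code: the method, the table spec, and the idea of converting resources into `TableRow`s beforehand are all assumptions for illustration):

```java
// Sketch of writing rows to BigQuery with Beam's BigQueryIO.
// The table spec and method are placeholders, not fhir-data-pipes code.
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.values.PCollection;

public class BigQuerySinkSketch {

  // Assumes FHIR resources have already been converted to TableRows,
  // and that `schema` matches the target table layout.
  static void writeToBigQuery(PCollection<TableRow> rows, TableSchema schema) {
    rows.apply(
        "WriteToBigQuery",
        BigQueryIO.writeTableRows()
            .to("my-project:fhir_dataset.Patient") // placeholder table spec
            .withSchema(schema) // required with CREATE_IF_NEEDED
            .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
            .withWriteDisposition(WriteDisposition.WRITE_APPEND));
  }
}
```

One design question this leaves open is the row conversion: the schema work in #454 would presumably drive both the `TableSchema` and the resource-to-`TableRow` mapping.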
Instead of:

fhir-data-pipes -> Google Healthcare API FHIR Store -> Google BigQuery

it will be like:

fhir-data-pipes -> Google BigQuery

As also suggested in this diagram, "Data Loaders" includes fhir-data-pipes.