google / fhir-data-pipes

A collection of tools for extracting FHIR resources and analytics services on top of that data.
https://google.github.io/fhir-data-pipes/
Apache License 2.0
141 stars, 80 forks

Experiment with DuckDB and compare with single-node Spark #987

Closed (bashir2 closed this issue 1 month ago)

bashir2 commented 3 months ago

Multiple partners have told us that they want to query Parquet files with DuckDB. Supporting different query engines that understand Parquet was one of the original design principles of our pipelines, so this is definitely doable. But it is probably worthwhile for us to also have an opinion on this that is backed by real query data, for example a side-by-side comparison with our example single-node Spark deployment option.

If we find this an appealing option, we should probably provide the flat views in the DuckDB SQL dialect as well.
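
As a starting point for the side-by-side comparison suggested above, here is a minimal sketch of an engine-agnostic timing harness. It is not part of the pipelines; the names `time_query`, `compare_engines`, and `format_report` are hypothetical, and it assumes each engine can be wrapped in a zero-argument callable that runs one query to completion and fetches the results.

```python
import statistics
import time
from typing import Callable, Dict


def time_query(run: Callable[[], object], repeats: int = 5, warmup: int = 1) -> Dict[str, float]:
    """Time one query runner and return min/median/max wall-clock seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        run()  # untimed warm-up so start-up costs and cold caches do not skew results
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def compare_engines(
    runners: Dict[str, Callable[[], object]], repeats: int = 5
) -> Dict[str, Dict[str, float]]:
    """Run the same logical query through each named engine runner."""
    return {name: time_query(run, repeats=repeats) for name, run in runners.items()}


def format_report(results: Dict[str, Dict[str, float]]) -> str:
    """Render results as a plain-text table, fastest median first."""
    lines = [f"{'engine':<12}{'min (s)':>10}{'median (s)':>12}{'max (s)':>10}"]
    for name, stats in sorted(results.items(), key=lambda kv: kv[1]["median"]):
        lines.append(
            f"{name:<12}{stats['min']:>10.4f}{stats['median']:>12.4f}{stats['max']:>10.4f}"
        )
    return "\n".join(lines)
```

To use it, a DuckDB runner could be something like `lambda: duckdb.sql("SELECT COUNT(*) FROM read_parquet('/path/to/Patient/*.parquet')").fetchall()`, and a Spark runner something like `lambda: spark.sql("SELECT COUNT(*) FROM patient").collect()` once the same files are registered as a temporary view. The path and view name here are placeholders, not the actual layout of our output. Both runners should fetch the results, since Spark evaluates lazily and would otherwise not do the work being timed.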