samelamin / spark-bigquery

Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
Apache License 2.0
70 stars, 28 forks

Utilize BigQuery Storage API #71

Open · smith-m opened this issue 5 years ago

smith-m commented 5 years ago

With the beta officially announced today, there is an opportunity to use the BigQuery Storage API for reading tables from BQ. In theory, it should have lower latency than GCS dumps, and it should also be able to take advantage of predicate pushdown and column projection, all while being Avro-based.

Are there any plans to integrate the Storage API with this or another Spark DataFrame project?

smith-m commented 5 years ago

cloud.google.com/bigquery/docs/reference/storage/

samelamin commented 5 years ago

This is really interesting, thanks for sharing.

Yeah, it'll need a separate branch while it's in beta, but it's certainly worth looking into.

Or, better yet, utilising it via an option, but using GCS dumps by default.

I'll have a look over the coming weeks.
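The "opt in via an option, keep GCS dumps as the default" idea could look roughly like the sketch below. Spark data source options reach a connector as string key/value pairs, so the choice reduces to parsing one key with a fallback. The `readPath` option key and the `ReadPath` and `resolve_read_path` names are hypothetical and not an actual option of this library.

```python
from enum import Enum
from typing import Mapping

class ReadPath(Enum):
    GCS_EXPORT = "gcs"           # default: export the table to GCS, then read the files
    STORAGE_API = "storage_api"  # opt-in: stream rows through the BigQuery Storage API

# Hypothetical option key, for illustration only.
READ_PATH_OPTION = "readPath"

def resolve_read_path(options: Mapping[str, str]) -> ReadPath:
    """Pick the read path from data source options, defaulting to GCS dumps."""
    raw = options.get(READ_PATH_OPTION, ReadPath.GCS_EXPORT.value).strip().lower()
    try:
        return ReadPath(raw)
    except ValueError:
        valid = ", ".join(p.value for p in ReadPath)
        raise ValueError(
            f"unknown {READ_PATH_OPTION} {raw!r}; expected one of: {valid}"
        ) from None

print(resolve_read_path({}))                              # ReadPath.GCS_EXPORT
print(resolve_read_path({"readPath": "storage_api"}))     # ReadPath.STORAGE_API
```

Keeping the existing path as the fallback means current jobs behave the same, while the beta API is only used when someone asks for it explicitly.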