spotify / scio

A Scala API for Apache Beam and Google Cloud Dataflow.
https://spotify.github.io/scio
Apache License 2.0

Consider using google-cloud-bigquery library instead of google-api-services-bigquery #1555

Open clairemcginty opened 5 years ago

clairemcginty commented 5 years ago

Google documentation recommends using the client library google-cloud-bigquery rather than the API library google-api-services-bigquery.

Pros

Cons

clairemcginty commented 5 years ago

Update: the client library bug affecting extract jobs has been fixed! https://github.com/googleapis/google-cloud-java/issues/3924

nevillelyh commented 4 years ago

@ClaireMcGinty is this still worth looking into?

nevillelyh commented 4 years ago

Talked IRL, closing.

regadas commented 4 years ago

I would like us to reconsider and reopen this. I think there are still some subtle bugs in our current internal BigQuery client. Some of these bugs are related to not falling back to properties set in the environment.
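
To make the fallback idea concrete, here is a minimal sketch of the kind of lookup chain meant here: an explicit value first, then a JVM system property, then an environment variable. This is not Scio's actual implementation. `ProjectResolver` is a hypothetical helper, and the key names `bigquery.project` and `GOOGLE_CLOUD_PROJECT` are illustrative assumptions.

```scala
// Hypothetical helper for illustration only; not part of Scio.
object ProjectResolver {
  // Assumed key names, chosen for the example.
  val SysPropKey = "bigquery.project"
  val EnvKey = "GOOGLE_CLOUD_PROJECT"

  // Treat missing, empty, and whitespace-only values the same way.
  private def nonBlank(value: Option[String]): Option[String] =
    value.map(_.trim).filter(_.nonEmpty)

  /** Returns the first non-blank value among: the explicit argument,
    * the system property, and the environment variable, in that order.
    * The lookup functions are parameters so they can be stubbed in tests.
    */
  def resolve(
    explicit: Option[String],
    sysProp: String => Option[String] = key => sys.props.get(key),
    env: String => Option[String] = key => sys.env.get(key)
  ): Option[String] =
    nonBlank(explicit)
      .orElse(nonBlank(sysProp(SysPropKey)))
      .orElse(nonBlank(env(EnvKey)))
}
```

With this shape, a blank explicit setting no longer hides a value that is present in the environment, which is the class of bug described above.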

regadas commented 4 years ago

@nevillelyh @ClaireMcGinty what was the reason to not go forward with this?

clairemcginty commented 4 years ago

@regadas If I remember right, it was due to the complexity of integrating with Beam's BigQuery sources/sinks -- Beam returned types from google-api-services-bigquery and a lot of the Google library functions that could convert those to google-cloud-bigquery types were private.

This was a while ago though, so maybe worth a second look?
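
For a sense of what a hand-written conversion looks like, here is a minimal sketch that maps a `TableReference` from google-api-services-bigquery to a `TableId` from google-cloud-bigquery. `TableConversions` and `toTableId` are hypothetical names. The sketch relies only on the public getters of `TableReference` and on `TableId.of`, and assumes both libraries are on the classpath.

```scala
import com.google.api.services.bigquery.model.TableReference
import com.google.cloud.bigquery.TableId

// Hypothetical helper for illustration only; not part of Scio or Beam.
object TableConversions {
  /** Converts an API-library table reference into a client-library table ID.
    * The project ID is optional on TableReference, so it is handled separately.
    */
  def toTableId(ref: TableReference): TableId =
    Option(ref.getProjectId) match {
      case Some(project) => TableId.of(project, ref.getDatasetId, ref.getTableId)
      case None          => TableId.of(ref.getDatasetId, ref.getTableId)
    }
}
```

Simple identifiers like this are easy to map by hand. Richer types such as schemas and job configurations are where the private converters mentioned above would matter.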

regadas commented 4 years ago

@ClaireMcGinty interesting! I think it's worth looking into it again since we are already using the storage impl to actually retrieve data.

Let's see if the other types are good to go as well. I'll book some time to look into this.

Thanks