spotify / spark-bigquery

Google BigQuery support for Spark, SQL, and DataFrames
Apache License 2.0

Clear Cache from Select #26

Open samelamin opened 7 years ago

samelamin commented 7 years ago

Is there a way to clear the cache from the select?

I have a job that calls select multiple times, and I keep getting the old value.

I need this because I am writing updates to a table and my source is S3, so I basically need to know when the table was last updated, and I don't want to store state in the code.

Is that possible?

nevillelyh commented 7 years ago

Cache logic is here: https://github.com/spotify/spark-bigquery/blob/master/src/main/scala/com/spotify/spark/bigquery/BigQueryClient.scala#L76

We could simply expose something that allows invalidation, or do it automatically by looking at the last modified time of the queried tables, as in: https://github.com/spotify/scio/blob/master/scio-bigquery/src/main/scala/com/spotify/scio/bigquery/BigQueryClient.scala#L293
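
To make the two options concrete, here is a minimal sketch of what they could look like. It is not the actual spark-bigquery or scio code: `QueryCache`, `Entry`, `lastModifiedMillis`, and `run` are hypothetical names, and the table lookup and query execution are passed in as plain functions so the sketch does not assume any BigQuery API.

```scala
import scala.collection.mutable

// Hypothetical sketch only; none of these names exist in spark-bigquery.
// A cached result plus the newest table modification time seen when it was cached.
final case class Entry[V](value: V, observedModifiedMillis: Long)

class QueryCache[V](
    lastModifiedMillis: String => Long, // assumed lookup: table name => last modified time
    run: String => V                    // assumed executor: query string => result
) {
  private val entries = mutable.Map.empty[String, Entry[V]]

  // Option 1: explicit invalidation exposed to the caller.
  def invalidate(query: String): Unit = { entries.remove(query); () }
  def clear(): Unit = entries.clear()

  // Option 2: automatic invalidation based on table modification times.
  def select(query: String, tables: Seq[String]): V = {
    // Newest modification time across all tables the query reads.
    val newest = tables
      .map(lastModifiedMillis)
      .foldLeft(Long.MinValue)((a, b) => math.max(a, b))
    entries.get(query) match {
      // Reuse the cached result only if no table has changed since it was stored.
      case Some(e) if e.observedModifiedMillis >= newest => e.value
      case _ =>
        val v = run(query)
        entries(query) = Entry(v, newest)
        v
    }
  }
}
```

With something like this, a repeated select returns the cached result until one of its tables reports a newer modification time, and a caller that knows it has just written to a table can still call `invalidate` or `clear` directly. The modification time is read before the query runs, so a write that lands in between causes one extra re-run on the next call rather than a stale result.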