Hi,

In November of last year my colleague mayankshah891 raised an issue (#48). We are trying to import data from a BigQuery table while data is streaming into it, and we randomly hit errors like those described in that issue.
I have noticed that printing the (almost) full table after caching it helps:
val table = sqlContext.bigQueryTable("bigqueryprojectid:blabla.name_table").cache()
table.show(100000)
This presumably forces Spark to persist the table, so no further connection to BigQuery is required afterwards.
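For reference, here is a sketch of a slightly more deterministic version of this workaround (assuming the spotify/spark-bigquery connector and a Spark shell where sqlContext is in scope; the project, dataset, and table ids are placeholders). count() has to scan every partition, so the whole table should be cached up front, whereas show(100000) may stop reading once it has enough rows:

import com.spotify.spark.bigquery._
import org.apache.spark.storage.StorageLevel

// Placeholder project, dataset, and table ids.
val table = sqlContext
  .bigQueryTable("bigqueryprojectid:blabla.name_table")
  .persist(StorageLevel.MEMORY_AND_DISK)

// count() touches every partition, so the full table is materialized
// into the cache; show(100000) may only read the partitions needed
// to produce the first 100000 rows.
table.count()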
We closed issue #48 after noting that data was streaming into our BigQuery table, which seemed to explain the problem. New data arrives quite frequently (at least every 5 minutes).
Could you confirm that this is what causes our problem, and is there a more scientific way of working around it?
Thank you so much for your help. Truly appreciated.