Sometimes loading Avro files into a BigQuery table fails because the `_temporary` directory on GCS doesn't exist:
16/12/02 23:46:47 INFO com.spotify.spark.bigquery.BigQueryClient: Loading gs://spark-helper-us-region/hadoop/tmp/spark-bigquery/spark-bigquery-1480722354001=863039906 into sage-shard-740:analytics_us.activities_20160903
Exception in thread "main" java.io.IOException: Not found: Uri gs://spark-helper-us-region/hadoop/tmp/spark-bigquery/spark-bigquery-1480722354001=863039906/_temporary/0/_temporary/attempt_201612022346_0113_m_000065_0/part-r-00065-0f64344d-0f7e-4677-a28b-56e79a287e41.avro
at com.google.cloud.hadoop.io.bigquery.BigQueryUtils.waitForJobCompletion(BigQueryUtils.java:95)
at com.spotify.spark.bigquery.BigQueryClient.com$spotify$spark$bigquery$BigQueryClient$$waitForJob(BigQueryClient.scala:134)
at com.spotify.spark.bigquery.BigQueryClient.load(BigQueryClient.scala:130)
at com.spotify.spark.bigquery.package$BigQueryDataFrame.saveAsBigQueryTable(package.scala:150)
at com.spotify.spark.bigquery.package$BigQueryDataFrame.saveAsBigQueryTable(package.scala:159)
at com.mercari.spark.sql.SparkBigQueryHelper$.saveBigQueryTableByDataFrame(SparkBigQueryHelper.scala:229)
at com.mercari.spark.sql.SparkBigQueryHelper.saveBigQueryTableByDataFrame(SparkBigQueryHelper.scala:66)
at com.mercari.spark.batch.ActivitiesTableCreator$.apply(ActivitiesTableCreator.scala:226)
at com.mercari.spark.batch.ActivitiesTableCreator$.main(ActivitiesTableCreator.scala:210)
at com.mercari.spark.batch.ActivitiesTableCreator.main(ActivitiesTableCreator.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
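One way to guard against this, sketched below under assumptions (the staging path `stagingDir` is hypothetical, standing in for the `spark-bigquery-<timestamp>` directory shown in the log; the Hadoop `FileSystem` API is available on the driver), is to check that the staging directory contains only committed `part-*` files, and no leftover `_temporary` directory, before triggering the BigQuery load job:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object StagingCheck {
  // Hypothetical staging path; in the failing run this is the
  // spark-bigquery-<timestamp> directory under the GCS temp bucket.
  val stagingDir = new Path("gs://spark-helper-us-region/hadoop/tmp/spark-bigquery/spark-bigquery-<timestamp>")

  // Returns true only when the directory looks fully committed:
  // at least one part file at the top level, and no _temporary
  // directory left behind by an in-flight or failed task commit.
  def readyForLoad(fs: FileSystem, dir: Path): Boolean = {
    val entries = fs.listStatus(dir)
    val hasTemporary = entries.exists(_.getPath.getName == "_temporary")
    val hasParts = entries.exists(_.getPath.getName.startsWith("part-"))
    hasParts && !hasTemporary
  }

  def main(args: Array[String]): Unit = {
    val fs = stagingDir.getFileSystem(new Configuration())
    if (readyForLoad(fs, stagingDir)) {
      // safe to call saveAsBigQueryTable / the load job here
    } else {
      sys.error(s"Staging dir $stagingDir not fully committed; aborting load")
    }
  }
}
```

This doesn't fix the underlying race (the load job appears to start while `_temporary` attempt files are still visible in the GCS listing), but failing fast on the driver is cheaper than letting the BigQuery load job fail with `Not found: Uri ...`.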