I'm trying to process some data from BigQuery using the local cluster and then write it in hdfs. I keep getting the following error :
java.lang.IllegalStateException: Export FS must derive from GoogleHadoopFileSystemBase. at com.google.common.base.Preconditions.checkState(Preconditions.java:456) at com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration.getTemporaryPathRoot(BigQueryConfiguration.java:363) at com.google.cloud.hadoop.io.bigquery.AbstractBigQueryInputFormat.getSplits(AbstractBigQueryInputFormat.java:126) at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:130) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:251) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:251) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:251) at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1343) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:363) at org.apache.spark.rdd.RDD.take(RDD.scala:1337) at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1378) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:363) at org.apache.spark.rdd.RDD.first(RDD.scala:1377) at com.samelamin.spark.bigquery.BigQuerySQLContext.bigQuerySelect(BigQuerySQLContext.scala:96)
Hello,
I'm trying to process some data from BigQuery using the local cluster and then write it in hdfs. I keep getting the following error :
java.lang.IllegalStateException: Export FS must derive from GoogleHadoopFileSystemBase. at com.google.common.base.Preconditions.checkState(Preconditions.java:456) at com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration.getTemporaryPathRoot(BigQueryConfiguration.java:363) at com.google.cloud.hadoop.io.bigquery.AbstractBigQueryInputFormat.getSplits(AbstractBigQueryInputFormat.java:126) at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:130) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:251) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:251) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:251) at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1343) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:363) at org.apache.spark.rdd.RDD.take(RDD.scala:1337) at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1378) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:363) at org.apache.spark.rdd.RDD.first(RDD.scala:1377) at com.samelamin.spark.bigquery.BigQuerySQLContext.bigQuerySelect(BigQuerySQLContext.scala:96)
Sample code :
` spark.sqlContext.setBigQueryProjectId("XXXX") spark.sqlContext.setBigQueryDatasetLocation("EU") spark.sqlContext.setBigQueryGcsBucket("XXXXX") spark.sqlContext.useStandardSQLDialect(true)
`
Thank you for your help.