Closed darylerwin closed 7 years ago
Hi, I haven't used it on Dataproc, but I have on AWS, Databricks, and locally of course.
You shouldn't need to set the JSON file, because the application should pick it up from the underlying Google engine.
You can build using the build.sbt file provided. Give me a shout if you face any issues building it.
Thanks. Still learning. I was able to use the Spotify package on Dataproc 1.1, but it will not work on 1.2 (Spark 2.2). It appears your package will run with Spark 2.2, yes? This runs clean for reads but fails on writes. spark-shell --packages com.github.samelamin:spark-bigquery_2.11:0.2.2 ... table.saveAsBigQueryTable("bigdata:data_analytics_poc.test_write_table1")
java.util.NoSuchElementException: mapred.bq.gcs.bucket
at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1089)
at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1089)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.internal.SQLConf.getConfString(SQLConf.scala:1089)
at org.apache.spark.sql.RuntimeConfig.get(RuntimeConfig.scala:74)
at com.samelamin.spark.bigquery.BigQueryDataFrame.writeDFToGoogleStorage(BigQueryDataFrame.scala:60)
at com.samelamin.spark.bigquery.BigQueryDataFrame.saveAsBigQueryTable(BigQueryDataFrame.scala:42)
... 50 elided
No worries, we all are!
Yes, it should, but you need to set the bucket name:
// Set up BigQuery project and bucket
sqlContext.setBigQueryProjectId("<BILLING_PROJECT>")
sqlContext.setBigQueryGcsBucket("<GCS_BUCKET>")
// Set up BigQuery dataset location, default is US
sqlContext.setBigQueryDatasetLocation("<DATASET_LOCATION>")
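Putting the pieces together, a minimal write might look like the sketch below. The project, bucket, and table names are placeholders, and it assumes the connector is on the classpath (e.g. via `--packages com.github.samelamin:spark-bigquery_2.11:0.2.2`):

```scala
// Sketch of a BigQuery write with spark-bigquery; names are placeholders.
import com.samelamin.spark.bigquery._

// The billing project and staging bucket must be set before any write;
// without them the connector fails with
// java.util.NoSuchElementException: mapred.bq.gcs.bucket
sqlContext.setBigQueryProjectId("my-billing-project")
sqlContext.setBigQueryGcsBucket("my-staging-bucket")

// Any DataFrame will do; saveAsBigQueryTable is added by the
// connector's implicit conversion on DataFrame.
val df = sqlContext.sql("SELECT 1 AS id")
df.saveAsBigQueryTable("my-project:my_dataset.my_table")
```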
That worked -- didn't realize that was necessary, since saveAsBigQueryTable appears to have all the needed pieces. Is there a way to see all the methods on sqlContext, for example, to know all the various setXXX options?
Sadly no. It's only me working on the documentation, and I'm clearly not great at it.
If you are using IntelliJ you can try to get some context from the autocomplete.
Failing that, have a look at the code; it's all in BigQuerySqlContext.
Feel free to send a PR on the docs if you are interested.
On Sun, 10 Sep 2017 at 04:00, Daryl Erwin notifications@github.com wrote:
Have you successfully run this on the Dataproc platform? Can you provide a working build.sbt that you use to compile your Scala program?
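For reference, a minimal build.sbt pulling in the connector might look like the following. This is a sketch, not the author's actual file; the Scala and Spark versions are assumptions and should be adjusted to match your cluster (e.g. Dataproc 1.2 ships Spark 2.2 on Scala 2.11):

```scala
// build.sbt — minimal sketch for compiling against spark-bigquery.
name := "spark-bigquery-example"
version := "0.1.0"
scalaVersion := "2.11.11"

libraryDependencies ++= Seq(
  // "provided" because Spark is supplied by the cluster at runtime
  "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided",
  "com.github.samelamin" %% "spark-bigquery" % "0.2.2"
)
```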