samelamin / spark-bigquery

Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
Apache License 2.0
70 stars 28 forks

Ability to use this on DataProc Google Platform #43

Closed darylerwin closed 7 years ago

darylerwin commented 7 years ago

Have you successfully run this on the Dataproc platform? Can you provide a working build.sbt that you use to compile your Scala program?

samelamin commented 7 years ago

Hi, I haven't used it on Dataproc, but I have on AWS, on Databricks, and locally of course.

You shouldn't need to set the JSON key file, because the application should pick the credentials up from the underlying Google engine.

You can build using the build.sbt file provided. Give me a shout if you face any issues building it.
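For reference, a minimal build.sbt for a job depending on this library might look like the sketch below. The coordinates match the spark-shell --packages string used later in this thread; the project name and exact Spark/Scala patch versions are illustrative assumptions, not taken from the repo's actual build.sbt.

```scala
// Hypothetical minimal build.sbt; project name and patch versions are illustrative
name := "my-spark-bigquery-job"

scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // Spark itself is provided by the Dataproc cluster at runtime
  "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided",
  // Same artifact as spark-shell --packages com.github.samelamin:spark-bigquery_2.11:0.2.2
  "com.github.samelamin" %% "spark-bigquery" % "0.2.2"
)
```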

darylerwin commented 7 years ago

Thanks. Still learning. I was able to use the Spotify package on Dataproc 1.1, but it will not work on 1.2 (Spark 2.2). It appears your package will run with Spark 2.2 (yes?). Reads run clean, but writes fail. With spark-shell --packages com.github.samelamin:spark-bigquery_2.11:0.2.2 ... calling table.saveAsBigQueryTable("bigdata:data_analytics_poc.test_write_table1") gives:

java.util.NoSuchElementException: mapred.bq.gcs.bucket
  at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1089)
  at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1089)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.internal.SQLConf.getConfString(SQLConf.scala:1089)
  at org.apache.spark.sql.RuntimeConfig.get(RuntimeConfig.scala:74)
  at com.samelamin.spark.bigquery.BigQueryDataFrame.writeDFToGoogleStorage(BigQueryDataFrame.scala:60)
  at com.samelamin.spark.bigquery.BigQueryDataFrame.saveAsBigQueryTable(BigQueryDataFrame.scala:42)
  ... 50 elided
samelamin commented 7 years ago

No worries, we all are!

Yes, it should, but you need to set the bucket name:

// Set up BigQuery project and bucket
sqlContext.setBigQueryProjectId("<BILLING_PROJECT>")
sqlContext.setBigQueryGcsBucket("<GCS_BUCKET>")

// Set up BigQuery dataset location, default is US
sqlContext.setBigQueryDatasetLocation("<DATASET_LOCATION>")
darylerwin commented 7 years ago

That worked -- didn't realize that was necessary, since saveAsBigQueryTable appears to have all the needed pieces. Is there a way to see all the methods on sqlContext, for example, to know all the various setXXX options?

samelamin commented 7 years ago

Sadly no. It's only me working on documentation, and I'm clearly not great at it.

If you are using IntelliJ, you can try to get some context from the autocomplete.

Failing that, have a look at the code. It's all in BigQuerySqlContext.

Feel free to send a PR on docs if you are interested.
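Beyond IDE autocomplete, one generic way to enumerate the setXXX methods a class exposes is plain Java reflection. This is a small sketch of that idea; since BigQuerySqlContext isn't on the classpath here, java.util.Calendar stands in as an example target class:

```scala
// Sketch: list the public setXXX methods of a class via Java reflection.
// Swap in the class you care about (e.g. the library's BigQuerySqlContext)
// once its jar is on the classpath.
object ListSetters {
  def settersOf(cls: Class[_]): Seq[String] =
    cls.getMethods.toSeq
      .map(_.getName)
      .filter(_.startsWith("set"))
      .distinct
      .sorted

  def main(args: Array[String]): Unit =
    // java.util.Calendar used purely as a stand-in example class
    settersOf(classOf[java.util.Calendar]).foreach(println)
}
```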
