samelamin / spark-bigquery

Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
Apache License 2.0

Pyspark complete code #41

Closed 87sanchavan closed 7 years ago

87sanchavan commented 7 years ago

Hi Sam, this is an amazing library that you built; however, I was not able to use it from Python. Can you please share a complete code example of how to use it from Python/PySpark?

Regards, Sanjay

samelamin commented 7 years ago

Hi Sanjay

Below is a sample that configures the BigQuery context and saves a DataFrame to a BQ table:

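from pyspark.sql import SparkSession

# Placeholder values; replace with your own project, dataset, table, key file and bucket
BQ_PROJECT_ID = "projectId"
DATASET_ID = "datasetId"
TABLE_NAME = "tableName"
jsonFile = "/path/to/json"
GcsBucket = "gcs-bucket"
session = SparkSession.builder.getOrCreate()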
bq = session._sc._jvm.com.samelamin.spark.bigquery.BigQuerySQLContext(session._wrapped._jsqlContext)
bq.setGcpJsonKeyFile(jsonFile)
bq.setBigQueryProjectId(BQ_PROJECT_ID)
bq.setGSProjectId(BQ_PROJECT_ID)
bq.setBigQueryGcsBucket(GcsBucket)
bq.setBigQueryDatasetLocation("US")
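# Fully qualified table name in the form project:dataset.table
tableName = "{0}:{1}.{2}".format(BQ_PROJECT_ID, DATASET_ID, TABLE_NAME)
# df must be an existing PySpark DataFrame holding the rows to write
bqDF = session._sc._jvm.com.samelamin.spark.bigquery.BigQueryDataFrame(df._jdf)
bqDF.saveAsBigQueryTable(tableName, False, 0, None, None)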
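
If you build the table name in several places, the `project:dataset.table` string used above can be wrapped in a small helper. This is only a minimal sketch: `bq_table_id` is a hypothetical name and is not part of spark-bigquery, and the non-empty check is an assumption about what is worth validating before the name reaches the JVM call.

```python
def bq_table_id(project_id, dataset_id, table_name):
    """Build a 'project:dataset.table' identifier (hypothetical helper)."""
    parts = {
        "project_id": project_id,
        "dataset_id": dataset_id,
        "table_name": table_name,
    }
    # Fail early in Python rather than inside the JVM call
    for label, value in parts.items():
        if not value:
            raise ValueError("{0} must be a non-empty string".format(label))
    return "{0}:{1}.{2}".format(project_id, dataset_id, table_name)


tableName = bq_table_id("projectId", "datasetId", "tableName")
print(tableName)  # projectId:datasetId.tableName
```

The result can be passed as the first argument to saveAsBigQueryTable in place of the inline format call.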