PySpark: Method saveAsBigQueryTable does not exist

samelamin / spark-bigquery

Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.

Apache License 2.0

PySpark: Method saveAsBigQueryTable does not exist #54

Closed mfoti closed 6 years ago

mfoti commented 6 years ago

Hi, I can't run the example in PySpark because I get the error in the title.

from pyspark.sql import SparkSession, SQLContext

sc = SparkSession.builder \
    .appName("PySpark To BigQuery: Publication test") \
    .getOrCreate()

sqlContext = SQLContext(sc)

bigquery = sc._sc._jvm.com.samelamin.spark.bigquery

# Prepare the bigquery context
bq = bigquery.BigQuerySQLContext(sc._wrapped._jsqlContext)
# bq.setGcpJsonKeyFile(KEY_FILE)
bq.setBigQueryProjectId(BQ_PROJECT_ID)
bq.setGSProjectId(BQ_PROJECT_ID)
bq.setBigQueryGcsBucket(STAGING_BUCKET)
bq.setBigQueryDatasetLocation(DATASET_LOCATION)

df...

BigQueryDataFrame = bigquery.BigQueryDataFrame(df._jdf)
BigQueryDataFrame.saveAsBigQueryTable("{0}:{1}.{2}".format(BQ_PROJECT_ID, BQ_DATASET_ID, BQ_TABLE_NAME))

py4j.Py4JException: Method saveAsBigQueryTable([class java.lang.String]) does not exist

samelamin commented 6 years ago

This sounds like the jar is not on the path. Make sure it is there before running the job.
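
One way to check this from the Python side, sketched under the assumption that py4j behaves as it usually does: a dotted name under `_jvm` that cannot be resolved to a class comes back as a `JavaPackage`, while a class that is on the classpath comes back as a `JavaClass`. The helper name `describe_jvm_name` is hypothetical and not part of spark-bigquery.

```python
def describe_jvm_name(jvm_attr):
    """Classify what a py4j lookup such as sc._jvm.com.example.Foo returned.

    Only the type name is inspected, so py4j itself is not imported here.
    """
    kind = type(jvm_attr).__name__
    if kind == "JavaClass":
        return "class found on the JVM classpath"
    if kind == "JavaPackage":
        return "not resolved to a class (jar missing or name misspelled)"
    return "unexpected py4j type: " + kind
```

With the names from the report this would be called as `describe_jvm_name(bigquery.BigQueryDataFrame)`. A missing jar usually shows up earlier as `'JavaPackage' object is not callable` when the constructor is invoked, so a "method does not exist" error may point somewhere else.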

mfoti commented 6 years ago

I can see it in the logs:

18/02/09 18:01:18 WARN org.apache.spark.deploy.yarn.Client: Same path resource file:/root/.ivy2/jars/com.github.samelamin_spark-bigquery_2.11-0.2.3.jar added multiple times to distributed cache.
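
The file name in that warning follows the `group_artifact-version.jar` pattern used for jars resolved through `--packages`, so the Scala suffix and library version can be read out of it and compared with the cluster's Spark build. A minimal sketch; the function name and the regular expression are illustrative assumptions, and it only handles names shaped like the one above.

```python
import re

# Matches names such as com.github.samelamin_spark-bigquery_2.11-0.2.3.jar
JAR_NAME = re.compile(
    r"(?P<group>[A-Za-z0-9.\-]+)_(?P<artifact>[A-Za-z0-9.\-]+)"
    r"_(?P<scala>\d+\.\d+)-(?P<version>[0-9][\w.\-]*)\.jar$"
)


def parse_ivy_jar(path):
    """Return group, artifact, Scala suffix and version, or None if no match."""
    match = JAR_NAME.search(path)
    return match.groupdict() if match else None


info = parse_ivy_jar(
    "file:/root/.ivy2/jars/com.github.samelamin_spark-bigquery_2.11-0.2.3.jar"
)
print(info)
```

Here the `scala` field would need to match the Scala version that the Spark installation was built with.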

samelamin commented 6 years ago

Hmm, strange. You can try adding it to the Python path, but it's easier to import ipdb and use the debugger to understand what's happening.
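
Inside the debugger, one thing worth checking is which overloads of the method the JVM object really exposes, since the Py4JException names a specific argument list. The sketch below uses plain Java reflection (`getClass()`, `getMethods()`, `getName()`, `toString()`) through py4j; `java_signatures` is a hypothetical helper, not part of spark-bigquery.

```python
def java_signatures(java_obj, method_name):
    """List the public JVM signatures of method_name on a py4j object."""
    methods = java_obj.getClass().getMethods()  # java.lang.reflect.Method[]
    found = []
    for i in range(len(methods)):
        method = methods[i]
        if method.getName() == method_name:
            found.append(method.toString())
    return found
```

With the objects from the report, `java_signatures(BigQueryDataFrame, "saveAsBigQueryTable")` would list the parameter types py4j has to match. Scala methods with default parameter values compile to a single JVM method that takes every parameter, so if the list shows only a longer signature (an assumption; nothing in this thread confirms it), all arguments would have to be passed explicitly from Python.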

I'll create a fresh environment this weekend and see if I can replicate it.

samelamin commented 6 years ago

I couldn't replicate this issue.