spotify / spark-bigquery

Google BigQuery support for Spark, SQL, and DataFrames
Apache License 2.0

JSON parsing failed when using saveAsBigQueryTable #44

Closed Ayyappadas1 closed 6 years ago

Ayyappadas1 commented 6 years ago

I was trying to load data into BigQuery using the sample code below:

    val conf1 = new SparkConf().setAppName("App").setMaster("local[2]")
    val sc = new SparkContext(conf1)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    sqlContext.setGcpJsonKeyFile("gcskey.json")

    // Set up BigQuery project and bucket
    sqlContext.setBigQueryProjectId("proj_name")
    sqlContext.setBigQueryGcsBucket("gcsbucket")

    // Set up BigQuery dataset location, default is US
    sqlContext.setBigQueryDatasetLocation("US")

Usage:

    // Load everything from a table
    val table = sqlContext.bigQueryTable("bigquery-public-data:samples.shakespeare")

    // Load results from a SQL query
    // Only legacy SQL dialect is supported for now
    val df = sqlContext.bigQuerySelect(
      "SELECT word, word_count FROM [bigquery-public-data:samples.shakespeare]")

    // Save data to a table
    df.saveAsBigQueryTable("my-project:my_dataset.my_table")

While executing the code, I got the error below when it reached the last statement (saveAsBigQueryTable):

165037 [main] ERROR org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation - Aborting job. java.io.IOException: Failed to parse JSON: Unexpected token; Parser terminated before end of string

Can somebody help me to resolve this issue?

darylerwin commented 6 years ago

I haven't gotten it to work from my PC yet, but it seems that something is wrong with your credentials JSON file (gcskey.json): either it is incorrect or it can't be found.
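One quick way to rule out the "can't be found" case is to check the path before handing it to setGcpJsonKeyFile. The following is only a minimal sketch: the object name KeyFileCheck and the helper isReadableFile are made up for illustration, and the check says nothing about whether the file's contents are a valid key.

```scala
import java.nio.file.{Files, Paths}

// Hypothetical helper, not part of spark-bigquery.
object KeyFileCheck {
  // True when the path points at an existing regular file that this process can read.
  def isReadableFile(path: String): Boolean = {
    val p = Paths.get(path)
    Files.isRegularFile(p) && Files.isReadable(p)
  }

  def main(args: Array[String]): Unit = {
    // "gcskey.json" is the file name from the original post; pass a full path as the first argument.
    val path = args.headOption.getOrElse("gcskey.json")
    println(s"$path readable: ${isReadableFile(path)}")
  }
}
```

A relative name like gcskey.json is resolved against the working directory of the driver process, so passing the full path removes that ambiguity.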

Ayyappadas1 commented 6 years ago

Hi Darylerwin,

Thanks for your response. I have given the full path of my key file, and the job does write to a temporary staging table named spark_bigquery_20171003061852_1265436538. It just does not write to my final table.

Ayyappadas1 commented 6 years ago

Please find below the error message I am getting:

193901 [main] INFO com.google.cloud.hadoop.io.bigquery.BigQueryHelper - Importing into table 'pure-respect-180709:wordcount_dataset.word_cnt' from 2 paths; path[0] is 'gs://iipbucket/hadoop/tmp/spark-bigquery/spark-bigquery-1507035195204=705545879/part-r-00000-405779e9-5bff-4a89-bcaf-2563fee072be.avro'; awaitCompletion: true
193903 [main] INFO com.google.cloud.hadoop.io.bigquery.BigQueryHelper - No import schema provided, auto detecting schema.
200709 [main] ERROR org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation - Aborting job.
java.io.IOException: Failed to parse JSON: Unexpected token; Parser terminated before end of string
    at com.google.cloud.hadoop.io.bigquery.BigQueryUtils.waitForJobCompletion(BigQueryUtils.java:95)
    at com.google.cloud.hadoop.io.bigquery.BigQueryHelper.importFromGcs(BigQueryHelper.java:164)
    at com.google.cloud.hadoop.io.bigquery.output.IndirectBigQueryOutputCommitter.commitJob(IndirectBigQueryOutputCommitter.java:57)
    at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:230)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFs

Ayyappadas1 commented 6 years ago

Hi

The issue is solved. It was caused by a Scala version mismatch: the Maven dependency (in IntelliJ) was built for Scala 2.11, while the cluster runs Scala 2.10. When I changed the Maven dependency to the 2.10 version, it worked.
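For anyone hitting a similar mismatch, here is a rough sketch of how the Scala version on the classpath could be compared with the `_2.xx` suffix of a dependency's artifact id. The object name ScalaVersionCheck, its helpers, and the default artifact id "spark-bigquery_2.11" are illustrative assumptions; only scala.util.Properties.versionNumberString comes from the Scala standard library.

```scala
// Hypothetical helper, not part of spark-bigquery.
object ScalaVersionCheck {
  // Reduce a full version such as "2.10.6" to its binary version "2.10".
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  // True when the artifact id ends with the _2.xx suffix for the given Scala version.
  def matches(artifactId: String, scalaVersion: String): Boolean =
    artifactId.endsWith("_" + binaryVersion(scalaVersion))

  def main(args: Array[String]): Unit = {
    // Version of the Scala library actually on the classpath at run time.
    val runtime = scala.util.Properties.versionNumberString
    // Example artifact id (an assumption); pass the real one as the first argument.
    val artifact = args.headOption.getOrElse("spark-bigquery_2.11")
    println(s"Runtime Scala version: $runtime")
    if (!matches(artifact, runtime))
      println(s"Warning: $artifact does not match Scala ${binaryVersion(runtime)}")
  }
}
```

Running this on the cluster rather than in the IDE is what matters here, since the two environments reported different Scala versions in this thread.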