samelamin / spark-bigquery

Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
Apache License 2.0

JSON parsing failed when I was using saveAsBigQueryTable #45

Closed Ayyappadas1 closed 6 years ago

Ayyappadas1 commented 6 years ago

I was trying to load the data into BigQuery using the sample code below:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.samelamin.spark.bigquery._

val conf1 = new SparkConf().setAppName("App").setMaster("local[2]")
val sc = new SparkContext(conf1)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
sqlContext.setGcpJsonKeyFile("gcskey.json")

// Set up BigQuery project and bucket
sqlContext.setBigQueryProjectId("proj_name")
sqlContext.setBigQueryGcsBucket("gcsbucket")

// Set up BigQuery dataset location, default is US
sqlContext.setBigQueryDatasetLocation("US")

// Usage:

// Load everything from a table
val table = sqlContext.bigQueryTable("bigquery-public-data:samples.shakespeare")

// Load results from a SQL query
// Only legacy SQL dialect is supported for now
val df = sqlContext.bigQuerySelect(
  "SELECT word, word_count FROM [bigquery-public-data:samples.shakespeare]")

// Save data to a table
df.saveAsBigQueryTable("my-project:my_dataset.my_table")
```

While executing the code, I got the error below when it tried to execute the last statement:

```
165037 [main] ERROR org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation - Aborting job.
java.io.IOException: Failed to parse JSON: Unexpected token; Parser terminated before end of string
```

Can somebody help me to resolve this issue?

darylerwin commented 6 years ago

It appears the code saves the data as JSON and then writes that into BigQuery. Are you able to review the JSON file for any issues?
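For example, something like this would let you read the staged JSON back and eyeball it (the GCS path below is only a guess at where the connector writes its temporary files, so check the job log for the real directory):

```scala
// Rough sketch: read the intermediate JSON back from the GCS staging area.
// "gs://gcsbucket/hadoop/tmp/spark-bigquery/*" is an assumed path, not documented;
// the actual temporary directory should appear in the driver log.
val staged = sqlContext.read.json("gs://gcsbucket/hadoop/tmp/spark-bigquery/*")
staged.printSchema()
staged.show(5, truncate = false)
```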

samelamin commented 6 years ago

Not sure where the problem is here, can you post the entire stack trace?

I'd separate the reading vs writing to identify where the issue is.
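Something along these lines, for example (just a sketch; the test table name is a placeholder):

```scala
// 1. Read only: if this works, the problem is on the write path
val words = sqlContext.bigQuerySelect(
  "SELECT word, word_count FROM [bigquery-public-data:samples.shakespeare]")
words.show(5)

// 2. Write only: save a tiny, known-good DataFrame built in memory
//    ("my-project:my_dataset.tiny_test" is a placeholder table)
import sqlContext.implicits._
val tiny = Seq(("hello", 1), ("world", 2)).toDF("word", "word_count")
tiny.saveAsBigQueryTable("my-project:my_dataset.tiny_test")
```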

Ayyappadas1 commented 6 years ago

Hi Sam

Please find the full stack below:

```
Warning: Ignoring non-spark config property: SPARK_SQL_AUTH_ADMIN=admin
0 [main] INFO org.apache.spark.SparkContext - Running Spark version 1.6.2
242 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
404 [main] INFO org.apache.spark.SecurityManager - Changing view acls to: adminmana
404 [main] INFO org.apache.spark.SecurityManager - Changing modify acls to: adminmana
405 [main] INFO org.apache.spark.SecurityManager - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(adminmana); users with modify permissions: Set(adminmana)
776 [main] INFO org.apache.spark.util.Utils - Successfully started service 'sparkDriver' on port 33851.
1073 [sparkDriverActorSystem-akka.actor.default-dispatcher-5] INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
1104 [sparkDriverActorSystem-akka.actor.default-dispatcher-5] INFO Remoting - Starting remoting
1224 [sparkDriverActorSystem-akka.actor.default-dispatcher-5] INFO Remoting - Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.177.116.69:33512]
1228 [main] INFO org.apache.spark.util.Utils - Successfully started service 'sparkDriverActorSystem' on port 33512.
1237 [main] INFO org.apache.spark.SparkEnv - Registering MapOutputTracker
1250 [main] INFO org.apache.spark.SparkEnv - Registering BlockManagerMaster
1260 [main] INFO org.apache.spark.storage.DiskBlockManager - Created local directory at /mydata/Mana_2.3/tmp/blockmgr-f8b8270b-ca86-4119-be30-0a9cd3fbc456
1273 [main] INFO org.apache.spark.storage.MemoryStore - MemoryStore started with capacity 4.1 GB
1315 [main] INFO org.apache.spark.SparkEnv - Registering OutputCommitCoordinator
1422 [main] INFO org.spark-project.jetty.server.Server - jetty-8.y.z-SNAPSHOT
1459 [main] INFO org.spark-project.jetty.server.AbstractConnector - Started SelectChannelConnector@0.0.0.0:8085
1462 [main] INFO org.apache.spark.util.Utils - Successfully started service 'SparkUI' on port 8085.
1464 [main] INFO org.apache.spark.ui.SparkUI - Started SparkUI at http://10.177.116.69:8085
1482 [main] INFO org.apache.spark.HttpFileServer - HTTP File server directory is /mydata/Mana_2.3/tmp/spark-7bb14363-a18e-453c-a00c-3518c171a6ee/httpd-a83d59a5-55c0-4004-94e0-5f7409d187c4
1484 [main] INFO org.apache.spark.HttpServer - Starting HTTP Server
1490 [main] INFO org.spark-project.jetty.server.Server - jetty-8.y.z-SNAPSHOT
1492 [main] INFO org.spark-project.jetty.server.AbstractConnector - Started SocketConnector@0.0.0.0:34847
1493 [main] INFO org.apache.spark.util.Utils - Successfully started service 'HTTP file server' on port 34847.
2166 [main] INFO org.apache.spark.SparkContext - Added JAR file:/home/adminmana/bigquery-1.0-SNAPSHOT-jar-with-dependencies.jar at http://10.177.116.69:34847/jars/bigquery-1.0-SNAPSHOT-jar-with-dependencies.jar with timestamp 1507090421535
2213 [main] INFO org.apache.spark.executor.Executor - Starting executor ID driver on host localhost
2227 [main] INFO org.apache.spark.util.Utils - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34778.
2227 [main] INFO org.apache.spark.network.netty.NettyBlockTransferService
```


Ayyappadas1 commented 6 years ago

Hi Daryl,

Are you talking about the auth key file? That key file is in JSON format and was generated directly from the BigQuery web UI.
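A quick way to confirm the file at least parses as JSON (just a sanity check using Jackson, which already ships with Spark):

```scala
import com.fasterxml.jackson.databind.ObjectMapper
import scala.io.Source

// Sanity check: does gcskey.json parse as JSON at all?
// Throws a JsonProcessingException if the file is malformed.
val text = Source.fromFile("gcskey.json").mkString
new ObjectMapper().readTree(text)
println("gcskey.json parsed OK")
```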


samelamin commented 6 years ago

We only need the stack trace, not the entire log of your application.

But according to that stack trace it seems you might be using the Spotify version of the connector, or at least you have it somewhere on the path.

Can you make sure you have no reference to the Spotify connector?
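One quick way to check is to see where the Spotify classes would be loaded from, e.g. (the class name below is an assumption about the com.spotify.spark.bigquery package layout, adjust as needed):

```scala
// Prints which JAR, if any, the Spotify connector classes are loaded from.
// "com.spotify.spark.bigquery.BigQuerySQLContext" is an assumed class name.
try {
  val url = Class.forName("com.spotify.spark.bigquery.BigQuerySQLContext")
    .getProtectionDomain.getCodeSource.getLocation
  println(s"Spotify connector found at: $url")
} catch {
  case _: ClassNotFoundException => println("Spotify connector not on the classpath")
}
```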
