samelamin / spark-bigquery

Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
Apache License 2.0

Stuck with error py4j.protocol.Py4JJavaError: An error occurred while calling o39.saveAsBigQueryTable. : java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;) #75

Closed: kishore8714 closed this issue 5 years ago

kishore8714 commented 5 years ago

I am using Amazon EMR with Hadoop 2 and Java 1.8. I would like to stream data from Amazon EMR to BigQuery, but I am stuck on the following error:

  File "/home/hadoop/pyjobs/py_script/s3_bigquery_0_1.py", line 58, in <module>
    bqDF.saveAsBigQueryTable("{0}:{1}.{2}".format(BQ_PROJECT_ID, BQ_DATA_SET, TABLE_NAME),False,0,bigquery.__getattr__("package$WriteDisposition$").__getattr__("MODULE$").WRITE_EMPTY(),bigquery.__getattr__("package$CreateDisposition$").__getattr__("MODULE$").CREATE_IF_NEEDED())
  File "/usr/local/lib/python2.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/local/lib/python2.7/site-packages/pyspark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/local/lib/python2.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o39.saveAsBigQueryTable.
: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V
    at com.google.cloud.hadoop.io.bigquery.BigQueryStrings.parseTableReference(BigQueryStrings.java:68)
    at com.samelamin.spark.bigquery.BigQueryDataFrame.saveAsBigQueryTable(BigQueryDataFrame.scala:40)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
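The descriptor in the NoSuchMethodError names the exact overload the JVM could not find, and decoding it makes the version mismatch easier to reason about. The sketch below is a minimal decoder for JVM method descriptors; `decode_descriptor` is a hypothetical helper written for this thread, not part of py4j, Spark, or Guava.

```python
# Minimal JVM method-descriptor decoder (hypothetical helper, for illustration).
PRIMITIVES = {
    "Z": "boolean", "B": "byte", "C": "char", "S": "short",
    "I": "int", "J": "long", "F": "float", "D": "double", "V": "void",
}


def _parse_type(desc, i):
    """Parse one type starting at index i; return (readable_name, next_index)."""
    dims = 0
    while desc[i] == "[":          # each leading '[' is one array dimension
        dims += 1
        i += 1
    if desc[i] == "L":             # object type: L<internal/name>;
        end = desc.index(";", i)
        name = desc[i + 1:end].replace("/", ".")
        i = end + 1
    else:                          # single-letter primitive (or V for void)
        name = PRIMITIVES[desc[i]]
        i += 1
    return name + "[]" * dims, i


def decode_descriptor(method, desc):
    """Turn '(ZLjava/lang/String;)V' style descriptors into a Java-like signature."""
    if not desc.startswith("("):
        raise ValueError("descriptor must start with '('")
    close = desc.index(")")
    params = []
    i = 1
    while i < close:
        type_name, i = _parse_type(desc, i)
        params.append(type_name)
    return_type, _ = _parse_type(desc, close + 1)
    return "{0} {1}({2})".format(return_type, method, ", ".join(params))


signature = decode_descriptor(
    "checkArgument",
    "(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V",
)
print(signature)
```

The missing method is therefore the four-argument, non-varargs form of Preconditions.checkArgument. To the best of my knowledge that overload was added in Guava 20.0, so the error is consistent with an older Guava being loaded ahead of the one the BigQuery connector was compiled against.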

Command line:

spark-submit --packages com.github.samelamin:spark-bigquery_2.11:0.2.6,org.apache.hadoop:hadoop-aws:2.7.3,com.databricks:spark-csv_2.11:1.3.0 --jars /home/hadoop/pyjobs/jars/minimal-json-0.9.4.jar,/home/hadoop/pyjobs/jars/spark-bigquery-0.2.5.jar,/home/hadoop/pyjobs/jars/spark-bigquery-0.1.0-s_2.11.jar,/home/hadoop/pyjobs/jars/gcs-connector-hadoop2-latest.jar,/home/hadoop/pyjobs/jars/google-api-client-1.4.1-beta.jar,/home/hadoop/pyjobs/jars/guava-21.0.jar,,/home/hadoop/pyjobs/jars/google-api-services-bigquery-v2-rev92-1.14.2-beta.jar /home/hadoop/pyjobs/py_script/s3_bigquery_0_1.py
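A command line like this one can put several versions of the same library on the classpath, which may make conflicts such as the one above harder to diagnose. The following is a rough sketch that groups the --packages coordinates and --jars file names by artifact and reports any artifact that appears with more than one version. The helper names and the file-name regex are assumptions made for illustration; jar names that do not follow the name-version.jar convention are simply skipped.

```python
import re

# Values copied from the spark-submit command above (including the empty ",," entry).
PACKAGES = (
    "com.github.samelamin:spark-bigquery_2.11:0.2.6,"
    "org.apache.hadoop:hadoop-aws:2.7.3,"
    "com.databricks:spark-csv_2.11:1.3.0"
)
JARS = (
    "/home/hadoop/pyjobs/jars/minimal-json-0.9.4.jar,"
    "/home/hadoop/pyjobs/jars/spark-bigquery-0.2.5.jar,"
    "/home/hadoop/pyjobs/jars/spark-bigquery-0.1.0-s_2.11.jar,"
    "/home/hadoop/pyjobs/jars/gcs-connector-hadoop2-latest.jar,"
    "/home/hadoop/pyjobs/jars/google-api-client-1.4.1-beta.jar,"
    "/home/hadoop/pyjobs/jars/guava-21.0.jar,,"
    "/home/hadoop/pyjobs/jars/google-api-services-bigquery-v2-rev92-1.14.2-beta.jar"
)

# Assumed convention: <artifact>-<version>.jar, where the version starts with a digit.
JAR_NAME = re.compile(r"^(.*?)-(\d[\w.\-]*)\.jar$")


def artifact_versions(packages, jars):
    """Map artifact name -> set of versions seen in --packages and --jars."""
    found = {}
    for coord in filter(None, packages.split(",")):
        _group, artifact, version = coord.split(":")
        artifact = re.sub(r"_\d+\.\d+$", "", artifact)  # drop Scala suffix such as _2.11
        found.setdefault(artifact, set()).add(version)
    for path in filter(None, jars.split(",")):           # filter(None, ...) skips empty entries
        match = JAR_NAME.match(path.rsplit("/", 1)[-1])
        if match:
            found.setdefault(match.group(1), set()).add(match.group(2))
    return found


def duplicates(found):
    """Keep only artifacts that appear with more than one version (sorted as strings)."""
    return {name: sorted(vs) for name, vs in found.items() if len(vs) > 1}


conflicts = duplicates(artifact_versions(PACKAGES, JARS))
print(conflicts)
```

For this command the sketch flags spark-bigquery, which is requested as 0.2.6 through --packages and also supplied as two older jars through --jars. This does not by itself explain the Guava error, but trimming the list to a single version of each library is a reasonable first step.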

kishore8714 commented 5 years ago

I am also unable to import the module. The statement import com.samelamin.spark.bigquery._ fails with an import error in Python files.

samelamin commented 5 years ago

This sounds like a bug that was fixed with the latest release, can you confirm what version you are using?

sunny1978 commented 5 years ago

I picked 2.6.0 and built an uber jar. It still doesn't work: read and write both throw the same error. My jar: 15710607 Apr 29 15:49 sparkbigquery-0.0.1-SNAPSHOT.jar (this is a wrapper jar that I built so that I can have one big uber jar that includes everything).

Read:

scala> val df = spark.sqlContext.read.format("com.samelamin.spark.bigquery").option("tableReferenceSource","projectid:schema.table").load()
java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V
    at com.google.cloud.hadoop.io.bigquery.BigQueryStrings.parseTableReference(BigQueryStrings.java:68)
    at com.samelamin.spark.bigquery.BigQueryRelation.getConvertedSchema(BigQueryRelation.scala:19)
    at com.samelamin.spark.bigquery.BigQueryRelation.schema(BigQueryRelation.scala:13)
    at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:40)
    at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:389)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)

Write:

scala> avrodf.saveAsBigQueryTable("projectid:schema.table")
java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V
    at com.google.cloud.hadoop.io.bigquery.BigQueryStrings.parseTableReference(BigQueryStrings.java:68)
    at com.samelamin.spark.bigquery.BigQueryDataFrame.saveAsBigQueryTable(BigQueryDataFrame.scala:40)
    ... 50 elided

guava: 26.0

jdk1.8

Please help us. This seems to be a great API, very promising and easy to use.

sunny1978 commented 5 years ago

samelamin wrote: "This sounds like a bug that was fixed with the latest release, can you confirm what version you are using?"

Version 2.6.0 also has this error. It is a showstopper for us.

ameyamahajan commented 5 years ago

I was able to resolve this by shading the Google libraries with the following maven-shade-plugin configuration:

 <configuration>
   <relocations>
     <relocation>
       <pattern>com.google</pattern>
       <shadedPattern>shaded.guava</shadedPattern>
       <includes>
         <include>com.google.**</include>
       </includes>
       <excludes>
         <exclude>com.google.common.base.Optional</exclude>
         <exclude>com.google.common.base.Absent</exclude>
         <exclude>com.google.common.base.Present</exclude>
         <exclude>com.google.cloud.**</exclude>
       </excludes>
     </relocation>
   </relocations>
 </configuration>
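
To make the effect of the relocation above concrete, here is a simplified Python model of which class names it would rename. This is only a sketch under stated assumptions: it treats a pattern ending in ".**" as a package-prefix match and every other pattern as an exact class name, which covers the rules in this configuration but is not Maven Shade's actual matcher. The function names are hypothetical.

```python
# Simplified model of the <relocation> rules above; not Maven Shade's real matcher.
PATTERN = "com.google"
SHADED_PATTERN = "shaded.guava"
INCLUDES = ["com.google.**"]
EXCLUDES = [
    "com.google.common.base.Optional",
    "com.google.common.base.Absent",
    "com.google.common.base.Present",
    "com.google.cloud.**",
]


def matches(rule, class_name):
    """'pkg.**' matches anything under pkg; any other rule must match exactly."""
    if rule.endswith(".**"):
        return class_name.startswith(rule[:-2])  # keep the trailing dot: 'pkg.'
    return class_name == rule


def relocate(class_name):
    """Return the class name after relocation, or the original name if it is kept."""
    if not class_name.startswith(PATTERN + "."):
        return class_name                                  # outside com.google
    if INCLUDES and not any(matches(r, class_name) for r in INCLUDES):
        return class_name                                  # not included
    if any(matches(r, class_name) for r in EXCLUDES):
        return class_name                                  # explicitly excluded
    return SHADED_PATTERN + class_name[len(PATTERN):]      # swap the package prefix


for name in [
    "com.google.common.base.Preconditions",
    "com.google.common.base.Optional",
    "com.google.cloud.hadoop.io.bigquery.BigQueryStrings",
    "org.apache.spark.sql.Dataset",
]:
    print(name, "->", relocate(name))
```

Under this model, Preconditions moves to shaded.guava.common.base.Preconditions inside the uber jar, while the com.google.cloud connector classes keep their names. Since shading also rewrites references inside the bundled classes, the connector would then call the bundled Guava copy instead of whichever Guava the cluster loads first.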

samelamin commented 5 years ago

Cheers for adding the example, @ameyamahajan. I would really appreciate it if you could add a ToDo section to the readme :)