Closed ravwojdyla closed 7 years ago
@brkyvz , could you please assist here? What is the process of building/publishing spark packages jars? Is the build process open source somewhere?
@ravwojdyla It seems to me that version 0.1.2 of spark-bigquery was built and published against Spark 1.6 and is not compatible with Spark 2.0. Either the maintainers should publish a new version of the library compiled against Spark 2.0, or your project should use Spark 1.6. Hence, I don't think this is a Spark Packages specific problem.
If you would like to use the library with Spark 2.0, and the code is currently source compatible, you could build the library from source and use that in your project.
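For reference, a minimal `build.sbt` sketch for compiling the connector against Spark 2.0 yourself (the Spark version and `scalaVersion` below are assumptions; check the project's own build definition):

```scala
// build.sbt sketch (assumed versions): compiling against Spark 2.0 makes the
// emitted bytecode reference org.apache.spark.sql.Dataset instead of the
// pre-2.0 DataFrame class.
scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.2" % "provided"
```

After `sbt publishLocal`, depend on the locally published artifact instead of the spark-packages one.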
@brkyvz :man_facepalming: you are absolutely right. thanks for pointing that out, and sorry for bothering you. Closing this.
I'm trying to use spark-bigquery as a dependency by either:
or
(Both resolved the dependency just fine)
and when I try to use the classes/methods provided by spark-bigquery I get either:
which is a manifestation of the same problem.
`org.apache.spark.sql.DataFrame` is not a class, but an alias:

Unfortunately the jar provided by maven/spark-packages does not translate the alias, and instead uses `org.apache.spark.sql.DataFrame` in the compiled code, therefore my project gets confused and can't load `org.apache.spark.sql.DataFrame`, because it does not exist (and should not). There seems to be a problem in the way spark-packages is building/publishing jars. If I package spark-bigquery locally and decompile it (for example `BigQuerySQLContext`), we can see that the locally compiled code is in fact using `org.apache.spark.sql.Dataset`:

while the jar from maven/spark-packages is still using `org.apache.spark.sql.DataFrame` in the compiled code:

At this point `org.apache.spark.sql.DataFrame` is expected to be a class, the classloader gets confused, and an error is thrown. One way to solve it might be to use `Dataset` explicitly, but honestly this seems like something that should be solved in spark-packages.