Open leoeareis opened 6 months ago
Let's try to isolate the problem (is it related to the Databricks environment?) and also make it reproducible so I can confirm that a given solution works. Can you try to reproduce the problem outside of the Databricks environment?
It would also be really helpful if you could prepare and share a minimal project (it can be based on https://github.com/thesamet/sparksql-scalapb-test) and try to reproduce it both inside and outside Databricks. Since it will also include the specific protos that cause the failure, it may point us in another direction.
Hello @thesamet! I tried to move a service that uses the sparksql35-scalapb0_11_2.12 dependency to Databricks 14.2 and above, and I got the following error:
java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.objects.StaticInvoke.<init>(Ljava/lang/Class;Lorg/apache/spark/sql/types/DataType;Ljava/lang/String;Lscala/collection/Seq;Lscala/collection/Seq;ZZZ)V
at frameless.TypedEncoder$$anon$1.toCatalyst(TypedEncoder.scala:69)
at frameless.RecordEncoder.$anonfun$toCatalyst$2(RecordEncoder.scala:155)
at scala.collection.immutable.List.map(List.scala:293)
at frameless.RecordEncoder.toCatalyst(RecordEncoder.scala:153)
at frameless.TypedExpressionEncoder$.apply(TypedExpressionEncoder.scala:28)
at scalapb.spark.Implicits.typedEncoderToEncoder(TypedEncoders.scala:119)
at scalapb.spark.Implicits.typedEncoderToEncoder$(TypedEncoders.scala:116)
at scalapb.spark.Implicits$.typedEncoderToEncoder(TypedEncoders.scala:122)
This doesn't happen locally. Following your suggestion, I forked the repo https://github.com/thesamet/sparksql-scalapb-test/tree/master to see whether the problem is related to the Databricks environment. The code can be found here: https://github.com/anamariavisan/sparksql-scalapb-test. To build the app I ran these commands:
curl -s "https://get.sdkman.io" | bash
sdk install java 11.0.24-zulu
sdk install sbt 1.6.2
sbt assembly
And to test it locally:
sdk install spark 3.5.0
spark-submit \
--jars . \
--class myexample.RunDemo \
target/scala-2.12/sparksql-scalapb-test-assembly-1.0.0.jar
To test it in Databricks, I created a job and uploaded the library target/scala-2.12/sparksql-scalapb-test-assembly-1.0.0.jar, with the main class being myexample.RunDemo. I submitted the job locally and it worked, but in Databricks 14.2 and above, it failed with:
java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.objects.StaticInvoke.<init>(Ljava/lang/Class;Lorg/apache/spark/sql/types/DataType;Ljava/lang/String;Lscala/collection/Seq;Lscala/collection/Seq;ZZZ)V
at scalapb.spark.ToCatalystHelpers.fieldToCatalyst(ToCatalystHelpers.scala:165)
at scalapb.spark.ToCatalystHelpers.fieldToCatalyst$(ToCatalystHelpers.scala:107)
at scalapb.spark.ProtoSQL$$anon$1$$anon$2.fieldToCatalyst(ProtoSQL.scala:84)
at scalapb.spark.ToCatalystHelpers.$anonfun$messageToCatalyst$2(ToCatalystHelpers.scala:39)
I searched how to fix it and I found these issues that describe the same problem:
I also left a comment on this issue on the frameless repo https://github.com/typelevel/frameless/issues/787.
What is your course of action on this matter for sparksql-scalapb?
This is not actionable by sparksql-scalapb until there's a fix for frameless on Spark 3.5 and DBR 14.2.
FYI: the second stack trace is internal to sparksql-scalapb and is caused by its use of Spark internal APIs rather than by frameless itself. The shim-based solution proposed for frameless (#787) could also be leveraged for the sparksql-scalapb API usage (it has been tested across all supported DBRs, at least for the frameless usage).
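For anyone who wants to confirm the mismatch on their own cluster: the descriptor in the error, `(Ljava/lang/Class;Lorg/apache/spark/sql/types/DataType;Ljava/lang/String;Lscala/collection/Seq;Lscala/collection/Seq;ZZZ)V`, denotes a `StaticInvoke` constructor taking `(Class, DataType, String, Seq, Seq, boolean, boolean, boolean)`. Below is a minimal diagnostic sketch that prints the constructors actually present at runtime, using only plain Java reflection. The `ConstructorProbe` object and its `constructorSignatures` helper are hypothetical names made up for this illustration; they are not part of Spark, frameless, or sparksql-scalapb.

```scala
import scala.util.Try

// Hypothetical helper for illustration only; not part of any library.
object ConstructorProbe {

  /** Parameter type names of every public constructor of `className`,
    * or an empty Seq if the class is not on the classpath.
    */
  def constructorSignatures(className: String): Seq[String] =
    Try(Class.forName(className)).toOption.toSeq.flatMap { cls =>
      cls.getConstructors.toSeq.map { ctor =>
        ctor.getParameterTypes.map(_.getName).mkString("(", ", ", ")")
      }
    }

  def main(args: Array[String]): Unit = {
    // Class name taken from the NoSuchMethodError in the stack traces.
    val name = "org.apache.spark.sql.catalyst.expressions.objects.StaticInvoke"
    val signatures = constructorSignatures(name)
    if (signatures.isEmpty) println(s"$name not found on the classpath")
    else signatures.foreach(sig => println(s"$name$sig"))
  }
}
```

Running this in a notebook cell or as a job on the affected runtime, and again against a local Spark 3.5.0, should show whether the eight-argument constructor from the error message exists in each environment.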
Hi! I made some updates in my project from Spark 3.4.1 to Spark 3.5.0 and updated the scalapb dependency from sparksql34-scalapb0_11 to sparksql35-scalapb0_11. After this upgrade, I faced this error:
I run my jobs in a Databricks environment using Runtime 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12), and my UDF that performs the protobuf decoding is defined as
Could you help me figure out this error?