Hi @RodrigoBorges93, try adding the following jars together and see if they work:
Hi @allenhaozi, thanks for your help. I have added the jars you mentioned, and now we get this error:
```
py4j.protocol.Py4JJavaError: An error occurred while calling o70.sessionState.
: java.lang.IncompatibleClassChangeError: class org.apache.spark.sql.catalyst.TimeTravel can not implement org.apache.spark.sql.catalyst.plans.logical.LeafNode, because it is not an interface (org.apache.spark.sql.catalyst.plans.logical.LeafNode is in unnamed module of loader 'app')
  at java.base/java.lang.ClassLoader.defineClass1(Native Method)
  at java.base/java.lang.ClassLoader.defineClass(Unknown Source)
  at java.base/java.security.SecureClassLoader.defineClass(Unknown Source)
  at java.base/java.net.URLClassLoader.defineClass(Unknown Source)
  at java.base/java.net.URLClassLoader$1.run(Unknown Source)
  at java.base/java.net.URLClassLoader$1.run(Unknown Source)
  at java.base/java.security.AccessController.doPrivileged(Native Method)
  at java.base/java.net.URLClassLoader.findClass(Unknown Source)
  at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
  at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
  at io.delta.sql.parser.DeltaSqlParser.<init>(DeltaSqlParser.scala:71)
  at io.delta.sql.DeltaSparkSessionExtension.$anonfun$apply$1(DeltaSparkSessionExtension.scala:78)
  at org.apache.spark.sql.SparkSessionExtensions.$anonfun$buildParser$1(SparkSessionExtensions.scala:239)
  at scala.collection.IndexedSeqOptimized.foldLeft(IndexedSeqOptimized.scala:60)
  at scala.collection.IndexedSeqOptimized.foldLeft$(IndexedSeqOptimized.scala:68)
  at scala.collection.mutable.ArrayBuffer.foldLeft(ArrayBuffer.scala:49)
  at org.apache.spark.sql.SparkSessionExtensions.buildParser(SparkSessionExtensions.scala:238)
  at org.apache.spark.sql.internal.BaseSessionStateBuilder.sqlParser$lzycompute(BaseSessionStateBuilder.scala:124)
  at org.apache.spark.sql.internal.BaseSessionStateBuilder.sqlParser(BaseSessionStateBuilder.scala:123)
  at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:341)
  at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1142)
  at org.apache.spark.sql.SparkSession.$anonfun$sessionState$2(SparkSession.scala:156)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:152)
  at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:149)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
  at java.base/java.lang.reflect.Method.invoke(Unknown Source)
  at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
  at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
  at py4j.Gateway.invoke(Gateway.java:282)
  at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
  at py4j.commands.CallCommand.execute(CallCommand.java:79)
  at py4j.GatewayConnection.run(GatewayConnection.java:238)
  at java.base/java.lang.Thread.run(Unknown Source)
```
Do you know if we have to do something else?
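
The `TimeTravel` / `LeafNode` mismatch usually points at a Delta Lake jar compiled against a different Spark release than the one running on the cluster. As a rough sketch (assuming Spark 3.1.x with Scala 2.12; the exact coordinates below are illustrative, not a confirmed fix), pinning the Delta line that matches the Spark version looks like this:

```python
# Minimal sketch, not a confirmed fix: pin a delta-core release that matches
# the Spark version. delta-core_2.12:1.0.x targets Spark 3.1.x, while the
# 1.1+/2.x lines target Spark 3.2+; mixing them can raise
# IncompatibleClassChangeError at session start, as in the trace above.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("delta-version-check")
    .config("spark.jars.packages", "io.delta:delta-core_2.12:1.0.1")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)
```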
We are also getting Bintray server errors:
```
:: problems summary ::
:::: ERRORS
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-main/3.0.0/hadoop-main-3.0.0.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-project/3.0.0/hadoop-project-3.0.0.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-azure/3.0.0/hadoop-azure-3.0.0-sources.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-azure/3.0.0/hadoop-azure-3.0.0-src.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-azure/3.0.0/hadoop-azure-3.0.0-javadoc.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-project-dist/3.0.0/hadoop-project-dist-3.0.0.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-common/3.0.0/hadoop-common-3.0.0-sources.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-common/3.0.0/hadoop-common-3.0.0-src.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-common/3.0.0/hadoop-common-3.0.0-javadoc.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-annotations/3.0.0/hadoop-annotations-3.0.0-sources.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-annotations/3.0.0/hadoop-annotations-3.0.0-src.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-annotations/3.0.0/hadoop-annotations-3.0.0-javadoc.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/sonatype/oss/oss-parent/7/oss-parent-7.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/com/google/guava/guava-parent/11.0.2/guava-parent-11.0.2.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/apache/4/apache-4.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/commons/commons-parent/11/commons-parent-11.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/apache/9/apache-9.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/commons/commons-parent/24/commons-parent-24.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/apache/13/apache-13.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/httpcomponents/project/7/project-7.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/httpcomponents/httpcomponents-client/4.5.2/httpcomponents-client-4.5.2.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/httpcomponents/httpcomponents-core/4.4.4/httpcomponents-core-4.4.4.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/commons/commons-parent/28/commons-parent-28.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/commons/commons-parent/25/commons-parent-25.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/commons/commons-parent/23/commons-parent-23.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/apache/16/apache-16.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/commons/commons-parent/39/commons-parent-39.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/net/java/jvnet-parent/3/jvnet-parent-3.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/eclipse/jetty/jetty-parent/25/jetty-parent-25.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/eclipse/jetty/jetty-project/9.3.19.v20170502/jetty-project-9.3.19.v20170502.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/net/java/jvnet-parent/4/jvnet-parent-4.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/com/sun/jersey/jersey-project/1.19/jersey-project-1.19.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1-javadoc.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/apache/7/apache-7.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/commons/commons-parent/17/commons-parent-17.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/apache/18/apache-18.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/commons/commons-parent/41/commons-parent-41.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/commons/commons-parent/37/commons-parent-37.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/slf4j/slf4j-parent/1.7.25/slf4j-parent-1.7.25.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/apache/10/apache-10.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/avro/avro-toplevel/1.7.7/avro-toplevel-1.7.7.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/avro/avro-parent/1.7.7/avro-parent-1.7.7.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/codehaus/codehaus-parent/1/codehaus-parent-1.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/com/thoughtworks/paranamer/paranamer-parent/2.3/paranamer-parent-2.3.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/com/google/google/1/google-1.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-auth/3.0.0/hadoop-auth-3.0.0-javadoc.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/net/minidev/minidev-parent/2.3/minidev-parent-2.3.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/ow2/ow2/1.3/ow2-1.3.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/ow2/asm/asm-parent/5.0.4/asm-parent-5.0.4.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/jline/jline/0.9.94/jline-0.9.94-javadoc.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/sonatype/oss/oss-parent/9/oss-parent-9.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/curator/apache-curator/2.12.0/apache-curator-2.12.0.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/curator/curator-framework/2.12.0/curator-framework-2.12.0-javadoc.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/curator/curator-client/2.12.0/curator-client-2.12.0-javadoc.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/kerby/kerby-all/1.0.1/kerby-all-1.0.1.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/kerby/kerby-kerb/1.0.1/kerby-kerb-1.0.1.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/kerby/kerby-common/1.0.1/kerby-common-1.0.1.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/kerby/kerby-provider/1.0.1/kerby-provider-1.0.1.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/sonatype/oss/oss-parent/6/oss-parent-6.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/curator/curator-recipes/2.12.0/curator-recipes-2.12.0-javadoc.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/apache/17/apache-17.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/apache/htrace/htrace/4.1.0-incubating/htrace-4.1.0-incubating.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/com/fasterxml/oss-parent/25/oss-parent-25.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/com/fasterxml/jackson/jackson-parent/2.7/jackson-parent-2.7.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/com/fasterxml/oss-parent/24/oss-parent-24.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/javax/servlet/jsp/jsp-api/2.1/jsp-api-2.1-javadoc.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/com/microsoft/azure/azure-bom/0.8.0/azure-bom-0.8.0.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/com/microsoft/azure/azure/0.8.0/azure-0.8.0.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/antlr/antlr4-master/4.8/antlr4-master-4.8.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/antlr/antlr-master/3.5.2/antlr-master-3.5.2.jar
SERVER ERROR: Bad Gateway url=https://dl.bintray.com/spark-packages/maven/org/glassfish/json/1.0.4/json-1.0.4.jar
```
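
The Bad Gateway responses are consistent with dl.bintray.com having been decommissioned; the spark-packages repository now resolves from repos.spark-packages.org. A sketch of overriding the resolver from PySpark (the repository URLs and package coordinates shown are examples to adapt, not a verified configuration for this cluster):

```python
# Sketch: point dependency resolution away from the retired Bintray mirror.
# Coordinates/versions below are illustrative; match them to your cluster.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.jars.repositories",
            "https://repos.spark-packages.org/,https://repo1.maven.org/maven2/")
    .config("spark.jars.packages",
            "io.delta:delta-core_2.12:1.0.1,org.apache.hadoop:hadoop-azure:3.0.0")
    .getOrCreate()
)
```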
It looks like a version compatibility issue.
I built our own Spark image based on spark-3.2.1-bin-hadoop3.2.tgz and the following jars:
I'll try to build this image as well.
Thanks!
@RodrigoBorges93 did it work for you? I am also seeing the same issue.
I have Spark version 3.1.3 and Scala version 2.12.10.
I'll put one on Docker Hub; you can try it if you want: allenhaozi/deltalake-1.2.1-py-3.8:v0.1.0
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
Hi there!
I'm trying to save a Delta file from a CSV in PySpark. I have added the following packages:
Spark operator Image: gcr.io/spark-operator/spark-py:v3.1.1-hadoop3
I'm able to save the file as Parquet, but when I try to save it as Delta, the following error occurs:
```
WARN TaskSetManager: Lost task 0.0 in stage 5.0 (TID 7) (10.244.3.46 executor 1): java.lang.ClassNotFoundException: org.apache.spark.sql.delta.files.DelayedCommitProtocol
  at java.base/java.net.URLClassLoader.findClass(Unknown Source)
  at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
  at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
  at java.base/java.lang.Class.forName0(Native Method)
  at java.base/java.lang.Class.forName(Unknown Source)
  at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
  at java.base/java.io.ObjectInputStream.readNonProxyDesc(Unknown Source)
  at java.base/java.io.ObjectInputStream.readClassDesc(Unknown Source)
  at java.base/java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
  at java.base/java.io.ObjectInputStream.readObject0(Unknown Source)
  at java.base/java.io.ObjectInputStream.readArray(Unknown Source)
  at java.base/java.io.ObjectInputStream.readObject0(Unknown Source)
  at java.base/java.io.ObjectInputStream.defaultReadFields(Unknown Source)
  at java.base/java.io.ObjectInputStream.readSerialData(Unknown Source)
  at java.base/java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
  at java.base/java.io.ObjectInputStream.readObject0(Unknown Source)
  at java.base/java.io.ObjectInputStream.defaultReadFields(Unknown Source)
  at java.base/java.io.ObjectInputStream.readSerialData(Unknown Source)
  at java.base/java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
  at java.base/java.io.ObjectInputStream.readObject0(Unknown Source)
  at java.base/java.io.ObjectInputStream.readObject(Unknown Source)
  at java.base/java.io.ObjectInputStream.readObject(Unknown Source)
  at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
  at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
  at org.apache.spark.scheduler.Task.run(Task.scala:131)
  at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at java.base/java.lang.Thread.run(Unknown Source)
```
Does someone know how to fix this? Is this a package version problem?
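
For what it's worth, `DelayedCommitProtocol` lives inside the delta-core jar and is deserialized on the executors, so a `ClassNotFoundException` there usually means the jar reached the driver but was never shipped to the executor pods. One hedged way to rule that out (paths and the package version below are placeholders, not the reporter's actual setup) is to declare the dependency through `spark.jars.packages`, which Spark distributes to executors, rather than baking it only into the driver image:

```python
# Sketch only: make sure delta-core is distributed to executors, then write
# CSV -> Delta. Input/output paths and the package version are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("csv-to-delta")
    .config("spark.jars.packages", "io.delta:delta-core_2.12:1.0.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

df = spark.read.option("header", "true").csv("/tmp/input.csv")        # placeholder path
df.write.format("delta").mode("overwrite").save("/tmp/output_delta")  # placeholder path
```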