This component acts as a bridge between Spark and Vertica, allowing the user to either retrieve data from Vertica for processing in Spark, or store processed data from Spark into Vertica.
[BUG] NoClassDefFoundError when attempting to read from Vertica #559
Missing class definition. I see vertica-spark is built against Spark 3.3 - possibly something was deprecated or removed there?
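The failure mode here is a binary-compatibility break, not a coding bug in the job itself: the connector was compiled against Spark 3.3, where the policy enum lived at `org.apache.spark.sql.internal.SQLConf$LegacyBehaviorPolicy$`; on newer Spark runtimes that inner object no longer exists at that location, so the first connector method that references it fails to link. The sketch below is not the connector's code - it is a minimal, Spark-free demonstration of the same mechanism, resolving the 3.3-era binary name by reflection on a JVM that does not have Spark 3.3 on the classpath:

```java
// Minimal sketch (hypothetical class name MissingClassDemo, not part of the
// connector): resolving a binary class name that is absent from the running
// classpath fails at use time, which is exactly how a connector compiled
// against Spark 3.3 surfaces NoClassDefFoundError on a newer Spark runtime.
public class MissingClassDemo {
    // Try to resolve a class by its JVM binary name and report the outcome.
    static String lookup(String binaryName) {
        try {
            Class.forName(binaryName);
            return "found: " + binaryName;
        } catch (ClassNotFoundException e) {
            return "ClassNotFoundException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The Spark 3.3-era location of LegacyBehaviorPolicy; on a JVM
        // without Spark 3.3 on the classpath this takes the failure branch.
        System.out.println(lookup(
            "org.apache.spark.sql.internal.SQLConf$LegacyBehaviorPolicy$"));
    }
}
```

Note the runtime behavior matches the stack trace below: the job compiles and submits fine, and the error only appears when an executor first touches the connector's read path.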
Steps to reproduce:
Expected behaviour:
Actual behaviour:
Error message/stack trace:
```
py4j.protocol.Py4JJavaError: An error occurred while calling o254.createOrReplace.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3099 in stage 0.0 failed 4 times, most recent failure: Lost task 3099.3 in stage 0.0 (TID 3281) ([2600:1f18:41ad:2102:1022:9d72:bf0:2463] executor 102): java.lang.NoClassDefFoundError: org/apache/spark/sql/internal/SQLConf$LegacyBehaviorPolicy$
	at com.vertica.spark.datasource.fs.HadoopFileStoreLayer.openReadParquetFile(FileStoreLayerInterface.scala:380)
	at com.vertica.spark.datasource.core.VerticaDistributedFilesystemReadPipe.$anonfun$startPartitionRead$2(VerticaDistributedFilesystemReadPipe.scala:429)
	at scala.util.Either.flatMap(Either.scala:341)
	at com.vertica.spark.datasource.core.VerticaDistributedFilesystemReadPipe.startPartitionRead(VerticaDistributedFilesystemReadPipe.scala:416)
	at com.vertica.spark.datasource.core.DSReader.openRead(DSReader.scala:65)
	at com.vertica.spark.datasource.v2.VerticaBatchReader.<init>(VerticaDatasourceV2Read.scala:273)
	at com.vertica.spark.datasource.v2.VerticaReaderFactory.createReader(VerticaDatasourceV2Read.scala:261)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:84)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:35)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.hasNext(Unknown Source)
	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.$anonfun$run$1(WriteToDataSourceV2Exec.scala:441)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1409)
	at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run(WriteToDataSourceV2Exec.scala:486)
	at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run$(WriteToDataSourceV2Exec.scala:425)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:491)
	at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:388)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
	at org.apache.spark.scheduler.Task.run(Task.scala:143)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:629)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:95)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:632)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.internal.SQLConf$LegacyBehaviorPolicy$
	... 32 more
```
Code sample or example on how to reproduce the issue:
Environment
Problem Description
Spark Connector Logs