vertica / spark-connector

This component acts as a bridge between Spark and Vertica, allowing the user to either retrieve data from Vertica for processing in Spark, or store processed data from Spark into Vertica.

[BUG] NoClassDefFoundError when attempting to read from Vertica #559

Open · padraic-mcatee opened this issue 4 months ago

padraic-mcatee commented 4 months ago

Environment


Problem Description

Missing class definition when reading from Vertica: executors fail with java.lang.NoClassDefFoundError for org.apache.spark.sql.internal.SQLConf$LegacyBehaviorPolicy$. I see vertica-spark is built against Spark 3.3; possibly that class was deprecated or relocated in a newer Spark release?
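A quick way to check for such a mismatch from the driver is sketched below. This is a hedged diagnostic, not part of the original report: the SparkSession setup is assumed, `spark._jvm` is a private PySpark handle used here only for inspection, and the class name is copied verbatim from the stack trace further down.

```python
# Hypothetical mismatch check (not from the original report).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# The connector targets Spark 3.3.x; compare against the runtime version.
print("Runtime Spark version:", spark.version)

# Ask the driver JVM whether the class the connector expects is loadable.
# spark._jvm is a private PySpark attribute, used here purely for diagnosis.
try:
    spark._jvm.java.lang.Class.forName(
        "org.apache.spark.sql.internal.SQLConf$LegacyBehaviorPolicy$")
    print("Class found on the driver classpath")
except Exception as e:
    print("Class missing, matching the executor error:", e)
```

Note the executor classpath can differ from the driver's, so a clean result here does not rule out the error on executors; it only confirms which Spark build the job is actually running against.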

  1. Steps to reproduce:
  2. Expected behaviour:
  3. Actual behaviour:
  4. Error message/stack trace:
    py4j.protocol.Py4JJavaError: An error occurred while calling o254.createOrReplace.
    : org.apache.spark.SparkException: Job aborted due to stage failure: Task 3099 in stage 0.0 failed 4 times, most recent failure: Lost task 3099.3 in stage 0.0 (TID 3281) ([2600:1f18:41ad:2102:1022:9d72:bf0:2463] executor 102): java.lang.NoClassDefFoundError: org/apache/spark/sql/internal/SQLConf$LegacyBehaviorPolicy$
    at com.vertica.spark.datasource.fs.HadoopFileStoreLayer.openReadParquetFile(FileStoreLayerInterface.scala:380)
    at com.vertica.spark.datasource.core.VerticaDistributedFilesystemReadPipe.$anonfun$startPartitionRead$2(VerticaDistributedFilesystemReadPipe.scala:429)
    at scala.util.Either.flatMap(Either.scala:341)
    at com.vertica.spark.datasource.core.VerticaDistributedFilesystemReadPipe.startPartitionRead(VerticaDistributedFilesystemReadPipe.scala:416)
    at com.vertica.spark.datasource.core.DSReader.openRead(DSReader.scala:65)
    at com.vertica.spark.datasource.v2.VerticaBatchReader.<init>(VerticaDatasourceV2Read.scala:273)
    at com.vertica.spark.datasource.v2.VerticaReaderFactory.createReader(VerticaDatasourceV2Read.scala:261)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:84)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:35)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.hasNext(Unknown Source)
    at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
    at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.$anonfun$run$1(WriteToDataSourceV2Exec.scala:441)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1409)
    at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run(WriteToDataSourceV2Exec.scala:486)
    at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run$(WriteToDataSourceV2Exec.scala:425)
    at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:491)
    at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:388)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:143)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:629)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:95)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:632)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
    Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.internal.SQLConf$LegacyBehaviorPolicy$
    ... 32 more
  5. Code sample or example on how to reproduce the issue:
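Item 5 was left blank in the report; below is a minimal sketch of a job shape consistent with the stack trace (a VerticaSource read feeding a DataFrameWriterV2 `createOrReplace`, matching the `o254.createOrReplace` call above). The host, credentials, table names, target catalog, and staging path are all hypothetical placeholders; the option names follow the connector's documented read options.

```python
# Hypothetical reproduction sketch; all connection values are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("vertica-read-repro").getOrCreate()

# Read from Vertica through the connector (documented VerticaSource options).
df = (spark.read.format("com.vertica.spark.datasource.VerticaSource")
      .option("host", "vertica.example.com")        # placeholder host
      .option("user", "dbadmin")                    # placeholder credentials
      .option("password", "***")
      .option("db", "vdb")
      .option("table", "some_table")                # placeholder source table
      .option("staging_fs_url", "s3a://bucket/tmp") # intermediate staging area
      .load())

# The NoClassDefFoundError surfaces on executors once the read is actually
# executed, here as part of a V2 write, as in the stack trace above.
df.writeTo("catalog.schema.some_table_copy").createOrReplace()
```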

Spark Connector Logs