numberlabs-developers / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Using spark structured streaming, reading from kafka and writing to a MoR hudi table, I can't get async clustering to work #177

Open torvalds-dev-testbot[bot] opened 10 months ago

torvalds-dev-testbot[bot] commented 10 months ago


Describe the problem you faced

Using Spark Structured Streaming to read from Kafka and write to a MoR Hudi table, I can't get async clustering to work. When the clustering job runs after the nth commit, I get this error:

java.lang.IllegalArgumentException: For input string: "null" at scala.collection.immutable.StringLike.parseBoolean(StringLike.scala:336)

followed by:

java.util.concurrent.CompletionException: org.apache.spark.SparkException: Cannot find catalog plugin class for catalog 'spark_catalog': org.apache.spark.sql.hudi.catalog.HoodieCatalog

(complete stack attached). The clustering jobs fail. The batch in progress finishes, but the stream does not continue to the next batch.

Possibly related: I also see "Cannot find catalog plugin class for catalog 'spark_catalog'" errors when clustering inline. In that case the microbatch is retried, apparently succeeds (no failed jobs appear in the Spark UI), and processing continues with the next batch.

Any ideas what could be wrong?
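For readers trying to reproduce this, here is a minimal sketch of the kind of configuration involved. It is not the reporter's actual setup: the table name, key fields, and commit threshold are hypothetical placeholders. The three Spark settings are the ones the Hudi quick start guide lists for Spark 3.2 and later, as far as I understand it, and the `hoodie.*` keys are standard Hudi writer and clustering options. The options are built as plain dicts so they can be inspected without a Spark installation.

```python
# Sketch only: plain dicts, so no Spark installation is needed to inspect them.

# Spark session settings. The HoodieCatalog class ships in the Hudi Spark
# bundle jar, so that jar has to be on the classpath for the last entry
# to resolve.
SPARK_CONF = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.sql.extensions": "org.apache.spark.sql.hudi.HoodieSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.hudi.catalog.HoodieCatalog",
}


def hudi_stream_options(table_name, record_key, precombine_field, max_commits=4):
    """Build writer options for a MoR table with async clustering.

    All argument values used below are hypothetical placeholders.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Schedule and run clustering asynchronously every `max_commits` commits.
        "hoodie.clustering.async.enabled": "true",
        "hoodie.clustering.async.max.commits": str(max_commits),
        # Keep inline clustering off so the two modes are not mixed.
        "hoodie.clustering.inline": "false",
    }


options = hudi_stream_options("events", "event_id", "event_ts")
```

In a streaming job these would typically be applied with `df.writeStream.format("hudi").options(**options)`, plus a `checkpointLocation` option and a `start(path)` call. The `SPARK_CONF` entries need to be set when the session is created.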

I was using Hudi 0.13.1 and had those settings configured as specified above. One thing I found is that we were using the hudi-spark3 bundle on Dataproc, which runs Spark 3.3, not 3.4, so we're trying the spark3.3 bundle instead. I'm also going to experiment with 0.14 to see how it behaves. I'm guessing our problem is a Spark cluster or configuration issue, since this is obviously working for most folks. I'd greatly appreciate any other suggestions as to what might be causing this.
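One way to rule out a bundle mismatch like this is to derive the bundle name from the Spark version the cluster actually reports. The helper below is a hypothetical sketch, not part of Hudi. It assumes the version-specific artifacts follow the `hudi-spark<major>.<minor>-bundle_<scala>` naming pattern and that the cluster uses Scala 2.12.

```python
def hudi_bundle_coordinate(spark_version, hudi_version, scala_version="2.12"):
    """Return a Maven coordinate for the Spark-minor-specific Hudi bundle.

    Hypothetical helper. Assumes the artifact naming pattern
    hudi-spark<major>.<minor>-bundle_<scala>.
    """
    parts = spark_version.split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(f"unexpected Spark version: {spark_version!r}")
    major_minor = f"{parts[0]}.{parts[1]}"
    return (
        f"org.apache.hudi:hudi-spark{major_minor}-bundle_{scala_version}:{hudi_version}"
    )


# Example: a Spark 3.3.x cluster with Hudi 0.13.1.
coordinate = hudi_bundle_coordinate("3.3.2", "0.13.1")
print(coordinate)  # org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.1
```

On a running cluster, `spark.version` gives the string to pass in, and the resulting coordinate can be supplied through `--packages` or the `spark.jars.packages` setting.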
