Closed: torvalds-dev closed this issue 7 months ago.
The issue you're referring to is related to Apache Hudi, a data lake framework that provides stream processing and incremental data pipeline capabilities. The specific problem is with the compaction and clustering jobs in Hudi.
From the error message, the compaction/clustering job fails because a HoodieException is thrown. Despite this failure, the job returns -1 while its state is reported as successful, which is incorrect. The exception is thrown when the reader cannot read the next record from a Parquet file.
The stack trace shows that the failure originates in the hasNext method of the ParquetReaderIterator class (ParquetReaderIterator.java:54). This method checks whether there are more records to read from the Parquet file, and it is reached through two MappingIterator wrappers and a ConcatenatingIterator.
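A read-ahead iterator like this has to perform its I/O inside hasNext(), and because Iterator methods cannot throw checked exceptions, any IOException must be wrapped in an unchecked one. The sketch below illustrates that pattern under stated assumptions: RecordSource and ReadAheadIterator are hypothetical names, and it throws UncheckedIOException where Hudi throws HoodieException. It is not Hudi's actual source.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a Parquet record reader: read() returns null at end of file.
interface RecordSource<T> {
    T read() throws IOException;
}

// Sketch of a read-ahead iterator in the style of ParquetReaderIterator (not Hudi's code).
class ReadAheadIterator<T> implements Iterator<T> {
    private final RecordSource<T> source;
    private T next; // record fetched by hasNext() but not yet returned by next()

    ReadAheadIterator(RecordSource<T> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        if (next != null) {
            return true; // repeated hasNext() calls must not consume extra records
        }
        try {
            next = source.read(); // the actual file I/O happens here, inside hasNext()
            return next != null;
        } catch (IOException e) {
            // Iterator.hasNext() cannot declare checked exceptions, so wrap the failure.
            // Hudi wraps it in HoodieException; this sketch uses UncheckedIOException.
            throw new UncheckedIOException("unable to read next record from parquet file", e);
        }
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = next;
        next = null;
        return result;
    }
}
```

Because the read happens in hasNext(), a corrupt or unreadable file surfaces as an unchecked exception from a call that callers usually assume is cheap and safe.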
The issue was fixed in the master branch of the Hudi repository, according to the linked commit. The fix likely involves correctly handling the HoodieException and ensuring that the job's state is accurately reported when such an exception is thrown.
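To make the symptom concrete, here is a minimal sketch, assuming a hypothetical driver that converts any exception into a -1 result code. If main() ignores that code and returns normally, the JVM exits with status 0 and the job looks successful; turning a non-zero code into an exception restores the failure signal. JobDriver, runJob, and requireSuccess are illustrative names, not Hudi APIs, and this is not necessarily how the linked commit fixes the problem.

```java
import java.util.concurrent.Callable;

// Hypothetical driver sketch; runJob and requireSuccess are illustrative, not Hudi APIs.
final class JobDriver {
    private JobDriver() {}

    // Mirrors the reported behaviour: any exception is logged and turned into a -1 result code.
    static int runJob(Callable<Integer> job) {
        try {
            return job.call();
        } catch (Exception e) {
            System.err.println("Cluster failed: " + e);
            return -1;
        }
    }

    // The missing step: a non-zero result code must fail the driver instead of being ignored.
    // An exception escaping main() makes the JVM exit with a non-zero status.
    static void requireSuccess(int resultCode, String jobName) {
        if (resultCode != 0) {
            throw new IllegalStateException(jobName + " failed with result code " + resultCode);
        }
    }

    public static void main(String[] args) {
        int result = runJob(() -> {
            throw new RuntimeException("unable to read next record from parquet file");
        });
        // Without this call, main() would return normally and the job would look successful.
        requireSuccess(result, "clustering");
    }
}
```

Throwing rather than calling System.exit keeps the failure visible to whatever launched the driver while still letting the original log line explain the cause.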
As for the code snippet you provided, it seems to be a test case for Hudi's real-time input format. It sets up a Parquet table, simulates inserts, and then verifies that incremental reads do not show inserts after a certain timestamp. It also tests the conversion of Avro records to ArrayWritable objects. The createCompactionFile method is used to create a compaction file with a given commit time.
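The "no inserts after a certain timestamp" check boils down to comparing commit times. A minimal sketch, assuming fixed-width numeric instant strings such as 20231202034300 (which sort chronologically under plain string comparison); CommitRecord and InstantFilter are hypothetical helpers, not part of the test being discussed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record holder for this sketch; not a Hudi class.
final class CommitRecord {
    final String key;
    final String commitTime; // instant time string, e.g. "20231202034300"

    CommitRecord(String key, String commitTime) {
        this.key = key;
        this.commitTime = commitTime;
    }
}

final class InstantFilter {
    private InstantFilter() {}

    // Returns only records whose commit time is at or before maxCommitTime.
    // Assumes fixed-width digit strings, so String.compareTo orders them chronologically.
    static List<CommitRecord> visibleUpTo(List<CommitRecord> records, String maxCommitTime) {
        List<CommitRecord> visible = new ArrayList<>();
        for (CommitRecord r : records) {
            if (r.commitTime.compareTo(maxCommitTime) <= 0) {
                visible.add(r);
            }
        }
        return visible;
    }
}
```

A test in this spirit writes one batch at a later commit time and asserts that a read bounded by an earlier time does not return it.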
If you have a specific question about this code or the issue, please let me know!
The clustering/compaction job throws the following exception; the final result is -1, yet the job's state is reported as success.
ERROR UtilHelpers: Cluster failed
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 2 times, most recent failure: Lost task 0.1 in stage 4.0 (TID 5) : org.apache.hudi.exception.HoodieException: unable to read next record from parquet file
    at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:54)
    at org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39)
    at org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39)
    at org.apache.hudi.client.utils.ConcatenatingIterator.hasNext(ConcatenatingIterator.java:45)

Issue Links: links to Web Link https://github.com/apache/hudi/pull/10050
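In the trace, the exception raised in ParquetReaderIterator.hasNext passes unchanged through the MappingIterator and ConcatenatingIterator frames, because each wrapper's hasNext() simply delegates. The simplified stand-ins below (MappingIter and ConcatIter are hypothetical names, not Hudi's classes) show how a failure in one inner reader surfaces from the outermost iterator, which is what fails the task in the trace.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Simplified stand-in for the MappingIterator frames in the trace (illustrative sketch).
final class MappingIter<I, O> implements Iterator<O> {
    private final Iterator<I> source;
    private final Function<I, O> mapper;

    MappingIter(Iterator<I> source, Function<I, O> mapper) {
        this.source = source;
        this.mapper = mapper;
    }

    // Pure delegation: any exception from the inner hasNext() passes straight through.
    @Override public boolean hasNext() { return source.hasNext(); }

    @Override public O next() { return mapper.apply(source.next()); }
}

// Simplified stand-in for the ConcatenatingIterator frame (illustrative sketch).
final class ConcatIter<T> implements Iterator<T> {
    private final Iterator<Iterator<T>> parts;
    private Iterator<T> current;

    ConcatIter(List<Iterator<T>> parts) {
        this.parts = parts.iterator();
    }

    @Override
    public boolean hasNext() {
        // Advance to the next non-empty part; a part's hasNext() exception propagates unchanged.
        while (current == null || !current.hasNext()) {
            if (!parts.hasNext()) {
                return false;
            }
            current = parts.next();
        }
        return true;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

Iterating a ConcatIter over a healthy part followed by a failing part yields the healthy records first; the next hasNext() call then throws the inner exception.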
danny0405 Danny Chen added a comment - 02/Dec/23 03:43 Fixed via master branch: https://github.com/torvalds-dev/hudi/commit/77e2f42c6e49478cadf2b6bc5c732ee8673a8d69