torvalds-dev / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

Compaction/Clustering job does not fail when a HoodieException is thrown #28

Open torvalds-dev opened 9 months ago

torvalds-dev commented 9 months ago

The Clustering/Compaction job throws the following exception; the final result returns -1, yet the job's state is reported as success.

ERROR UtilHelpers: Cluster failed
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 2 times, most recent failure: Lost task 0.1 in stage 4.0 (TID 5): org.apache.hudi.exception.HoodieException: unable to read next record from parquet file
    at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:54)
    at org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39)
    at org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39)
    at org.apache.hudi.client.utils.ConcatenatingIterator.hasNext(ConcatenatingIterator.java:45)

Issue links: https://github.com/apache/hudi/pull/10050

danny0405 (Danny Chen) added a comment - 02/Dec/23 03:43: Fixed via master branch: https://github.com/torvalds-dev/hudi/commit/77e2f42c6e49478cadf2b6bc5c732ee8673a8d69

torvalds-dev commented 9 months ago

torvalds.dev is analyzing the ticket

torvalds-dev commented 9 months ago

The issue you're referring to is related to Apache Hudi, a data lake framework that provides stream processing and incremental data pipeline capabilities. The specific problem is with the compaction and clustering jobs in Hudi.

From the error message, the Spark task fails with an exception thrown by Hudi: HoodieException: unable to read next record from parquet file. This suggests a problem reading data from a Parquet file, the columnar storage format Hudi uses for base files and which is optimized for big data processing frameworks like Apache Spark.
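For illustration only, here is a minimal sketch of the wrapping pattern the stack trace points at: an iterator whose hasNext() turns a checked read error into an unchecked exception that later aborts the Spark task. The RecordIterator/RecordReader names are invented for this sketch and are not the actual Hudi ParquetReaderIterator source.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Simplified stand-in for an iterator like org.apache.hudi.common.util.ParquetReaderIterator:
// the checked IOException from the underlying reader is rethrown as an unchecked
// exception, which is what aborts the Spark task in the report above.
class RecordIterator<T> implements Iterator<T> {
  private final RecordReader<T> reader; // hypothetical low-level reader
  private T next;

  RecordIterator(RecordReader<T> reader) {
    this.reader = reader;
  }

  @Override
  public boolean hasNext() {
    try {
      if (next == null) {
        next = reader.read(); // may throw IOException on a corrupt or unreadable file
      }
      return next != null;
    } catch (IOException e) {
      // Mirrors the message seen in the report; wrapping keeps the Iterator contract
      // but turns the read failure into a runtime exception.
      throw new RuntimeException("unable to read next record from parquet file", e);
    }
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    T result = next;
    next = null;
    return result;
  }

  interface RecordReader<T> {
    T read() throws IOException;
  }
}
```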

The code you've provided is a test case for a clustering job in Hudi. It sets up a Flink streaming job to perform clustering on a Hudi table. The job reads a clustering plan, transitions the clustering instant to 'inflight' state, sets up a data stream with a source function that emits clustering commit events, and finally executes the job.
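As a rough orientation, the skeleton below shows the general shape of such a Flink test. ClusteringEvent and the inline source are placeholders assumed for this sketch; they are not Hudi's real clustering operators or the actual test code.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

// Rough skeleton of the test flow described above; ClusteringEvent and the
// source body are placeholders, not Hudi's real clustering pipeline.
public class ClusteringJobSketch {

  // Placeholder for the clustering commit events the description mentions.
  public static class ClusteringEvent implements java.io.Serializable {
    public final String instantTime;
    public ClusteringEvent(String instantTime) {
      this.instantTime = instantTime;
    }
  }

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(1);

    // 1. In the real test, a clustering plan is read from the Hudi timeline and the
    //    clustering instant is transitioned to 'inflight' before the stream starts.

    // 2. A source emits clustering commit events downstream, where a sink/operator
    //    would commit the replace instant on the table.
    env.addSource(new SourceFunction<ClusteringEvent>() {
          @Override
          public void run(SourceContext<ClusteringEvent> ctx) {
            ctx.collect(new ClusteringEvent("20231202034300000"));
          }
          @Override
          public void cancel() {}
        })
        .print();

    // 3. Execute the pipeline; if an operator throws, execute() should surface it,
    //    which is exactly the behavior this issue says is being swallowed.
    env.execute("clustering-job-sketch");
  }
}
```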

The issue seems to be that even though the clustering task fails with the HoodieException and the utility returns -1, the job's final state is still reported as success. This is problematic because a caller or scheduler that trusts the success status may proceed on incomplete results, leading to data inconsistencies or loss if not handled properly.
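A plausible way this happens is an entry point that catches the exception, logs it, and maps it to a -1 return code that no caller ever checks, so the driver still exits with code 0. The sketch below shows that anti-pattern and the straightforward remedy; runClustering and the surrounding wiring are hypothetical, not Hudi's actual UtilHelpers code.

```java
// Illustrative only; runClustering and the wiring around it are hypothetical.
public class ClusteringEntrypointSketch {

  static int runClustering() {
    try {
      doClustering();
      return 0;
    } catch (Exception e) {
      // Anti-pattern: the error is logged and converted to -1, but if nothing
      // downstream inspects the value, the application still exits with code 0
      // and the job is reported as successful.
      System.err.println("Cluster failed: " + e.getMessage());
      return -1;
    }
  }

  static void doClustering() throws Exception {
    throw new Exception("unable to read next record from parquet file");
  }

  public static void main(String[] args) {
    int ret = runClustering();
    // Remedy: surface the failure, e.g. by exiting non-zero (or rethrowing),
    // so schedulers and callers see the job as failed.
    if (ret != 0) {
      System.exit(ret);
    }
  }
}
```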

The fix for this issue, as mentioned by Danny Chen, has been committed to the master branch of the Hudi repository. It would be helpful to look at the commit to understand what changes were made to address this issue.

To answer your question, I would need more specific details about what you want to know. Are you interested in understanding why the exception is thrown, how the fix addresses the issue, or something else? Please provide more details so I can assist you better.