torvalds-dev / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Handling max commits pending error on restarting delta streamers #73

Open torvalds-dev[bot] opened 1 month ago

torvalds-dev[bot] commented 1 month ago

Describe the problem you faced

We are encountering a "max commits pending" error (Metadata table's deltacommits exceeded 1000) after restarting delta streamers that were stopped with 'yarn application -kill'. Is there a graceful way to shut down these streamers so that they do not leave pending commits behind?

torvalds-dev[bot] commented 1 month ago

Relevant Sources:

https://hudi.apache.org/docs/deployment

https://hudi.apache.org/blog/2020/08/20/efficient-migration-of-large-parquet-tables/#spark-data-source

https://hudi.apache.org/docs/faq_writing_tables

https://apache-hudi.slack.com/archives/C4D716NPQ/p1713347367198719

torvalds-dev[bot] commented 1 month ago

Ticket created successfully. Here is the link to the GitHub issue: https://github.com/torvalds-dev/hudi/issues/73

torvalds-dev[bot] commented 1 month ago

Based on the provided documentation snippets, to gracefully shut down Hudi streamers and avoid pending commits, you can follow these steps:

By following these mechanisms, you can ensure that Hudi streamers are gracefully shut down to avoid leaving behind any pending commits that might lead to errors such as max commits pending.
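
One way to check whether a killed streamer left pending instants behind is to inspect the file names in the table's timeline directory. The following is a minimal diagnostic sketch, not taken from the Hudi documentation or from this thread. It assumes the pre-1.0 timeline layout, in which every timeline file under .hoodie is named `<instant_time>.<action>` when completed and carries a `.requested` or `.inflight` suffix while pending, and in which the pending files stay in place after the instant completes. The function name `pending_instants` and the sample file names are hypothetical.

```python
# Suffixes that mark a not-yet-completed timeline file
# (assumption: pre-1.0 Hudi timeline naming).
PENDING_SUFFIXES = (".requested", ".inflight")


def pending_instants(timeline_files):
    """Return the sorted instant times that have no completed timeline file.

    `timeline_files` is a plain list of file names, for example the output
    of listing the table's .hoodie directory.
    """
    states = {}  # instant time -> set of states seen ("pending"/"completed")
    for name in timeline_files:
        instant_time, _, rest = name.partition(".")
        # Skip anything that is not a timeline file, e.g. hoodie.properties.
        if not instant_time.isdigit() or not rest:
            continue
        state = "pending" if name.endswith(PENDING_SUFFIXES) else "completed"
        states.setdefault(instant_time, set()).add(state)
    # Completed instants keep their .requested/.inflight files, so an
    # instant counts as pending only if no completed file exists for it.
    return sorted(t for t, seen in states.items() if "completed" not in seen)


if __name__ == "__main__":
    sample = [
        "hoodie.properties",
        "20240101120000.commit.requested",
        "20240101120000.inflight",
        "20240101120000.commit",
        "20240101130000.deltacommit.requested",
        "20240101130000.deltacommit.inflight",
    ]
    print(pending_instants(sample))
```

With the sample listing, only 20240101130000 is reported, because 20240101120000 has a completed commit file. Any instant reported this way after a kill is a candidate for the pending instant behind the error described above.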

torvalds-dev[bot] commented 1 month ago

Follow-up questions:

  1. Can you describe the exact steps you follow when stopping the delta streamers with 'yarn application -kill' and restarting them?
  2. Are there specific conditions or scenarios where you usually encounter the "Metadata table's deltacommits exceeded 1000" error after restarting the streamers?
  3. Could you provide more details about the configuration and settings of the streamers before and after restarting to understand the potential causes of pending commits?
  4. Have you explored any specific options or flags while shutting down the streamers that might help in gracefully handling pending commits?
  5. Is there any specific error message or warning that precedes the "max commits pending error" after restarting the delta streamers?

Sources: