numberlabs-developers / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0
[SUPPORT] Issue with processing CDC data using Debezium and Hudi tables in AWS Glue #229

Open torvalds-dev-testbot[bot] opened 3 months ago

torvalds-dev-testbot[bot] commented 3 months ago

Describe the problem you faced

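I recently cut over my Postgres databases to send CDC data via Debezium and am processing the data into my data lake’s Hudi tables via AWS Glue. Periodically, I encounter errors like:

An error occurred while calling o214.save. Failed to update metadata

An error occurred while calling o214.save. Error waiting for async clean service to finish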

When I wait and rerun the jobs, they eventually complete successfully, but the failures are becoming more frequent.
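
Since a manual rerun eventually succeeds, the same wait-and-retry behavior could be scripted around the write step. The following is only a minimal sketch: `run_with_retry`, its parameters, and the delay values are hypothetical names chosen for illustration, and `job` stands in for whatever callable performs the Hudi write in the Glue script. It works around the symptom and does not address the root cause.

```python
import time


def run_with_retry(job, max_attempts=3, base_delay_s=30.0, sleep=time.sleep):
    """Call job() and retry with exponential backoff if it raises.

    Hypothetical helper: `job` is any zero-argument callable, for example
    a function that wraps the DataFrame write to the Hudi table.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts:
                raise
            # Wait 30s, 60s, 120s, ... before the next attempt.
            sleep(base_delay_s * 2 ** (attempt - 1))
```

The `sleep` argument is injectable so the helper can be exercised without real delays. Blind retries can mask a persistent problem, so logging each caught exception before retrying is advisable.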

torvalds-dev-testbot[bot] commented 3 months ago

Relevant Sources:

https://aws.amazon.com/blogs/big-data/simplify-operational-data-processing-in-data-lakes-using-aws-glue-and-apache-hudi/

https://www.onehouse.ai/blog/how-apache-hudi-simplifies-mpp-data-warehouse-migrations

https://aws.amazon.com/blogs/big-data/automate-schema-evolution-at-scale-with-apache-hudi-in-aws-glue/

https://medium.com/@devjain1299/apache-hudi-on-aws-glue-a-step-by-step-guide-503c34a9aa95

https://github.com/apache/hudi/issues/9213

torvalds-dev-testbot[bot] commented 3 months ago

Based on the documentation snippets provided, the errors "An error occurred while calling o214.save. Failed to update metadata" and "An error occurred while calling o214.save. Error waiting for async clean service to finish" likely occur while AWS Glue processes CDC files for operational tables into Hudi. They may stem from failed metadata table updates or from delays in the async clean service. To troubleshoot and mitigate these errors, consider the following steps:

These steps can help identify and address the root cause of the errors occurring during the processing of CDC data in Hudi tables using AWS Glue.
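
The individual steps are not preserved in this thread. As one illustration of the kind of configuration involved, the sketch below builds a set of Hudi writer options that runs cleaning inline rather than asynchronously. This is an assumption about a possible mitigation, not a confirmed fix: the helper name `hudi_write_options` and its arguments are hypothetical, while the `hoodie.*` keys are standard Hudi configuration properties. Running the cleaner inline would not necessarily remove the underlying failure, but it may surface the real cleaning error in the write path instead of the generic "Error waiting for async clean service to finish".

```python
def hudi_write_options(table_name, record_key, precombine_field, inline_clean=True):
    """Build Hudi writer options as a plain dict of strings.

    Hypothetical helper for illustration; the table name and field names
    are supplied by the caller and are not taken from the original issue.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "upsert",
        # Keep the metadata table enabled; the "Failed to update metadata"
        # error points at this component, so its logs are worth checking.
        "hoodie.metadata.enable": "true",
        # Clean after each commit, inline (async off) or asynchronously.
        "hoodie.clean.automatic": "true",
        "hoodie.clean.async": "false" if inline_clean else "true",
    }


# Usage inside a Glue/Spark job (assumes `df` and `target_path` exist):
#   opts = hudi_write_options("orders", "id", "updated_at")
#   df.write.format("hudi").options(**opts).mode("append").save(target_path)
```

Whether inline cleaning is acceptable depends on the job's latency budget, since the clean then adds to the commit time.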

torvalds-dev-testbot[bot] commented 3 months ago

Abhishek posted:

https://devs-numberlabs.slack.com/archives/C06PDCQPA3W/p1722327375090719?thread_ts=1722327602836159&cid=C06PDCQPA3W

torvalds-dev-testbot[bot] commented 3 months ago

Ticket created successfully. Here is the link to the GitHub issue: https://github.com/numberlabs-developers/hudi/issues/229