numberlabs-developers / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Issue with row level deletion using write operation delete #250

Open torvalds-dev-testbot[bot] opened 3 months ago

torvalds-dev-testbot[bot] commented 3 months ago


Describe the problem you faced

I am trying to perform row-level deletion with the "delete" write operation on one of my Copy-on-Write (COW) tables, using Hudi 0.14.0. Despite using the provided delete configuration, no rows are deleted when I run my PySpark job. Before running the delete, I checked the rows selected by my filter to confirm that it matches the intended records.

To Reproduce

Steps to reproduce the behavior:

  1. Execute PySpark job for row level deletion.

Expected behavior

I expected the specified rows to be deleted as per the filter condition.

Environment Description

Additional context

The job completes without any errors; the rows are simply not deleted as expected.


torvalds-dev-testbot[bot] commented 3 months ago

Relevant Sources:

https://www.linkedin.com/pulse/small-talk-apache-hudi-ashok-kumar-kunkala-3ldge/

https://hudi.apache.org/docs/quick-start-guide

https://datacouch.medium.com/getting-started-with-apache-hudi-711b89c107aa

https://apache-hudi.slack.com/archives/C4D716NPQ/p1713562820086629


torvalds-dev-testbot[bot] commented 3 months ago

Based on the provided documentation snippets, it seems that row-level deletion with the Hudi write operation "delete" is not working as expected in your scenario. Here are some insights and steps to potentially debug this issue:

If you are still facing issues with the row level deletion despite following these guidelines, further investigation into the filtering conditions and the underlying data might be necessary to resolve the problem effectively.
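As one illustration of the configuration worth double-checking, here is a minimal sketch of a delete write against a COW table. The option keys are standard Hudi datasource write configs; the table name, field names, and the helpers build_delete_options and delete_rows are hypothetical placeholders, not taken from the reporter's job. A delete locates records by record key (and, with a non-global index, by partition path), so these fields generally need to match how the table was originally written, and the write is typically issued in append mode.

```python
from typing import Dict


def build_delete_options(table_name: str, record_key: str,
                         partition_field: str, precombine_field: str) -> Dict[str, str]:
    """Build Hudi datasource options for a row-level delete on a COW table.

    All argument values are placeholders; they should mirror the settings
    used when the table was first written.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
        "hoodie.datasource.write.operation": "delete",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
    }


def delete_rows(rows_to_delete, base_path: str, options: Dict[str, str]) -> None:
    """Issue the delete by writing the rows to remove back to the table.

    rows_to_delete is assumed to be a Spark DataFrame holding the records
    to delete, including the record key and partition path columns.
    """
    (rows_to_delete.write.format("hudi")
        .options(**options)
        .mode("append")  # append mode, so the existing table is not replaced
        .save(base_path))


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    opts = build_delete_options("my_table", "id", "region", "updated_at")
    for key in sorted(opts):
        print(f"{key} = {opts[key]}")
```

In a PySpark job this might be called as delete_rows(df.filter(...), base_path, opts), where df is read from the same table. Reading the table again after the commit and counting the rows that match the filter is one way to confirm whether the delete took effect.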


torvalds-dev-testbot[bot] commented 3 months ago

Ticket created successfully. Here is the link to the GitHub issue: https://github.com/numberlabs-developers/hudi/issues/250