torvalds-dev-testbot[bot] opened 3 months ago
Hello! 👋 Please wait while I generate the follow-up questions; this usually takes about 30 seconds.
Running two DeltaStreamers against the same target table in Apache Hudi is handled through multi-writer support. If the two writers write to disjoint sets of data within the table, both can write concurrently and commit successfully. If they update the same set of data, the conflict is resolved through Optimistic Concurrency Control (OCC): one writer succeeds, and the other aborts when there are conflicting writes to the same underlying file groups.

Multi-writer support in Hudi involves a conflict-resolution component that detects conflicting operations and applies a conflict-resolution strategy. The default, SimpleConcurrentFileWritesConflictResolutionStrategy, operates at the file-group level rather than the record level (a performance trade-off): multiple writers succeed when writing to different file groups, and when their writes overlap on the same file groups, conflict resolution decides which write succeeds.
Source: https://medium.com/@simpsons/multi-writer-support-with-apache-hudi-e1b75dca29e6
Source: https://hudi.apache.org/docs/concurrency_control
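To make the OCC setup concrete, here is a rough sketch of how each of the two DeltaStreamer jobs might be launched, following the concurrency-control docs linked above. The ZooKeeper host/port, lock paths, table name, and bucket below are illustrative placeholders, and the usual source/schema-provider options are omitted:

```sh
# Sketch: one of two concurrent HoodieDeltaStreamer jobs writing to the same table.
# Both jobs must carry the same concurrency/lock settings.
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  hudi-utilities-bundle.jar \
  --table-type COPY_ON_WRITE \
  --target-base-path s3://bucket/path/to/table \
  --target-table my_table \
  --hoodie-conf hoodie.write.concurrency.mode=optimistic_concurrency_control \
  --hoodie-conf hoodie.cleaner.policy.failed.writes=LAZY \
  --hoodie-conf hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider \
  --hoodie-conf hoodie.write.lock.zookeeper.url=zk-host \
  --hoodie-conf hoodie.write.lock.zookeeper.port=2181 \
  --hoodie-conf hoodie.write.lock.zookeeper.lock_key=my_table \
  --hoodie-conf hoodie.write.lock.zookeeper.base_path=/hudi_locks
```

With settings like these, the jobs run concurrently rather than one waiting for the other: the lock is held only briefly around each commit, and if two commits touch overlapping file groups, one succeeds and the other aborts instead of corrupting the table. The LAZY failed-writes cleaning policy is what the concurrency-control docs recommend for multi-writer setups, so that rolling back one writer's failed commit does not interfere with the other writer's in-flight data.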
Ticket created successfully. Here is the link to the GitHub issue: https://github.com/numberlabs-developers/hudi/issues/236
Tips before filing an issue
Have you gone through our FAQs?
Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.
If you have triaged this as a bug, then file an issue directly.
Describe the problem you faced
I would like to understand how Hudi deals with multiple DeltaStreamers running for the same table. Does it put one in a wait state until the other finishes, or does this lead to table corruption?
To Reproduce
Steps to reproduce the behavior:
1.
2.
3.
4.
Expected behavior
A clear and concise description of what you expected to happen.
Environment Description
Hudi version :
Spark version :
Hive version :
Hadoop version :
Storage (HDFS/S3/GCS..) :
Running on Docker? (yes/no) :
Additional context
Add any other context about the problem here.
Stacktrace
Add the stacktrace of the error.