Open torvalds-dev[bot] opened 3 months ago
A Spark reader that carries write configurations but only reads a locked table may still need access to DynamoDB to check the lock table for potential conflicts. The lock exists for conflict resolution when two writers concurrently update the same set of data in the Hudi table (source: http://medium.com/@simpsons/multi-writer-support-with-apache-hudi-e1b75dca29e6).
hoodie.write.lock.dynamodb.partition_key is a configuration parameter used within the lock table in DynamoDB. It is the key value under which the lock entry is stored, so it identifies which lock a writer contends for. It is specific to the lock table and is not a column of the associated Hudi table. All writers to the same Hudi table should use the same value. If different Hudi tables use the same partition key within the same lock table, their writers contend for the same lock and conflicts may arise (source: http://medium.com/@simpsons/multi-writer-support-with-apache-hudi-e1b75dca29e6).
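As a rough illustration of the point above, here is a minimal sketch of writer options for two Hudi tables that share one DynamoDB lock table but use distinct partition keys. The helper name `dynamodb_lock_options`, the table names, the lock table name `hudi_locks`, and the region are assumptions made up for this example. The `hoodie.*` keys are the standard Hudi multi-writer and DynamoDB lock provider settings.

```python
# Hypothetical helper (not part of Hudi): builds writer options for the
# DynamoDB-based lock provider. Table names and region are placeholders.
def dynamodb_lock_options(hudi_table, lock_table, region="us-east-1"):
    return {
        "hoodie.table.name": hudi_table,
        # Multi-writer mode with lazy cleaning of failed writes.
        "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
        "hoodie.cleaner.policy.failed.writes": "LAZY",
        "hoodie.write.lock.provider":
            "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider",
        # The DynamoDB table that stores lock entries.
        "hoodie.write.lock.dynamodb.table": lock_table,
        # Key of the lock entry inside the lock table. Here it is set to the
        # Hudi table name so that each Hudi table gets its own lock.
        "hoodie.write.lock.dynamodb.partition_key": hudi_table,
        "hoodie.write.lock.dynamodb.region": region,
    }


# Two Hudi tables sharing one lock table, each with its own partition key.
opts_a = dynamodb_lock_options("table_a", "hudi_locks")
opts_b = dynamodb_lock_options("table_b", "hudi_locks")

same_lock_table = (opts_a["hoodie.write.lock.dynamodb.table"]
                   == opts_b["hoodie.write.lock.dynamodb.table"])
same_partition_key = (opts_a["hoodie.write.lock.dynamodb.partition_key"]
                      == opts_b["hoodie.write.lock.dynamodb.partition_key"])
print(same_lock_table, same_partition_key)  # True False
```

A dictionary like this would typically be passed to a Spark writer with `df.write.format("hudi").options(**opts_a)`. Every writer to `table_a` should pass the same partition key so that they all contend for the same lock.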
Hudi supports tables partitioned on multiple columns. In the scenario described, writer W1 writes to partition_col_A:data1 and partition_col_B:data1 while writer W2 writes to partition_col_A:data1 and partition_col_B:data2, so W2 would succeed in writing its data: the two writers target different values of partition_col_B even though they share the same value of partition_col_A (source: http://medium.com/@simpsons/multi-writer-support-with-apache-hudi-e1b75dca29e6).
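The reasoning in the W1/W2 scenario can be sketched with a simplified model in which two concurrent writes conflict only if they touch the same files. This is an illustration of the idea and not Hudi's actual conflict resolution code. The function names, the `col=value` path layout, and the file identifiers `fg-1` and `fg-2` are hypothetical.

```python
# Simplified model, not Hudi's implementation: each write is a set of
# (partition path, file id) pairs, and two writes conflict when the sets overlap.
def partition_path(col_a, col_b):
    # Hypothetical path layout for a table partitioned on two columns.
    return f"partition_col_A={col_a}/partition_col_B={col_b}"


def writes_conflict(files_w1, files_w2):
    # Conflict if and only if both writers touched at least one common file.
    return bool(set(files_w1) & set(files_w2))


# W1 and W2 share partition_col_A but differ on partition_col_B.
w1 = {(partition_path("data1", "data1"), "fg-1")}
w2 = {(partition_path("data1", "data2"), "fg-2")}
print(writes_conflict(w1, w2))  # False

# A third writer touching the same file as W1 would conflict.
w3 = {(partition_path("data1", "data1"), "fg-1")}
print(writes_conflict(w1, w3))  # True
```

Under this model, sharing a value on one partition column is not enough to cause a conflict, because the full partition paths still differ.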
Ticket created successfully. Here is the link to the GitHub issue: https://github.com/torvalds-dev/hudi/issues/70
Describe the problem you faced
Three questions regarding Dynamo Multi-Writers:
hoodie.write.lock.dynamodb.partition_key? Is it just an arbitrary name to use within the lock table which is unrelated to the associated table itself? Is it a column name? Does it have any application outside of the lock table (if I have table A and table B with lock_A_use1 partition_key=abc and lock_B_use1 partition_key=abc would there be any conflict?)
To Reproduce
Expected behavior
Environment Description
Additional context
Stacktrace