redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

datalake/translation: fix data partition to coordinator mapping #24038

Closed: bharathv closed this 1 week ago

bharathv commented 1 week ago

The coordinator implementation inherently assumes that, for correctness, all operations on a given topic/table are performed by the same coordinator, so that each table has a single active writer most of the time. This matters most for the filesystem catalog implementation, which relies on the filesystem for concurrency control: multiple interleaved writers can leave it in a questionable state.

Our coordinator hash function violates this expectation: it takes both topic and partition as input, so ntps from the same topic can hash to different coordinator partitions.
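To illustrate the problem, here is a minimal sketch, not the actual Redpanda code. The function names, the CRC32 hash, and the coordinator partition count of 8 are all assumptions made for the example. It contrasts a mapping keyed on topic and partition with one keyed on topic only, which is one way to satisfy the single-coordinator-per-table assumption.

```python
import zlib

# Hypothetical number of partitions in the coordinator topic.
NUM_COORDINATOR_PARTITIONS = 8


def coordinator_for_ntp(topic: str, partition: int,
                        n: int = NUM_COORDINATOR_PARTITIONS) -> int:
    """Hash topic and partition together (the problematic shape)."""
    key = f"{topic}/{partition}".encode()
    return zlib.crc32(key) % n


def coordinator_for_topic(topic: str, partition: int,
                          n: int = NUM_COORDINATOR_PARTITIONS) -> int:
    """Hash the topic only; the partition is deliberately ignored."""
    return zlib.crc32(topic.encode()) % n


def coordinators_used(mapping, topic: str, num_partitions: int) -> set[int]:
    """Return the set of coordinator partitions that serve one topic."""
    return {mapping(topic, p) for p in range(num_partitions)}


if __name__ == "__main__":
    # Keyed on (topic, partition): partitions of one topic may spread out.
    print(sorted(coordinators_used(coordinator_for_ntp, "orders", 32)))
    # Keyed on topic only: always exactly one coordinator per topic.
    print(sorted(coordinators_used(coordinator_for_topic, "orders", 32)))
```

With the topic-only key, every partition of a topic resolves to the same coordinator partition by construction, so table updates for that topic go through a single writer. The trade-off is that load is balanced per topic rather than per partition.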

Backports Required

Release Notes

vbotbuildovich commented 1 week ago

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/57716#019303e9-971d-4e3c-89c2-277b66d76b41

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/57716#019303e9-9720-43a7-ae3a-52af32bd2b13