tabular-io / iceberg-kafka-connect

Handle partition spec evolutions gracefully #202

Closed by fqtab 6 months ago

fqtab commented 7 months ago

Currently the connector will get stuck if it ever encounters a batch of commit responses containing files with different partition specs. This is because the append API currently does not support appending files with different partition specs. There is an open PR to potentially address this limitation. In the meantime, we can "hack" around this limitation by appending files with different specs as separate append operations in a single transaction. It's not ideal but on balance, the trade-off should be worth it; partition spec evolution should not be a frequent event for most use cases.
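The workaround described above can be sketched roughly as follows. This is a minimal, self-contained illustration rather than the connector's actual code: `FileRef` and `FakeTransaction` are hypothetical stand-ins for Iceberg's data files and transaction, and the stand-in append deliberately rejects mixed specs to mimic the limitation. With the real Iceberg API, each group would presumably map to one `Transaction.newAppend()` operation, with a single `commitTransaction()` at the end.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SpecGroupedAppend {

    /** Hypothetical stand-in for a data file: only its path and partition spec ID matter here. */
    static final class FileRef {
        final String path;
        final int specId;

        FileRef(String path, int specId) {
            this.path = path;
            this.specId = specId;
        }
    }

    /** Hypothetical stand-in for a transaction whose appends reject mixed partition specs. */
    static final class FakeTransaction {
        final List<List<String>> appends = new ArrayList<>();
        boolean committed = false;

        void append(List<FileRef> files) {
            List<String> paths = new ArrayList<>();
            for (FileRef file : files) {
                // Mimic the limitation: one append may only carry files of a single spec.
                if (file.specId != files.get(0).specId) {
                    throw new IllegalArgumentException("mixed partition specs in one append");
                }
                paths.add(file.path);
            }
            appends.add(paths);
        }

        void commit() {
            committed = true;
        }
    }

    /** Groups files by partition spec ID, preserving first-seen order of the specs. */
    static Map<Integer, List<FileRef>> groupBySpecId(List<FileRef> files) {
        Map<Integer, List<FileRef>> groups = new LinkedHashMap<>();
        for (FileRef file : files) {
            groups.computeIfAbsent(file.specId, id -> new ArrayList<>()).add(file);
        }
        return groups;
    }

    /** Issues one append per spec ID, then commits everything as a single transaction. */
    static FakeTransaction commitGrouped(List<FileRef> files) {
        FakeTransaction txn = new FakeTransaction();
        for (List<FileRef> group : groupBySpecId(files).values()) {
            txn.append(group);
        }
        txn.commit();
        return txn;
    }

    public static void main(String[] args) {
        List<FileRef> files = List.of(
            new FileRef("a.parquet", 0),
            new FileRef("b.parquet", 1),
            new FileRef("c.parquet", 0));
        FakeTransaction txn = commitGrouped(files);
        System.out.println(txn.appends);
    }
}
```

When every file shares one spec, this degenerates to a single append, so the extra operations only appear in batches that straddle a spec evolution.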

bryanck commented 6 months ago

Can you just include a patched version of Iceberg that includes your upstream fix, rather than working around the issue here?

fqtab commented 6 months ago

> Can you just include a patched version of Iceberg that includes your upstream fix, rather than working around the issue here?

That feels more hacky to me; I'd rather not resort to that just yet. The upstream fix is a relatively fundamental change in iceberg-core. I'm still waiting for more experienced Iceberg contributors to confirm it's safe, and it's not clear to me whether the community will accept it. I'd rather not end up in a situation where I have to maintain patches long-term.

If you have a particular concern with this workaround, please let me know! 😃 Some of our users have reported running into this issue recently, and I consider this workaround to be an effective short-term solution while we hopefully work towards a long-term solution in iceberg-core.