tabular-io / iceberg-kafka-connect

Apache License 2.0
171 stars 31 forks source link

Question regarding lack of zombie fencing #200

Open igorcalabria opened 4 months ago

igorcalabria commented 4 months ago

Hi, in the docs (https://github.com/tabular-io/iceberg-kafka-connect/blob/main/docs/design.md#zombie-fencing) it's mentioned that if a tasks is reassigned in the middle of a transaction, it's possible that the new task may process duplicate data.

This confused me a bit because, from limited understanding, https://cwiki.apache.org/confluence/display/KAFKA/KIP-447%3A+Producer+scalability+for+exactly+once+semantics should prevent this. If there's a pending transaction touching the committed offsets for that group, shouldn't the new task block until the old transaction is finished?

Is this because the workers actually get their offsets using kafkaAdminAPI here? https://github.com/tabular-io/iceberg-kafka-connect/blob/main/kafka-connect/src/main/java/io/tabular/iceberg/connect/channel/Worker.java#L91 If that's the case, maybe using requireStable=true on ListConsumerGroupOffsetOptions may solve this. I'm pretty sure that I've missed something, so I apologize if this is a stupid question.