tabular-io / iceberg-kafka-connect


Question: One control topic per connector? #258

Closed: dshma closed this issue 1 month ago

dshma commented 1 month ago

Hey @bryanck,

Are there any recommendations, or perhaps requirements, for the control topic? I'm particularly interested in the relationship: should it be one-to-one (one control topic per source topic/connector), or is it okay to have one-to-many? I recently came across this article: https://docs.redpanda.com/current/deploy/deployment-option/cloud/managed-connectors/create-iceberg-sink-connector/#limitations, which explicitly states "Each Iceberg Sink connector must have its own control topic". I'd never thought of it that way and haven't found anything like that mentioned in the documentation here. Still, I can understand to some extent the drivers for keeping it one-to-one, and would like to hear your thoughts on this.

Thanks in advance!

bryanck commented 1 month ago

Multiple sink connectors can use the same control topic if desired. Control messages are filtered within a connector based on the Connect consumer group ID.
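To illustrate, here is a minimal sketch of two sink connectors sharing one control topic. The connector class and the `iceberg.control.topic` property are as I recall them from this repo's README (verify against your version); the connector names, source topics, table names are placeholders, and catalog configuration is omitted.

```properties
# connector-a.properties -- minimal sketch, catalog settings omitted
name=events-sink-a
connector.class=io.tabular.iceberg.connect.IcebergSinkConnector
topics=events-a
iceberg.tables=db.events_a
# Both connectors point at the same control topic.
iceberg.control.topic=control-iceberg

# connector-b.properties
name=events-sink-b
connector.class=io.tabular.iceberg.connect.IcebergSinkConnector
topics=events-b
iceberg.tables=db.events_b
iceberg.control.topic=control-iceberg
```

Because control messages are filtered by the Connect consumer group ID, each connector acts only on its own control messages even though both read the same topic.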

tabmatfournier commented 1 month ago

You do not need to have a control topic per connector. These are low traffic relative to your source topics until you reach significant scale (many topics with many partitions per topic).

As the number of topics (and topics with many partitions) increases, so does the traffic on the control topic. With a single control topic, ALL connectors read from that topic and filter out everything except the messages they need. As scale increases, a consumer may be discarding 99% of what it reads.

Cons of one control topic:

- Every connector reads the shared topic and filters out messages meant for other connectors, so at large scale consumers spend most of their time discarding traffic they don't need.
- Bigger blast radius: an issue with the shared control topic affects every connector that uses it.

I don't think these are particularly huge cons and I would run with a single control topic. I might rethink this if I was running 500+ connectors, probably for the blast radius problem.
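If you do reach the scale where the filtering overhead or blast radius matters, the change is just a per-connector value for the control topic. A sketch, reusing the same hypothetical names and placeholder settings as above:

```properties
# Hypothetical sketch: one control topic per connector at large scale.
# Each connector now reads only its own control messages, and an issue
# with one control topic is contained to that connector.
name=events-sink-a
connector.class=io.tabular.iceberg.connect.IcebergSinkConnector
topics=events-a
iceberg.tables=db.events_a
iceberg.control.topic=control-iceberg-events-sink-a
```

If your brokers don't auto-create topics, each dedicated control topic needs to be created before its connector starts.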

dshma commented 1 month ago

Thx guys, appreciate your responsiveness. Yeah, my understanding is basically the same; I haven't encountered any significant problems so far using it at quite a different scale. I got what I needed to double-check here, hence closing the ticket.