tabular-io / iceberg-kafka-connect

Apache License 2.0

Is it okay to use multiple iceberg connectors simultaneously to same target table? #151

Closed okayhooni closed 11 months ago

okayhooni commented 11 months ago

If different converters are needed to ingest into the same target table, that cannot be handled by listing multiple topics in the topics parameter of a single connector, so multiple connectors are needed.

Is it safe to deploy multiple Iceberg sink connectors simultaneously against the same target table, with respect to optimistic concurrency on the Iceberg table?

(Thank you always for your kind reply)

bryanck commented 11 months ago

Full disclosure, I haven't tested this, but it should work if the consumer group ID is unique for each sink. You can either give each connector a different name, or set iceberg.control.group-id to ensure this. You could also use a separate control topic for each. One issue I can see is that the kafka.connect.vtts snapshot property value will not be guaranteed to always increase.
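As a rough illustration of the suggestion above, here is a minimal Python sketch that builds two sink configs for one table and checks that their control group IDs and control topics differ. Only iceberg.control.group-id is taken from this thread. The other keys (connector.class, iceberg.tables, iceberg.control.topic), the converter classes, and all topic, table, and connector names are assumptions chosen for illustration, and sink_config and assert_isolated are hypothetical helpers. Like the suggestion itself, this is untested against a real cluster.

```python
# Sketch only: two sink connector configs writing to one Iceberg table.
# Only iceberg.control.group-id comes from the discussion above; every other
# key, class name, topic name and table name is an assumption for illustration.


def sink_config(name, topic, value_converter, table="db.events"):
    """Build a config dict for one sink connector (hypothetical helper)."""
    return {
        "name": name,
        "connector.class": "io.tabular.iceberg.connect.IcebergSinkConnector",
        "topics": topic,
        "value.converter": value_converter,
        "iceberg.tables": table,
        # Unique per connector so the sinks do not share a control consumer group.
        "iceberg.control.group-id": f"cg-control-{name}",
        # Optional alternative: a separate control topic per connector.
        "iceberg.control.topic": f"control-{name}",
    }


def assert_isolated(configs):
    """Raise ValueError if two connectors share a control group ID or topic."""
    for key in ("iceberg.control.group-id", "iceberg.control.topic"):
        values = [c[key] for c in configs]
        if len(set(values)) != len(values):
            raise ValueError(f"duplicate {key}: {values}")


configs = [
    sink_config("events-avro", "events_avro",
                "io.confluent.connect.avro.AvroConverter"),
    sink_config("events-json", "events_json",
                "org.apache.kafka.connect.json.JsonConverter"),
]
assert_isolated(configs)  # passes: both connectors target db.events in isolation
```

Each dict could then be submitted as the config of its own connector. The check is only a guard against copy-paste mistakes when the second connector is cloned from the first.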

bryanck commented 11 months ago

I didn't quite understand the rationale for doing this, however. When you say different converter, do you mean SMT?

okayhooni commented 11 months ago

> I didn't quite understand the rationale for doing this, however. When you say different converter, do you mean SMT?

Not an SMT. I mean different converters, like those below.

bryanck commented 11 months ago

I see, thanks. If you try it let us know how it goes.

okayhooni commented 11 months ago

Sure! Thanks for the quick answer!

By the way, could you give some advice on making rewrite_data_files() run faster on Spark, to compact the small files that streaming ingestion creates in an Iceberg table?

That procedure was too slow, and I found that it processed each partition path sequentially in a single Spark task, not in parallel. (In the Spark plan, it always uses just one Spark task.)

I searched related issues on the Iceberg GitHub, but I couldn't find any useful clue for solving it.
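For reference, one knob that may be worth checking here is the max-concurrent-file-group-rewrites option of rewrite_data_files, which controls how many file groups are rewritten at the same time. The Python sketch below only builds the CALL statement. The catalog name, table name, and concurrency value are placeholders, rewrite_call is a hypothetical helper, and whether this option addresses the single-task behavior described above is an assumption that has not been verified in this thread.

```python
# Sketch only: build a Spark SQL CALL for Iceberg's rewrite_data_files procedure.
# Catalog name, table name and the concurrency value are placeholders.


def rewrite_call(catalog, table, concurrency):
    """Return the CALL statement as a string (hypothetical helper)."""
    options = {
        # Number of file groups rewritten concurrently.
        "max-concurrent-file-group-rewrites": str(concurrency),
        # Commit finished file groups in batches instead of one final commit.
        "partial-progress.enabled": "true",
    }
    opts = ", ".join(f"'{k}', '{v}'" for k, v in options.items())
    return (
        f"CALL {catalog}.system.rewrite_data_files("
        f"table => '{table}', options => map({opts}))"
    )


sql = rewrite_call("my_catalog", "db.events", 10)
print(sql)
# With a SparkSession configured for Iceberg, this would be run as:
# spark.sql(sql)
```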

bryanck commented 11 months ago

You might try posting this in the Iceberg Slack general channel; people there are very helpful.

bryanck commented 11 months ago

(There is also a kafka-connect channel there if interested)

okayhooni commented 11 months ago

Thanks! I will join!