tabular-io / iceberg-kafka-connect


Few questions #153

Closed: ajantha-bhat closed this issue 11 months ago

ajantha-bhat commented 11 months ago

Some of these things are not clear from the documentation or test cases. Can you please clarify?

  1. Do we support all of these converters for the source topic(s)? I know that the control topic only uses Avro.

    org.apache.kafka.connect.storage.StringConverter
    org.apache.kafka.connect.json.JsonConverter
    org.apache.kafka.connect.converters.ByteArrayConverter
    io.confluent.connect.avro.AvroConverter
    io.confluent.connect.protobuf.ProtobufConverter
    io.confluent.connect.json.JsonSchemaConverter
  2. Is Schema Registry supported?

  3. autoCreateTable does not create namespaces by default? Shall I raise a PR to fix that?

  4. Buffering and polling are currently only time-based (iceberg.control.commit.interval-ms)? Is there a plan for a message-count-based threshold?

  5. How can the ingestion-time column be added automatically? Should we use an SMT?

  6. Do we support all of these SMTs? https://docs.confluent.io/platform/current/connect/transforms/overview.html

  7. Will schema evolution work with auto table creation, given that the table's schema would need to be altered?

  8. Framework capabilities like error tolerance, dead letter queues, and deployment modes are supported with this connector too, right?

  9. How does commit retry work? I saw the commit-interval and commit-timeout configs. Do we have configurable retries?

  10. Can JMX be used to monitor the connector? https://docs.confluent.io/platform/current/connect/monitoring.html#use-jmx-to-monitor-kconnect

bryanck commented 11 months ago
  1. There are no specific limitations on which converters can be used by the sink; any of those should work. See the converter/Schema Registry config sketch below this list.
  2. Yes
  3. Feel free to open a PR if it is small; we're trying to limit disruptive changes during the Iceberg submission process.
  4. There are no plans currently
  5. Yes, an SMT makes sense here; see the SMT sketch below this list.
  6. There are no specific limitations on which SMTs can be used, so any of those should work.
  7. If I understand the question correctly, schema evolution should work with auto table creation.
  8. There are no limitations around these, so they all work the same way as with other sinks; see the error-handling sketch below this list.
  9. Commit retries can be configured via Iceberg table properties, e.g. commit.retry.num-retries; see the example below this list.
  10. JMX should work as with any other connector; this sink doesn't expose any custom metrics yet.
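
Regarding 1 and 2, a minimal sketch of the converter settings on the sink connector, assuming Avro with a Schema Registry; the registry URL is a placeholder for your environment:

    # Hypothetical connector config snippet (properties format); adjust the URL to your setup.
    key.converter=org.apache.kafka.connect.storage.StringConverter
    value.converter=io.confluent.connect.avro.AvroConverter
    value.converter.schema.registry.url=http://schema-registry:8081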
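
For 5, one option is the built-in InsertField transform, which copies the Kafka record timestamp into a named field; the transform alias and field name below are just examples:

    # Hypothetical SMT config snippet; the alias and field name are placeholders.
    transforms=addIngestTs
    transforms.addIngestTs.type=org.apache.kafka.connect.transforms.InsertField$Value
    transforms.addIngestTs.timestamp.field=ingest_ts

Note this uses the record's Kafka timestamp, which corresponds to broker ingestion time only when the topic is configured with LogAppendTime.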
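
For 8, these are standard Kafka Connect framework settings rather than anything specific to this sink; a sketch of enabling error tolerance with a dead letter queue (the topic name is a placeholder):

    # Hypothetical error-handling config snippet; the DLQ topic name is a placeholder.
    errors.tolerance=all
    errors.deadletterqueue.topic.name=iceberg-sink-dlq
    errors.deadletterqueue.context.headers.enable=true
    errors.log.enable=true

Keep in mind the framework only routes records that fail during conversion or transformation to the dead letter queue; errors raised inside the sink's write path are not captured there.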
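
For 9, commit retry behavior comes from standard Iceberg table properties set on the target table (for example via ALTER TABLE ... SET TBLPROPERTIES in your engine of choice); the values below are illustrative, not defaults:

    # Iceberg table properties controlling commit retries; values are illustrative.
    commit.retry.num-retries=10
    commit.retry.min-wait-ms=100
    commit.retry.max-wait-ms=60000
    commit.retry.total-timeout-ms=1800000
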
ajantha-bhat commented 11 months ago

Thanks.