tabular-io / iceberg-kafka-connect

Apache License 2.0

Unwanted SLASH before table name when using HIVE Catalog with auto-create config set to TRUE. #241

Closed ArkaSarkar19 closed 2 months ago

ArkaSarkar19 commented 2 months ago

We found this issue during auto-creation of Iceberg tables using the Hive catalog.

In the config we pass the warehouse location as s3://bucket/table, but in the metastore the table location is stored as s3://bucket//table/metadata (for example, s3://[bucket_name]//iceberg_sink_test_11/, whereas it should be just s3://[bucket_name]/iceberg_sink_test_11/).

As a result, when we checked S3, all the files were under the / folder in the bucket.
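
To illustrate why the extra slash matters, here is a minimal Python sketch. It is not connector code, and the helper names `split_s3_uri` and `has_empty_segment` are hypothetical. It assumes the usual S3 convention that everything after the bucket name and its separating slash is the object key, so a doubled slash gives a key that starts with `/`, which S3 browsers show as a folder named `/`.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key). Hypothetical helper for illustration."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3 URI: {uri}")
    # Drop only the single slash that separates the bucket from the key.
    key = parsed.path[1:] if parsed.path.startswith("/") else parsed.path
    return parsed.netloc, key


def has_empty_segment(uri):
    """Return True if the key starts with '/' or contains '//'."""
    _, key = split_s3_uri(uri)
    return key.startswith("/") or "//" in key


# Location as observed in the metastore (double slash): key begins with "/".
observed = split_s3_uri("s3://bucket_name//iceberg_sink_test_11/")

# Location as expected (single slash): key begins with the table name.
expected = split_s3_uri("s3://bucket_name/iceberg_sink_test_11/")
```

A check like `has_empty_segment(location)` could be run against the location recorded in the metastore to confirm whether a table was created with the doubled slash.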

Can you tell us what we should use so that the files are not written to the / folder? This is a problem because we cannot query the table when its files are at the wrong location.

This is the current connector config we are using.

{
    "connector.class": "io.tabular.iceberg.connect.IcebergSinkConnector",
    "errors.log.include.messages": "true",
    "consumer.override.bootstrap.servers": "[redacted]",
    "tasks.max": "1",
    "topics": "topic",
    "iceberg.control.commit.interval-ms": "60000",
    "iceberg.control.topic": "topic_2",
    "value.converter.value.subject.name.strategy": "io.confluent.kafka.serializers.subject.TopicRecordNameStrategy",
    "value.converter.schema.registry.url": "[redacted]",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "[redacted]",
    "iceberg.tables": "test.iceberg_sink_test_17",
    "name": "iceberg_sink_connector_t",
    "errors.log.enable": "true",
    "iceberg.catalog": "spark_catalog",
    "iceberg.catalog.type": "hive",
    "iceberg.catalog.uri": "[redacted]",
    "iceberg.catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "iceberg.catalog.client.region": "us-east-1",
    "iceberg.catalog.s3.region": "us-east-1",
    "iceberg.catalog.s3.sse.key": "AES256",
    "iceberg.catalog.s3.sse.type": "s3",
    "iceberg.catalog.warehouse": "s3://bucket_name",
    "iceberg.tables.auto-create-enabled": "true",
    "iceberg.tables.evolve-schema-enabled": "true",
    "iceberg.catalog.s3.path-style-access": "true"
}

fqtab commented 2 months ago

Let's not start a new issue for this please. There is already valuable context in https://github.com/tabular-io/iceberg-kafka-connect/issues/237 so let's continue the conversation there.