Closed: okayhooni closed this issue 8 months ago
Multiple topic ingestion is supported, either via the `topics` or `topics.regex` configuration. This is a warning message that can be safely ignored; suppressing the stack trace will require an update to Iceberg, so for now you'll need to configure your logging to suppress it, if desired.
I'll investigate alternatives for suppressing the stack trace in the meantime.
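For reference, here is a minimal config sketch of the two styles; the topic and table names are hypothetical, and the catalog settings are omitted:

```properties
name=iceberg-sink-multi-topic
connector.class=io.tabular.iceberg.connect.IcebergSinkConnector

# Option 1: list the topics explicitly
topics=order.beta.streaming.pay-result.avro,order.beta.streaming.order-created.avro

# Option 2, instead of "topics" (the two are mutually exclusive); note the
# doubled backslashes, since properties files treat '\' as an escape
# topics.regex=order\\.beta\\.streaming\\..*\\.avro

# Write records into a single table
iceberg.tables=default.orders_poc
```

If I read the routing defaults correctly, with a single table listed and no route field configured, records from every subscribed topic land in that one table.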
@bryanck
I really appreciate your quick answer.
But the connector I deployed didn't reach the READY state (it was deployed as a custom resource via the Strimzi operator on Kubernetes). The only difference between v7 and v8 is that v8 maps multiple topics.
Are there any other errors? The one you posted is something that will be retried. Also, what version are you using?
I couldn't find any other errors.
I'm using version iceberg-kafka-connect-runtime-hive-0.5.7.
The Avro schemas of the topics are different. I expected the auto-created table schema to contain all the fields across the topics, with null injected into each field that doesn't exist in a given topic's records.
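For illustration, suppose the two topics carried (hypothetical) Avro schemas like these:

```json
{
  "type": "record",
  "name": "PayResult",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "paid", "type": "boolean"}
  ]
}
```

```json
{
  "type": "record",
  "name": "OrderCreated",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "item_count", "type": "int"}
  ]
}
```

What I expected was an auto-created table with order_id, paid, and item_count, where records from the first topic leave item_count null and records from the second leave paid null. (Since `paid` is non-nullable in the first schema, it would presumably be created as a required column, which records from the second topic can never populate.)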
Oh, I found another error, shown below:
2023-10-18 15:40:32,310 ERROR [iceberg-tabular-sink-connector-avro-schema-poc-v8|task-1] WorkerSinkTask{id=iceberg-tabular-sink-connector-avro-schema-poc-v8-1} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted. Error: An error occurred converting record, topic: order.beta.streaming.pay-result.avro, partition, 1, offset: 25908 (org.apache.kafka.connect.runtime.WorkerSinkTask) [task-thread-iceberg-tabular-sink-connector-avro-schema-poc-v8-1]
org.apache.kafka.connect.errors.DataException: An error occurred converting record, topic: order.beta.streaming.pay-result.avro, partition, 1, offset: 25908
at io.tabular.iceberg.connect.data.IcebergWriter.write(IcebergWriter.java:73)
at io.tabular.iceberg.connect.channel.Worker.lambda$routeRecordStatically$5(Worker.java:201)
at java.base/java.util.Arrays$ArrayList.forEach(Arrays.java:4204)
at io.tabular.iceberg.connect.channel.Worker.routeRecordStatically(Worker.java:199)
at io.tabular.iceberg.connect.channel.Worker.save(Worker.java:188)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at io.tabular.iceberg.connect.channel.Worker.save(Worker.java:175)
at io.tabular.iceberg.connect.IcebergSinkTask.put(IcebergSinkTask.java:145)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:583)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:336)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:237)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:206)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:202)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:257)
at org.apache.kafka.connect.runtime.isolation.Plugins.lambda$withClassLoader$1(Plugins.java:177)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.NullPointerException: Cannot invoke "java.lang.Boolean.booleanValue()" because "value" is null
at org.apache.iceberg.parquet.ColumnWriter$1.write(ColumnWriter.java:34)
at org.apache.iceberg.parquet.ColumnWriter$1.write(ColumnWriter.java:31)
at org.apache.iceberg.parquet.ParquetValueWriters$PrimitiveWriter.write(ParquetValueWriters.java:131)
at org.apache.iceberg.parquet.ParquetValueWriters$StructWriter.write(ParquetValueWriters.java:589)
at org.apache.iceberg.parquet.ParquetValueWriters$StructWriter.write(ParquetValueWriters.java:589)
at org.apache.iceberg.parquet.ParquetWriter.add(ParquetWriter.java:139)
at org.apache.iceberg.io.DataWriter.write(DataWriter.java:71)
at org.apache.iceberg.io.BaseTaskWriter$RollingFileWriter.write(BaseTaskWriter.java:362)
at org.apache.iceberg.io.BaseTaskWriter$RollingFileWriter.write(BaseTaskWriter.java:345)
at org.apache.iceberg.io.BaseTaskWriter$BaseRollingWriter.write(BaseTaskWriter.java:277)
at org.apache.iceberg.io.UnpartitionedWriter.write(UnpartitionedWriter.java:42)
at io.tabular.iceberg.connect.data.IcebergWriter.write(IcebergWriter.java:65)
... 19 more
This is caused by a null value in a required field. Throwing an NPE isn't ideal, but fixing that will require an update to Iceberg; I'll bring it up with some of the Iceberg folks.
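To make the failure mode concrete: in Iceberg's Java API a column is either required or optional, and a required column rejects nulls at write time. A minimal sketch, with hypothetical field names:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.types.Types;

public class SchemaSketch {
  public static void main(String[] args) {
    // "paid" is required here, so a record that carries no value for it
    // (e.g. one produced from a topic whose Avro schema lacks the field)
    // fails when the Parquet writer tries to write the null.
    Schema strict = new Schema(
        Types.NestedField.required(1, "order_id", Types.StringType.get()),
        Types.NestedField.required(2, "paid", Types.BooleanType.get()));

    // Declaring the field optional instead lets nulls through.
    Schema lenient = new Schema(
        Types.NestedField.required(1, "order_id", Types.StringType.get()),
        Types.NestedField.optional(2, "paid", Types.BooleanType.get()));

    System.out.println(strict);
    System.out.println(lenient);
  }
}
```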
Thank you!!!!
I bypassed this issue by creating the table beforehand, with a STRING field in place of the STRUCT field (which had a NOT NULL constraint).
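In case it helps anyone else, the pre-creation step can also be done with Iceberg's Java catalog API. A rough sketch, assuming a Hive metastore at a made-up URI and hypothetical table and column names:

```java
import java.util.Map;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hive.HiveCatalog;
import org.apache.iceberg.types.Types;

public class PreCreateTable {
  public static void main(String[] args) {
    HiveCatalog catalog = new HiveCatalog();
    catalog.initialize("hive", Map.of("uri", "thrift://metastore:9083"));

    // The payload column is a nullable STRING rather than a required STRUCT,
    // so records that carry no value for it simply write null.
    Schema schema = new Schema(
        Types.NestedField.required(1, "order_id", Types.StringType.get()),
        Types.NestedField.optional(2, "payload", Types.StringType.get()));

    catalog.createTable(
        TableIdentifier.of("default", "orders_poc"),
        schema,
        PartitionSpec.unpartitioned());
  }
}
```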
THANK YOU @bryanck !!
@bryanck
How about adding a `behavior.on.null.values` option to this Iceberg connector, the same as other major sink connectors like S3SinkConnector or ElasticSearchSinkConnector?
I would really appreciate it if that option were added!
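For reference, this is roughly how the option looks on the Confluent S3 sink, which (per my reading of the Confluent docs) accepts values such as `ignore` and `fail`; the rest of the snippet is hypothetical:

```properties
connector.class=io.confluent.connect.s3.S3SinkConnector
topics=order.beta.streaming.pay-result.avro

# Skip records whose value is null (tombstones) instead of failing the task.
behavior.on.null.values=ignore
```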
Thanks for the links; I'll take a look. One thing that could be a problem with combining Avro schemas is that if a field is marked as required in one schema, then it will be added to the table as required, but only one type of message might have it. We may want an option to always create fields as optional for these cases.
The Iceberg table format has an optimistic concurrency model for table writes, with an atomic metadata swap.
However, I expected this connector to work well when the source topics are declared in the same connector spec (i.e., NOT multiple connectors with the same target table).
But the results of my test were different from what I expected.
Is there any plan to support multi-topic ingestion into one table with only one connector?
[ERROR LOG]