Status: Open. Opened by numberlabs-developers 11 months ago.
Comment from torvalds.dev (ticket analysis):
After reviewing the provided error log and code, especially the HoodieSinkTask class, I found that the user is streaming data from YugabyteDB through Kafka into a Hudi table with the Hoodie Sink Connector, and that a NullPointerException is thrown during this process.
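The NullPointerException is thrown in the writeRecord method of org.apache.hudi.connect.writers.AbstractConnectWriter (AbstractConnectWriter.java:71). According to the stack trace, HoodieSinkTask.put calls ConnectTransactionParticipant.processRecords, which calls ConnectTransactionParticipant.writeRecords, which calls AbstractConnectWriter.writeRecord. The WARN line shows that the exception is caught and logged by ConnectTransactionParticipant (line 238), not in HoodieSinkTask.put.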
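A NullPointerException at this point means writeRecord dereferenced something that was null. One possible source, which is an assumption and is not confirmed by the log, is a record whose value is null, such as a tombstone that a Debezium-style source can emit after a delete. The sketch below uses a hypothetical RecordStub class as a stand-in for Kafka Connect's SinkRecord, together with hypothetical unsafeWrite and filterWritable helpers. It shows how an unguarded dereference fails on such a record and how a null guard lets the rest of the batch through. It is not Hudi's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for org.apache.kafka.connect.sink.SinkRecord,
// keeping only the two fields this sketch needs.
final class RecordStub {
    final String topic;
    final Object value; // null models a tombstone record (assumption)

    RecordStub(String topic, Object value) {
        this.topic = topic;
        this.value = value;
    }
}

class NullSafeWriterSketch {
    // Hypothetical writer that dereferences the value without a null check,
    // so a null-valued record throws NullPointerException here.
    static int unsafeWrite(RecordStub record) {
        return record.value.toString().length();
    }

    // Guarded variant: keep only non-null records with a non-null value,
    // so a single tombstone does not fail the whole batch.
    static List<RecordStub> filterWritable(List<RecordStub> batch) {
        List<RecordStub> writable = new ArrayList<>();
        for (RecordStub r : batch) {
            if (r != null && r.value != null) {
                writable.add(r);
            }
        }
        return writable;
    }

    public static void main(String[] args) {
        List<RecordStub> batch = Arrays.asList(
                new RecordStub("dbserver1.public.orders", "{\"id\":1}"),
                new RecordStub("dbserver1.public.orders", null));
        try {
            for (RecordStub r : batch) {
                unsafeWrite(r);
            }
        } catch (NullPointerException e) {
            System.out.println("unguarded write failed with NullPointerException");
        }
        System.out.println("writable records: " + filterWritable(batch).size());
    }
}
```

If null-valued records do turn out to be the cause, you do not need a custom code change to drop them. Kafka Connect's built-in Filter transform with the RecordIsTombstone predicate can remove them before they reach the sink.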
Possible root causes of this exception lie in the writeRecord function, where a value dereferenced at line 71 is null.
Relevant files and actions related to this issue:
hudi-kafka-connect/src/main/java/org/apache/hudi/connect/HoodieSinkTask.java: This is where the tasks that write records are created and managed. Inspect its put method, and the processRecords method of ConnectTransactionParticipant that put calls, for potential sources of null values. This class also starts and stops the transaction participants. Check how their lifecycle is managed, in particular whether each participant is fully initialized before it is used.
packaging/bundle-validation/kafka/config-sink.json: This file contains an example configuration for the Hoodie Sink Connector. Compare it with your own configuration and make sure that no field is missing, null, or improperly set. Pay particular attention to the Hudi table settings and to the Kafka settings, including the topic and bootstrap servers.
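To make the "no null or empty fields" check concrete, here is a minimal sketch that reports which keys in a connector config map are absent, null, or blank. The class SinkConfigCheck and its key list are hypothetical. The keys are copied from the sink config in this issue and are NOT an authoritative list of the settings Hudi requires.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SinkConfigCheck {
    // Illustrative keys copied from the sink config in this issue.
    // This is NOT an authoritative list of settings Hudi requires.
    static final List<String> KEYS_TO_CHECK = Collections.unmodifiableList(Arrays.asList(
            "bootstrap.servers",
            "topics",
            "hoodie.table.name",
            "hoodie.base.path",
            "hoodie.datasource.write.recordkey.field"));

    // Returns every checked key whose value is absent, null, or blank,
    // in the order the keys appear in KEYS_TO_CHECK.
    static List<String> missingOrBlank(Map<String, String> config) {
        List<String> problems = new ArrayList<>();
        for (String key : KEYS_TO_CHECK) {
            String value = config.get(key);
            if (value == null || value.trim().isEmpty()) {
                problems.add(key);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("bootstrap.servers", "localhost:9092");
        config.put("topics", "dbserver1.public.orders");
        config.put("hoodie.table.name", "dbserver1-public-orders");
        config.put("hoodie.base.path", "file:///tmp/hoodie/dbserver1-public-orders");
        config.put("hoodie.datasource.write.recordkey.field", " "); // blank on purpose
        System.out.println("missing or blank: " + missingOrBlank(config));
    }
}
```

In practice you would fill the map from the JSON you POST to the Kafka Connect REST API. JSON parsing is left out so the sketch has no dependencies, and trim().isEmpty() is used instead of isBlank() so it also compiles on Java 8.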
Next Steps:
Examine the AbstractConnectWriter.writeRecord and ConnectTransactionParticipant.processRecords methods to learn more about the context in which this error occurs. Test any changes with varying amounts of data and check whether the error is resolved.
I am trying to load data from YugabyteDB, which is streamed to Kafka, into a Hudi table using the Hoodie Sink connector, and I am getting the following error:
[2023-11-19 14:20:22,236] WARN [hudi-yb-test1|task-0] Error received while writing records for transaction 20231119141913105 in partition 0 (org.apache.hudi.connect.transaction.ConnectTransactionParticipant:238)
java.lang.NullPointerException
    at org.apache.hudi.connect.writers.AbstractConnectWriter.writeRecord(AbstractConnectWriter.java:71)
    at org.apache.hudi.connect.transaction.ConnectTransactionParticipant.writeRecords(ConnectTransactionParticipant.java:219)
    at org.apache.hudi.connect.transaction.ConnectTransactionParticipant.processRecords(ConnectTransactionParticipant.java:137)
    at org.apache.hudi.connect.HoodieSinkTask.put(HoodieSinkTask.java:114)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:581)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:333)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:234)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:203)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
To Reproduce
Steps to reproduce the behavior:
My source Kafka connector config:
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{
  "name": "pubnuborders1",
  "config": {
    "connector.class": "io.debezium.connector.yugabytedb.YugabyteDBConnector",
    "database.hostname": "'$IP'",
    "database.port": "5433",
    "tasks.max": "3",
    "database.master.addresses": "'$IP':7100",
    "database.user": "yugabyte",
    "database.password": "yugabyte",
    "database.dbname": "yugabyte",
    "database.server.name": "dbserver1",
    "table.include.list": "public.orders",
    "database.streamid": "97d215847a8444c3a11ae94ca274665f",
    "snapshot.mode": "never",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "true",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "true",
    "key.converter.schema.registry.url": "http://schema-registry:8081/",
    "value.converter.schema.registry.url": "http://schema-registry:8081/"
  }
}'
Hudi Sink connector config:
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8085/connectors/ -d '{
  "name": "hudi-yb-test1",
  "config": {
    "bootstrap.servers": "localhost:9092",
    "connector.class": "org.apache.hudi.connect.HoodieSinkConnector",
    "tasks.max": "1",
    "topics": "dbserver1.public.orders",
    "hoodie.table.name": "dbserver1-public-orders",
    "hoodie.table.type": "MERGE_ON_READ",
    "hoodie.base.path": "file:///tmp/hoodie/dbserver1-public-orders",
    "hoodie.datasource.write.recordkey.field": "commit_time",
    "hoodie.datasource.write.partitionpath.field": "ts_ms",
    "hoodie.schemaprovider.class": "org.apache.hudi.schema.SchemaRegistryProvider",
    "hoodie.deltastreamer.schemaprovider.registry.url": "http://localhost:8081/subjects/dbserver1.public.orders-value/versions/latest",
    "hoodie.kafka.commit.interval.secs": 60
  }
}'
Expected behavior
The connector should write the data to /tmp/hoodie/dbserver1-public-orders.
Environment Description
Hudi version : 0.13.1
Spark version : 3.2.3
Hive version : N/A
Hadoop version : N/A
Storage (HDFS/S3/GCS..) : Local
Running on Docker? (yes/no) : No