Closed raviagarwal526 closed 9 months ago
torvalds.dev's analysis of the ticket:
The issue you're facing is a `java.lang.NullPointerException` thrown while loading data from YugabyteDB, streaming it to Kafka, and using the Hudi Sink connector to sink the data to a Hudi table. The error is thrown at `org.apache.hudi.connect.writers.AbstractConnectWriter.writeRecord(AbstractConnectWriter.java:71)`, which suggests that a null value is being passed to the `writeRecord` method of the `AbstractConnectWriter` class in the Hudi Connect library.
Here's a plan to understand and address the issue:
**Check the data:** First, ensure that the records flowing from YugabyteDB into Kafka don't contain null keys or values that could trigger the exception. Inspect the data in YugabyteDB as well as the messages on the Kafka topic.
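One thing worth checking while inspecting the topic: Debezium-based sources commonly emit a tombstone record (a record whose value is null) after each delete, and such a null payload reaching the sink is one plausible trigger for this NPE. Below is a minimal, hedged sketch of that check over a sample of consumed key/value pairs — this is not Hudi's or Kafka Connect's API, just plain Java standing in for a batch of consumed records:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;

public class TombstoneScan {
    /** Counts records whose value is null (Debezium-style tombstones). */
    static long countNullValues(List<Map.Entry<String, String>> records) {
        return records.stream().filter(r -> r.getValue() == null).count();
    }

    public static void main(String[] args) {
        // Hypothetical sample of (key, value) pairs read from the topic.
        List<Map.Entry<String, String>> sample = List.of(
            new SimpleEntry<>("order-1", "{\"id\":1}"),
            new SimpleEntry<>("order-1", null)  // tombstone after a delete
        );
        System.out.println(countNullValues(sample)); // prints 1
    }
}
```

If a scan like this finds null-valued records on `dbserver1.public.orders`, that points at the data rather than the sink configuration.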
**Check the configuration:** Verify the configuration of both the Hudi Sink connector and the source Kafka connector; the error could stem from a misconfiguration. The relevant files are `packaging/bundle-validation/kafka/config-sink.json` and the configuration provided in the issue description.
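If the cause turns out to be tombstone records (records with a null value) reaching the sink — an assumption, not something the stack trace alone confirms — one configuration-level workaround is Kafka Connect's built-in `Filter` transformation with the `RecordIsTombstone` predicate (available since Kafka 2.6), added to the sink connector's `config` block:

```json
{
  "transforms": "dropTombstones",
  "transforms.dropTombstones.type": "org.apache.kafka.connect.transforms.Filter",
  "transforms.dropTombstones.predicate": "isTombstone",
  "predicates": "isTombstone",
  "predicates.isTombstone.type": "org.apache.kafka.connect.transforms.predicates.RecordIsTombstone"
}
```

Note that dropping tombstones means deletes will not propagate to the Hudi table; this is a way to confirm the hypothesis and keep the pipeline running, not necessarily the final fix.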
**Check the code:** The `HoodieSinkTask.java` file is relevant: its `put` method calls the `processRecords` method of the `TransactionParticipant` class, which in turn calls the `writeRecords` method of the `ConnectTransactionParticipant` class, which finally calls the `writeRecord` method of the `AbstractConnectWriter` class, where the error is thrown. Check these methods to confirm they handle null values correctly.
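As a sketch of the kind of defensive guard these methods may be missing — this is not Hudi's actual implementation, and `KafkaMessage` is a hypothetical stand-in for Kafka Connect's `SinkRecord`:

```java
import java.util.ArrayList;
import java.util.List;

public class NullSafeWriter {
    /** Hypothetical stand-in for Kafka Connect's SinkRecord. */
    record KafkaMessage(Object key, Object value) {}

    private final List<Object> written = new ArrayList<>();

    /** Skips records with a null payload instead of throwing an NPE. */
    public boolean writeRecord(KafkaMessage msg) {
        if (msg == null || msg.value() == null) {
            return false; // tombstone or empty record: skip, don't crash
        }
        written.add(msg.value());
        return true;
    }

    public int writtenCount() { return written.size(); }

    public static void main(String[] args) {
        NullSafeWriter w = new NullSafeWriter();
        w.writeRecord(new KafkaMessage("order-1", "{\"id\":1}"));
        w.writeRecord(new KafkaMessage("order-1", null)); // tombstone
        System.out.println(w.writtenCount()); // prints 1
    }
}
```

Whether skipping, logging, or treating null values as deletes is the right behavior is a design decision for the sink; the point is only that line 71 of `AbstractConnectWriter` appears to dereference something that can be null.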
**Update or patch the library:** If the issue lies in the Hudi Connect library itself, upgrade to a newer version if one is available. If the issue persists in the latest version, you may need to patch the library to handle null values correctly.
**Reach out to the community:** If you're unable to resolve the issue, ask the Hudi and Kafka communities for help, providing the issue details, the steps to reproduce it, and the error logs. They may be able to offer a solution or a workaround.
I am trying to load data from YugabyteDB, which is streamed to Kafka, and I am using the Hudi Sink connector to sink the data to a Hudi table. I am getting the following error:

```
[2023-11-19 14:20:22,236] WARN [hudi-yb-test1|task-0] Error received while writing records for transaction 20231119141913105 in partition 0 (org.apache.hudi.connect.transaction.ConnectTransactionParticipant:238)
java.lang.NullPointerException
	at org.apache.hudi.connect.writers.AbstractConnectWriter.writeRecord(AbstractConnectWriter.java:71)
	at org.apache.hudi.connect.transaction.ConnectTransactionParticipant.writeRecords(ConnectTransactionParticipant.java:219)
	at org.apache.hudi.connect.transaction.ConnectTransactionParticipant.processRecords(ConnectTransactionParticipant.java:137)
	at org.apache.hudi.connect.HoodieSinkTask.put(HoodieSinkTask.java:114)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:581)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:333)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:234)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:203)
	at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)
	at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:243)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
```
To Reproduce
Steps to reproduce the behavior:
My source Kafka connector config:

```
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{
  "name": "pubnuborders1",
  "config": {
    "connector.class": "io.debezium.connector.yugabytedb.YugabyteDBConnector",
    "database.hostname": "'$IP'",
    "database.port": "5433",
    "tasks.max": "3",
    "database.master.addresses": "'$IP':7100",
    "database.user": "yugabyte",
    "database.password": "yugabyte",
    "database.dbname": "yugabyte",
    "database.server.name": "dbserver1",
    "table.include.list": "public.orders",
    "database.streamid": "97d215847a8444c3a11ae94ca274665f",
    "snapshot.mode": "never",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "true",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "true",
    "key.converter.schema.registry.url": "http://schema-registry:8081/",
    "value.converter.schema.registry.url": "http://schema-registry:8081/"
  }
}'
```
Hudi Sink connector config:

```
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8085/connectors/ -d '{
  "name": "hudi-yb-test1",
  "config": {
    "bootstrap.servers": "localhost:9092",
    "connector.class": "org.apache.hudi.connect.HoodieSinkConnector",
    "tasks.max": "1",
    "topics": "dbserver1.public.orders",
    "hoodie.table.name": "dbserver1-public-orders",
    "hoodie.table.type": "MERGE_ON_READ",
    "hoodie.base.path": "file:///tmp/hoodie/dbserver1-public-orders",
    "hoodie.datasource.write.recordkey.field": "commit_time",
    "hoodie.datasource.write.partitionpath.field": "ts_ms",
    "hoodie.schemaprovider.class": "org.apache.hudi.schema.SchemaRegistryProvider",
    "hoodie.deltastreamer.schemaprovider.registry.url": "http://localhost:8081/subjects/dbserver1.public.orders-value/versions/latest",
    "hoodie.kafka.commit.interval.secs": 60
  }
}'
```
Expected behavior
It should store the data in /tmp/hoodie/dbserver1-public-orders
Environment Description
* Hudi version : 0.13.1
* Spark version : 3.2.3
* Hive version : N/A
* Hadoop version : N/A
* Storage (HDFS/S3/GCS..) : Local
* Running on Docker? (yes/no) : No