numberlabs-developers / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] NullPointerException while writing records for transaction 20231119141913105 in partition 0 (org.apache.hudi.connect.transaction.ConnectTransactionParticipant:238) #121

Closed raviagarwal526 closed 9 months ago

raviagarwal526 commented 9 months ago

I am trying to load data from YugabyteDB, which is streamed to Kafka, and I am using the Hudi Sink connector to sink the data into a Hudi table. I am getting the following error:

    [2023-11-19 14:20:22,236] WARN [hudi-yb-test1|task-0] Error received while writing records for transaction 20231119141913105 in partition 0 (org.apache.hudi.connect.transaction.ConnectTransactionParticipant:238)
    java.lang.NullPointerException
        at org.apache.hudi.connect.writers.AbstractConnectWriter.writeRecord(AbstractConnectWriter.java:71)
        at org.apache.hudi.connect.transaction.ConnectTransactionParticipant.writeRecords(ConnectTransactionParticipant.java:219)
        at org.apache.hudi.connect.transaction.ConnectTransactionParticipant.processRecords(ConnectTransactionParticipant.java:137)
        at org.apache.hudi.connect.HoodieSinkTask.put(HoodieSinkTask.java:114)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:581)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:333)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:234)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:203)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:243)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

To Reproduce

Steps to reproduce the behavior:

My Source Kafka connector config:

    curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{
      "name": "pubnuborders1",
      "config": {
        "connector.class": "io.debezium.connector.yugabytedb.YugabyteDBConnector",
        "database.hostname": "'$IP'",
        "database.port": "5433",
        "tasks.max": "3",
        "database.master.addresses": "'$IP':7100",
        "database.user": "yugabyte",
        "database.password": "yugabyte",
        "database.dbname": "yugabyte",
        "database.server.name": "dbserver1",
        "table.include.list": "public.orders",
        "database.streamid": "97d215847a8444c3a11ae94ca274665f",
        "snapshot.mode": "never",
        "key.converter": "org.apache.kafka.connect.json.JsonConverter",
        "key.converter.schemas.enable": "true",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "true",
        "key.converter.schema.registry.url": "http://schema-registry:8081/",
        "value.converter.schema.registry.url": "http://schema-registry:8081/"
      }
    }'

Hudi Sink connector config:

    curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8085/connectors/ -d '{
      "name": "hudi-yb-test1",
      "config": {
        "bootstrap.servers": "localhost:9092",
        "connector.class": "org.apache.hudi.connect.HoodieSinkConnector",
        "tasks.max": "1",
        "topics": "dbserver1.public.orders",
        "hoodie.table.name": "dbserver1-public-orders",
        "hoodie.table.type": "MERGE_ON_READ",
        "hoodie.base.path": "file:///tmp/hoodie/dbserver1-public-orders",
        "hoodie.datasource.write.recordkey.field": "commit_time",
        "hoodie.datasource.write.partitionpath.field": "ts_ms",
        "hoodie.schemaprovider.class": "org.apache.hudi.schema.SchemaRegistryProvider",
        "hoodie.deltastreamer.schemaprovider.registry.url": "http://localhost:8081/subjects/dbserver1.public.orders-value/versions/latest",
        "hoodie.kafka.commit.interval.secs": 60
      }
    }'

Expected behavior

It should store the data in /tmp/hoodie/dbserver1-public-orders
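
A quick way to check, sketched assuming the base path from the sink config above (any Hudi table that has committed data contains a .hoodie metadata directory under its base path):

    # look for the .hoodie/ timeline directory under the table base path;
    # its presence indicates at least one successful commit
    find /tmp/hoodie/dbserver1-public-orders -maxdepth 2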

Environment Description

Hudi version : 0.13.1

Spark version : 3.2.3

Hive version : N/A

Hadoop version : N/A

Storage (HDFS/S3/GCS..) : Local

Running on Docker? (yes/no) : No

torvalds-dev-testbot[bot] commented 9 months ago

torvalds.dev is analyzing the ticket

torvalds-dev-testbot[bot] commented 9 months ago

The issue you're facing is a java.lang.NullPointerException thrown while loading data from YugabyteDB that is streamed to Kafka and sinking it into a Hudi table with the Hudi Sink connector.

The error is thrown at org.apache.hudi.connect.writers.AbstractConnectWriter.writeRecord(AbstractConnectWriter.java:71), which suggests that a null value is being passed to, or dereferenced inside, the writeRecord method of the AbstractConnectWriter class in the Hudi Connect library.

Here's a plan to understand and address the issue:

  1. Check the data: First, ensure that the data being passed from YugabyteDB to Kafka doesn't contain null values that might be causing the issue. You can do this by inspecting the data in YugabyteDB and the records actually landing in the Kafka topic (see the consumer command after this list).

  2. Check the configuration: Verify the configuration of both the Hudi Sink connector and the source Kafka connector; the error might be due to a misconfiguration. The relevant references are packaging/bundle-validation/kafka/config-sink.json and the configuration provided in the issue description, in particular the converter settings (see the sketch after this list).

  3. Check the code: HoodieSinkTask.java is relevant because its put method hands records to ConnectTransactionParticipant.processRecords, which calls writeRecords, which in turn calls AbstractConnectWriter.writeRecord, where the error is thrown. Check these methods to confirm they handle null keys and values correctly.

  4. Update or patch the library: If the issue is in the Hudi Connect library itself, update to a newer version if one is available (see the bundle sketch after this list). If the issue persists in the latest version, you might need to patch the library to handle null values correctly.

  5. Reach out to the community: If you're unable to resolve the issue, you can reach out to the Hudi and Kafka communities for help. You can provide them with the details of the issue, the steps to reproduce it, and the error logs. They might be able to provide you with a solution or a workaround.
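
For step 1, a minimal sketch of inspecting the records the sink actually receives, using the stock Kafka console consumer (script name and install path vary by distribution; the topic name is taken from the sink config above):

    # print a handful of records, keys included, so null keys/values or
    # unexpected Debezium envelopes become visible
    kafka-console-consumer.sh \
      --bootstrap-server localhost:9092 \
      --topic dbserver1.public.orders \
      --from-beginning \
      --max-messages 5 \
      --property print.key=true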
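
For step 2, one concrete thing to compare: the sink config in the issue sets no key.converter/value.converter, so the task inherits whatever the Connect worker defaults to, while the Hudi Kafka Connect demo (config-sink.json) configures, as far as I can tell, plain string converters and lets hoodie.schemaprovider.class fetch the schema from the registry. A hedged sketch of the override to add to the sink's "config" block (whether this resolves the NPE depends on your Hudi version and payload format):

    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter"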
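
For step 4, a sketch of swapping in a newer connector bundle, assuming the worker loads the connector from a plugin.path directory (the artifact is published as org.apache.hudi:hudi-kafka-connect-bundle on Maven Central; the version and target directory here are illustrative, not taken from the issue):

    # hypothetical version/path: download a newer bundle, place it on the
    # worker's plugin.path, then restart the Connect worker
    curl -fLO https://repo1.maven.org/maven2/org/apache/hudi/hudi-kafka-connect-bundle/0.14.0/hudi-kafka-connect-bundle-0.14.0.jar
    mv hudi-kafka-connect-bundle-0.14.0.jar /usr/local/share/kafka/plugins/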

numberlabs-developers commented 9 months ago

Demo comment

raviagarwal526 commented 9 months ago

Thanks, that worked.

raviagarwal526 commented 9 months ago

It seems correct to me.

raviagarwal526 commented 9 months ago

Your suggestion worked.
