numberlabs-developers / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Getting "Error received while writing records for transaction 20231119141913105 in partition 0 (org.apache.hudi.connect.transaction.ConnectTransactionParticipant:238) java.lang.NullPointerException" error #21

Open numberlabs-developers opened 11 months ago

numberlabs-developers commented 11 months ago

I am trying to load data from YugabyteDB, which is streamed to Kafka. I am using the Hoodie Sink connector to sink the data to a Hudi table and am getting the following error:

[2023-11-19 14:20:22,236] WARN [hudi-yb-test1|task-0] Error received while writing records for transaction 20231119141913105 in partition 0 (org.apache.hudi.connect.transaction.ConnectTransactionParticipant:238)
java.lang.NullPointerException
    at org.apache.hudi.connect.writers.AbstractConnectWriter.writeRecord(AbstractConnectWriter.java:71)
    at org.apache.hudi.connect.transaction.ConnectTransactionParticipant.writeRecords(ConnectTransactionParticipant.java:219)
    at org.apache.hudi.connect.transaction.ConnectTransactionParticipant.processRecords(ConnectTransactionParticipant.java:137)
    at org.apache.hudi.connect.HoodieSinkTask.put(HoodieSinkTask.java:114)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:581)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:333)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:234)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:203)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

To Reproduce

Steps to reproduce the behavior:

My source Kafka connector config:

curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{
  "name": "pubnuborders1",
  "config": {
    "connector.class": "io.debezium.connector.yugabytedb.YugabyteDBConnector",
    "database.hostname": "'$IP'",
    "database.port": "5433",
    "tasks.max": "3",
    "database.master.addresses": "'$IP':7100",
    "database.user": "yugabyte",
    "database.password": "yugabyte",
    "database.dbname": "yugabyte",
    "database.server.name": "dbserver1",
    "table.include.list": "public.orders",
    "database.streamid": "97d215847a8444c3a11ae94ca274665f",
    "snapshot.mode": "never",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "true",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "true",
    "key.converter.schema.registry.url": "http://schema-registry:8081/",
    "value.converter.schema.registry.url": "http://schema-registry:8081/"
  }
}'

Hudi Sink connector config

curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8085/connectors/ -d '{
  "name": "hudi-yb-test1",
  "config": {
    "bootstrap.servers": "localhost:9092",
    "connector.class": "org.apache.hudi.connect.HoodieSinkConnector",
    "tasks.max": "1",
    "topics": "dbserver1.public.orders",
    "hoodie.table.name": "dbserver1-public-orders",
    "hoodie.table.type": "MERGE_ON_READ",
    "hoodie.base.path": "file:///tmp/hoodie/dbserver1-public-orders",
    "hoodie.datasource.write.recordkey.field": "commit_time",
    "hoodie.datasource.write.partitionpath.field": "ts_ms",
    "hoodie.schemaprovider.class": "org.apache.hudi.schema.SchemaRegistryProvider",
    "hoodie.deltastreamer.schemaprovider.registry.url": "http://localhost:8081/subjects/dbserver1.public.orders-value/versions/latest",
    "hoodie.kafka.commit.interval.secs": 60
  }
}'

Expected behavior

The connector should store the data in /tmp/hoodie/dbserver1-public-orders.

Environment Description

Hudi version : 0.13.1

Spark version : 3.2.3

Hive version : N/A

Hadoop version : N/A

Storage (HDFS/S3/GCS..) :Local

Running on Docker? (yes/no) : No

torvalds-dev-testbot[bot] commented 11 months ago

torvalds.dev is analyzing the ticket

torvalds-dev-testbot[bot] commented 11 months ago

The error you're encountering is a NullPointerException raised while the Hudi sink task writes records. This exception is thrown when code attempts to use an object reference whose value is null. According to the stack trace, the exception originates in AbstractConnectWriter.writeRecord (AbstractConnectWriter.java:71), which is called from the writeRecords method of the ConnectTransactionParticipant class and reached through HoodieSinkTask.put.

The relevant files you should look into for the issue are:

  1. HoodieSinkTask.java: This is where Kafka Connect hands your records to Hudi. Its put method (line 114) appears in the stack trace, so the records that trigger the exception pass through this call. Check that every variable and object used in this method is initialized and not null.

  2. kafka/config-sink.json: This is the configuration file for the Kafka Hudi sink connector. Check that every setting and property matches your actual environment and use case.
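One configuration check that could be scripted is whether the fields named in hoodie.datasource.write.recordkey.field and hoodie.datasource.write.partitionpath.field actually exist in the messages on the topic. The Python sketch below is only an illustration: the function name and the sample message are hypothetical, and it assumes the sink resolves these field names against the top level of the record value, which this thread does not confirm. With JsonConverter and schemas.enable set to true, each message is wrapped in a "schema"/"payload" envelope, so it may be worth ruling this out.

```python
import json

# The two sink options whose values name fields of the incoming record.
FIELD_OPTIONS = (
    "hoodie.datasource.write.partitionpath.field",
    "hoodie.datasource.write.recordkey.field",
)


def missing_top_level_fields(message_value, sink_config):
    """Return (option, field) pairs whose field is absent or null at the top level."""
    record = json.loads(message_value)
    missing = []
    for option in FIELD_OPTIONS:
        field = sink_config.get(option)
        if field is None or record.get(field) is None:
            missing.append((option, field))
    return missing


# Hypothetical message shaped like JsonConverter output with schemas.enable=true.
sample = json.dumps({
    "schema": {"type": "struct"},
    "payload": {"before": None, "after": {"id": 1}, "op": "c", "ts_ms": 1700403553105},
})

# Values copied from the sink connector config posted in this issue.
sink_config = {
    "hoodie.datasource.write.recordkey.field": "commit_time",
    "hoodie.datasource.write.partitionpath.field": "ts_ms",
}

for option, field in missing_top_level_fields(sample, sink_config):
    print(f"{option}={field!r} not found at the top level of the record")
```

To run this against real data, replace the sample string with a message value consumed from the dbserver1.public.orders topic.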

You may also want to add logging to your implementation or step through the processing in a debugger to identify which value is unexpectedly null. Now let's move on to possible solutions and actions you can take:

Solution 1: One of the most common causes of a NullPointerException is passing a null argument to a method. Check the arguments passed to AbstractConnectWriter.writeRecord, ConnectTransactionParticipant.writeRecords, and ConnectTransactionParticipant.processRecords.

Solution 2: Another common cause is using an object that was never instantiated. Check where objects are instantiated in your Java code and make sure each reference is assigned before it is used.

Solution 3: Ensure that all of your objects were created successfully. If creation of an object fails, its reference remains null.

As this is a runtime exception, another way to avoid it is to check references for null before calling a method or accessing a field on them. Adding such null checks can reduce the occurrence of these exceptions.
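The null-check pattern described above can be sketched as follows. This is written in Python for brevity and is not the connector's Java code: describe_null and process are hypothetical names, and the tombstone case is only an example of how a null value might reach a sink. The idea is to test each reference before use and to record why a record was skipped, so the null value can be identified.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple


def describe_null(key: Optional[Any], value: Optional[Any]) -> Optional[str]:
    """Return a short reason if the record should not be processed, else None."""
    if value is None:
        return "null value (for example a tombstone record)"
    if key is None:
        return "null key"
    return None


def process(
    records: Iterable[Tuple[Any, Any]],
    handle: Callable[[Any, Any], None],
) -> List[Tuple[Any, str]]:
    """Call handle() only for records that pass the null checks; return the rest."""
    skipped = []
    for key, value in records:
        reason = describe_null(key, value)
        if reason is not None:
            # Keep the reason so the offending record can be logged or inspected.
            skipped.append((key, reason))
            continue
        handle(key, value)
    return skipped
```

Returning the skipped records together with a reason, instead of silently dropping them, keeps the check useful for debugging as well as for avoiding the exception.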

Finally, as you are using Hudi version 0.13.1, it may also help to check whether there are open issues about NullPointerExceptions with a similar setup on the Hudi GitHub repository or on the project's user group or mailing list. If it is a known issue, there may be specific guidance or a bug fix available.