numberlabs-developers / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Error received while writing records for transaction 20231119141913105 in partition 0 (org.apache.hudi.connect.transaction.ConnectTransactionParticipant:238): java.lang.NullPointerException #22

Open numberlabs-developers opened 11 months ago

numberlabs-developers commented 11 months ago

I am trying to load data from YugabyteDB, which is streamed to Kafka, and I am using the Hudi sink connector to write the data into a Hudi table. I am getting the following error:

[2023-11-19 14:20:22,236] WARN [hudi-yb-test1|task-0] Error received while writing records for transaction 20231119141913105 in partition 0 (org.apache.hudi.connect.transaction.ConnectTransactionParticipant:238)
java.lang.NullPointerException
    at org.apache.hudi.connect.writers.AbstractConnectWriter.writeRecord(AbstractConnectWriter.java:71)
    at org.apache.hudi.connect.transaction.ConnectTransactionParticipant.writeRecords(ConnectTransactionParticipant.java:219)
    at org.apache.hudi.connect.transaction.ConnectTransactionParticipant.processRecords(ConnectTransactionParticipant.java:137)
    at org.apache.hudi.connect.HoodieSinkTask.put(HoodieSinkTask.java:114)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:581)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:333)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:234)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:203)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

To Reproduce

Steps to reproduce the behavior:

My source Kafka connector config:

curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{
  "name": "pubnuborders1",
  "config": {
    "connector.class": "io.debezium.connector.yugabytedb.YugabyteDBConnector",
    "database.hostname": "'$IP'",
    "database.port": "5433",
    "tasks.max": "3",
    "database.master.addresses": "'$IP':7100",
    "database.user": "yugabyte",
    "database.password": "yugabyte",
    "database.dbname": "yugabyte",
    "database.server.name": "dbserver1",
    "table.include.list": "public.orders",
    "database.streamid": "97d215847a8444c3a11ae94ca274665f",
    "snapshot.mode": "never",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "true",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "true",
    "key.converter.schema.registry.url": "http://schema-registry:8081/",
    "value.converter.schema.registry.url": "http://schema-registry:8081/"
  }
}'

Hudi sink connector config:

curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8085/connectors/ -d '{
  "name": "hudi-yb-test1",
  "config": {
    "bootstrap.servers": "localhost:9092",
    "connector.class": "org.apache.hudi.connect.HoodieSinkConnector",
    "tasks.max": "1",
    "topics": "dbserver1.public.orders",
    "hoodie.table.name": "dbserver1-public-orders",
    "hoodie.table.type": "MERGE_ON_READ",
    "hoodie.base.path": "file:///tmp/hoodie/dbserver1-public-orders",
    "hoodie.datasource.write.recordkey.field": "commit_time",
    "hoodie.datasource.write.partitionpath.field": "ts_ms",
    "hoodie.schemaprovider.class": "org.apache.hudi.schema.SchemaRegistryProvider",
    "hoodie.deltastreamer.schemaprovider.registry.url": "http://localhost:8081/subjects/dbserver1.public.orders-value/versions/latest",
    "hoodie.kafka.commit.interval.secs": 60
  }
}'

Expected behavior

It should store the data in /tmp/hoodie/dbserver1-public-orders

Environment Description

Hudi version : 0.13.1

Spark version : 3.2.3

Hive version : N/A

Hadoop version : N/A

Storage (HDFS/S3/GCS..) : Local

Running on Docker? (yes/no) : No

torvalds-dev-testbot[bot] commented 11 months ago

torvalds.dev is analyzing the ticket

torvalds-dev-testbot[bot] commented 11 months ago

After reviewing the error log and the relevant code, especially the HoodieSinkTask class, I can see that the user is streaming data from YugabyteDB to a Hudi table via Kafka using the Hudi sink connector, and is hitting a NullPointerException during the write.

The NullPointerException is thrown in the writeRecord method of org.apache.hudi.connect.writers.AbstractConnectWriter and propagates up the stack to HoodieSinkTask's put method, where it is caught and logged.

Possible root causes of this exception can be:

  1. The incoming records may be null or contain null fields (for example, delete tombstones) that the writeRecord function does not handle; the consumer probe sketched after this list can confirm whether such records are present.
  2. There could be an issue with the Kafka topic or partition that causes a lookup to return null, leading to this exception.
  3. The transaction participant for this partition may not have been properly instantiated, leading to the failure.
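To check the first possibility, a throwaway consumer can scan the topic for null keys or values; Debezium connectors by default emit a tombstone record (null value) after each delete, and a sink that assumes a non-null payload will NPE on it. A minimal sketch, assuming kafka-clients on the classpath (the class name NullRecordProbe and the group id are illustrative):

import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class NullRecordProbe {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "null-record-probe");   // illustrative group id
    props.put("auto.offset.reset", "earliest");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(List.of("dbserver1.public.orders"));
      for (int i = 0; i < 10; i++) {              // scan a few batches, then stop
        ConsumerRecords<String, String> batch = consumer.poll(Duration.ofSeconds(2));
        for (ConsumerRecord<String, String> r : batch) {
          if (r.key() == null || r.value() == null) {
            // Records reported here are candidates for the NPE in writeRecord.
            System.out.printf("null key/value at partition=%d offset=%d%n",
                r.partition(), r.offset());
          }
        }
      }
    }
  }
}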

Relevant files and actions related to this issue:

  1. hudi-kafka-connect/src/main/java/org/apache/hudi/connect/HoodieSinkTask.java: This is where the tasks that write records are created and managed. Inspect the put method (and the processRecords path it calls into) for potential sources of null values. This class also starts and stops the transaction participants, so check their lifecycle management, specifically whether each participant is fully initialized before records are handed to it.

  2. packaging/bundle-validation/kafka/config-sink.json: This file contains a reference configuration for the Hudi sink connector. Ensure that all of your configuration values are set correctly and that none of the required fields are null or malformed, paying particular attention to the Hudi table settings and the Kafka settings, including the topic and bootstrap servers.

Next Steps:

  1. Inspect your data that is being streamed from YugabyteDB and verify that it does not contain null values in places where the code expects data.
  2. Make sure your Kafka topic or partition is properly set up and accessible through your connector.
  3. Double-check your Hudi sink connector configuration to ensure that each property is correctly set and non-null; the probe sketched after this list reads the task status and the live configuration back from the Connect worker.
  4. Consider adding more logging (especially around the AbstractConnectWriter.writeRecord and ConnectTransactionParticipant.processRecords methods) to capture more context when this error occurs.
  5. If none of the above resolves the issue, reach out to the Hudi community for help with your specific setup.
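For steps 2 and 3, the standard Kafka Connect REST endpoints GET /connectors/<name>/status and GET /connectors/<name>/config report the task states (including the stack trace of the last failure, if any) and the configuration exactly as the worker sees it. A minimal sketch, assuming the sink worker from this report on localhost:8085 (the class name ConnectorProbe is illustrative):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConnectorProbe {

  public static void main(String[] args) throws Exception {
    HttpClient client = HttpClient.newHttpClient();

    // Task states, including the trace of the most recent failure, if any.
    System.out.println(get(client, "http://localhost:8085/connectors/hudi-yb-test1/status"));

    // The configuration as the worker actually runs it, so typos or missing
    // properties (e.g., a wrong schema registry URL) become visible.
    System.out.println(get(client, "http://localhost:8085/connectors/hudi-yb-test1/config"));
  }

  private static String get(HttpClient client, String url) throws Exception {
    HttpResponse<String> resp = client.send(
        HttpRequest.newBuilder(URI.create(url)).GET().build(),
        HttpResponse.BodyHandlers.ofString());
    return resp.body();
  }
}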

Remember to test any changes against varying volumes of data to confirm the error is resolved.