opensearch-project / opensearch-hadoop

Apache License 2.0

[BUG] Unable to Write to OpenSearch 2.16.0 using Spark 3.5 #521

Open amitgenius opened 1 month ago

amitgenius commented 1 month ago

What is the bug?

I have a Spark 3.5 streaming job that reads data from Kafka, but writing to OpenSearch fails with the error below. A _cluster/health request against the same endpoint works fine:

curl -kgu username:'somepassword' http://xxx-opensearch:9200/_cluster/health?pretty

{
  "cluster_name" : "opensearch2x-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 2,
  "discovered_master" : true,
  "discovered_cluster_manager" : true,
  "active_primary_shards" : 21,
  "active_shards" : 42,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Here are my Spark configurations:

opensearch.nodes                 xxx-opensearch
opensearch.port                  9200
opensearch.nodes.wan.only        true
opensearch.batch.size.bytes      10mb
opensearch.index.auto.create     true
opensearch.batch.size.entries    100
opensearch.net.http.auth.pass    somepassword
opensearch.net.http.auth.user    someusername
opensearch.batch.write.refresh   false
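One way to double-check exactly which settings reach the connector is to collect them into a Java map and print them (password masked). This is a minimal sketch: the class name OpenSearchWriteSettings is hypothetical, and it assumes (not confirmed in this thread) that JavaOpenSearchSpark offers saveJsonToOpenSearch overloads taking a Map<String, String> of per-write settings, as the Elasticsearch connector's JavaEsSpark does. The keys and values are exactly the ones listed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: the settings from the report as a Map<String, String>,
// the shape assumed to be accepted by the connector's per-write overloads.
final class OpenSearchWriteSettings {

    static Map<String, String> build(String nodes, String user, String password) {
        // LinkedHashMap keeps insertion order so printed output is stable.
        Map<String, String> cfg = new LinkedHashMap<>();
        cfg.put("opensearch.nodes", nodes);
        cfg.put("opensearch.port", "9200");
        // Use only the declared node(s), without cluster node discovery.
        cfg.put("opensearch.nodes.wan.only", "true");
        cfg.put("opensearch.batch.size.bytes", "10mb");
        cfg.put("opensearch.index.auto.create", "true");
        cfg.put("opensearch.batch.size.entries", "100");
        cfg.put("opensearch.net.http.auth.pass", password);
        cfg.put("opensearch.net.http.auth.user", user);
        cfg.put("opensearch.batch.write.refresh", "false");
        return cfg;
    }

    public static void main(String[] args) {
        // Placeholder values copied from the report.
        Map<String, String> cfg = build("xxx-opensearch", "someusername", "somepassword");
        // Print every setting, masking the password value.
        cfg.forEach((k, v) ->
                System.out.println(k + " = " + (k.endsWith(".pass") ? "****" : v)));
    }
}
```

Printing the effective map from inside the job (rather than from a local shell) shows what the Spark process actually uses.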

Error:

Exception in storing in elastic: org.opensearch.hadoop.OpenSearchHadoopIllegalArgumentException: Cannot detect OpenSearch version - typically this happens if the network/OpenSearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'opensearch.nodes.wan.only'
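Since the failure is specifically about version detection, a check closer to it than _cluster/health is to request the cluster root ("/") with the same credentials and read version.number from the reply. The sketch below does that with the JDK's java.net.http client. Assumptions: that the connector's version detection relies on the root endpoint (plausible, not confirmed here); the class name VersionProbe is hypothetical; and the regex is a quick diagnostic, not a JSON parser. Unlike the curl -k call, it does not skip TLS certificate verification, so an https URL with a self-signed certificate will fail here for that reason alone. Running it from the same host or pod as the Spark driver and executors tests the network path the job actually uses.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Base64;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic: authenticated GET on the cluster root ("/"), then
// extraction of "version.number" from the JSON reply.
final class VersionProbe {

    // First "number" : "..." pair in the body (quick check only; use a JSON
    // parser in real code).
    private static final Pattern NUMBER =
            Pattern.compile("\"number\"\\s*:\\s*\"([^\"]+)\"");

    // HTTP Basic "Authorization" header value for the given credentials.
    static String basicAuth(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    // GET request for the cluster root with Basic auth attached.
    static HttpRequest rootRequest(String baseUrl, String user, String password) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/"))
                .header("Authorization", basicAuth(user, password))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }

    // Version string from a root-endpoint body, if one is present.
    static Optional<String> versionNumber(String rootBody) {
        Matcher m = NUMBER.matcher(rootBody);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder endpoint and credentials from the report; replace them.
        String baseUrl = args.length > 0 ? args[0] : "http://xxx-opensearch:9200";
        String user = args.length > 1 ? args[1] : "someusername";
        String password = args.length > 2 ? args[2] : "somepassword";

        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        HttpResponse<String> response = client.send(
                rootRequest(baseUrl, user, password),
                HttpResponse.BodyHandlers.ofString());

        System.out.println("HTTP " + response.statusCode());
        System.out.println("version.number = "
                + versionNumber(response.body()).orElse("<not found>"));
    }
}
```

A non-200 status or a missing version.number here would point at the same connection or authentication problem the connector reports.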

Spark Code: JavaOpenSearchSpark.saveJsonToOpenSearch(map, "{kafka_topic}" + "-" + date);

Versions:
OpenSearch: 2.16.0.0
Connector: opensearch-hadoop-1.2.0.jar
Spark: 3.5

What is the expected behavior?

The job should write to OpenSearch successfully.

What is your host/environment?

Linux; OpenSearch is deployed on Kubernetes.

Xtansia commented 1 month ago

In theory, there should be a more detailed exception about the specific request that failed, logged right after the initial "Cannot detect OpenSearch version" exception. Can you please check the logs?

In general, that particular error (assuming you're not using Amazon OpenSearch Serverless) is usually caused by some kind of connection or authentication issue.
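The reporter's log line ("Exception in storing in elastic" immediately followed by the exception class name) looks like the exception was concatenated into a string, which records only the top-level message and can hide the nested cause that describes the failing request. A minimal sketch that walks the getCause() chain is below; CauseChain is a hypothetical name, and the exceptions in main are stand-ins, not the connector's actual exception types. With common logging frameworks such as SLF4J, passing the exception as the last argument (logger.error("...", e)) also prints the full chain with stack traces.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical helper: lists every exception in a cause chain so nested
// causes are not lost when only the top-level message would be logged.
final class CauseChain {

    static List<String> describe(Throwable t) {
        List<String> lines = new ArrayList<>();
        // Identity-based set stops the walk if the chain contains a cycle.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            lines.add(cur.getClass().getName() + ": " + cur.getMessage());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in exceptions for illustration only.
        Exception root = new java.net.ConnectException("Connection refused");
        Exception top = new IllegalArgumentException("Cannot detect version", root);
        describe(top).forEach(System.out::println);
    }
}
```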

amitgenius commented 1 month ago

The same configuration works perfectly with OpenSearch 2.11.1.0 and all earlier versions. The issue appears only after upgrading OpenSearch to 2.16.0.