opensearch-project / opensearch-py

Python Client for OpenSearch
https://opensearch.org/docs/latest/clients/python/
Apache License 2.0

[BUG] When calling Opensearch Bulk API, host and port are called in a non-combined form. #824

Open fast-coding opened 1 week ago

fast-coding commented 1 week ago

What is the bug?

Creating the client with separate host and port values, as shown in the official documentation, raises an error. https://opensearch.org/docs/latest/clients/python-low-level/

client = OpenSearch(
    hosts = [{'host': host, 'port': 443}],

How can one reproduce the bug?

import boto3
from requests_aws4auth import AWS4Auth
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

host = '<opensearch_domain>/_bulk'
region = 'ap-northeast-2'  
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWSV4SignerAuth(credentials, region, service)

index = 'movies'
datatype = '_doc'

client = OpenSearch(
    # hosts=[{"host": host, "port": 9200}],
    hosts=[host+"9200"],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class = RequestsHttpConnection,
    pool_maxsize = 20
)
movies = '{ "index" : { "_index" : "movies", "_id" : "2" } } \n { "title" : "Interstellar", "director" : "Christopher Nolan", "year" : "2014"} \n { "create" : { "_index" : "movies", "_id" : "3" } } \n { "title" : "Star Trek Beyond", "director" : "Justin Lin", "year" : "2015"} \n { "update" : {"_id" : "3", "_index" : "movies" } } \n { "doc" : {"year" : "2016"} }'

client.bulk(movies)

Error

requests.exceptions.InvalidURL: Failed to parse: https://[<opensearch_domain>/_bulk]:9200/_bulk

What is the expected behavior?

I re-set the host in the following format and it works normally.

client = OpenSearch(
    hosts=[host+"9200"],
    http_auth=awsauth,

Do you have any additional context?

In summary, it seems that when I create the OpenSearch client, I need to build the host and port as a single string and put that string in the list. However, the official documentation says to pass them as a dictionary in the list. Please check this part and fix the bug.

dblock commented 1 week ago

I believe host should just be the host, not <opensearch_domain>/_bulk, aka just <opensearch_domain>. I checked the docs but I am not seeing anything that has implied otherwise? Help me find what needs changing? Or contribute to https://github.com/opensearch-project/documentation-website directly?
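To illustrate the point that the "host" value should be a bare hostname rather than a URL, here is a minimal sketch that reduces an endpoint string to the dictionary form. The helper name to_host_entry, the default port of 443, and the example domain are assumptions made for illustration; none of them are part of opensearch-py.

```python
from urllib.parse import urlsplit


def to_host_entry(endpoint: str, default_port: int = 443) -> dict:
    """Hypothetical helper: reduce an endpoint string to {'host': ..., 'port': ...}."""
    # urlsplit only fills in the network location when a scheme or a
    # leading "//" is present, so prefix bare hostnames with "//".
    parts = urlsplit(endpoint if "://" in endpoint else "//" + endpoint)
    # Scheme and path are discarded; only hostname and port are kept.
    return {"host": parts.hostname, "port": parts.port or default_port}


# Hypothetical domain, used only for illustration.
entry = to_host_entry("https://search-example.ap-northeast-2.es.amazonaws.com/_bulk")
print(entry)
```

The resulting dictionary can then be passed as hosts=[entry], so that no scheme or path ends up in the "host" field.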

fast-coding commented 6 days ago

Yes, thank you for your reply. As you said, it seems that _bulk can be removed from the host. However, the code below still needs to be changed.

An error occurs when executing the code below.

hosts=[{"host": host, "port": 9200}],

Failed to parse: https://[https://.....ap-northeast-2.es.amazonaws.com/]:9200/_bulk

If you put hosts in the list as a string, it works normally.

hosts=[host+"9200"],

dblock commented 5 days ago

I think this is by design.

An error occurs when executing the code below. hosts=[{"host": host, "port": 9200}],

this produces

hosts = [{"host":"https://.....ap-northeast-2.es.amazonaws.com/_bulk", "port":9200}]

which is incorrect, this is not a host, this is a URL. The error is expected.

If you put hosts in the list as a string, it works normally. hosts=[host+"9200"],

produces

hosts=["https://.....ap-northeast-2.es.amazonaws.com/_bulk9200"]

There's code in the client that allows you to specify a URL in hosts that contains both a host and a port. This translates to host = 'ap-northeast-2.es.amazonaws.com' and port = 443 (not 9200: note the missing ":", so 9200 is appended to the path, giving _bulk9200). The path is just dropped when the URL is parsed.
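The effect of the missing ":" can be seen with the standard library's URL parser. This is only a sketch of generic URL parsing with urllib.parse, not the client's own code, and the domain is a hypothetical placeholder.

```python
from urllib.parse import urlsplit

# Hypothetical domain, used only for illustration.
domain = "search-example.ap-northeast-2.es.amazonaws.com"

# host + "9200" with a trailing /_bulk: the digits become part of the path.
concatenated = urlsplit("https://" + domain + "/_bulk" + "9200")

# With an explicit ":" after the hostname, the digits are parsed as the port.
with_colon = urlsplit("https://" + domain + ":9200")

print(concatenated.hostname, concatenated.port, concatenated.path)
print(with_colon.hostname, with_colon.port, with_colon.path)
```

In the first case no port is found at all, which is consistent with the client falling back to 443 for an https URL; in the second the port is 9200.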

Is there still a scenario that doesn't behave as you'd expect?