opensearch-project / opensearch-py

Python Client for OpenSearch
https://opensearch.org/docs/latest/clients/python/
Apache License 2.0
338 stars, 170 forks

[BUG] http_auth not embedded in base_url for Basic Auth #487

Closed: wwwdavid34 closed this issue 1 year ago

wwwdavid34 commented 1 year ago

In connection/http_requests.py, if the user sets http_auth for Basic Auth, the credentials are not embedded in self.base_url, which results in a 401 Unauthorized error.

Here is the snippet of the code in question.

if http_auth is not None:
    if isinstance(http_auth, (tuple, list)):
        http_auth = tuple(http_auth)
    elif isinstance(http_auth, string_types):
        http_auth = tuple(http_auth.split(":", 1))
    self.session.auth = http_auth

self.base_url = "%s%s" % (
    self.host,
    self.url_prefix,
)
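
The normalization in the snippet above accepts either a (user, password) tuple or list, or a single "user:password" string. It splits only on the first colon, so a password that itself contains ':' survives intact. Below is a minimal stdlib sketch of the same logic. It assumes Python 3, using `str` in place of the `string_types` compatibility alias. `normalize_http_auth` is a hypothetical name, not an opensearch-py function.

```python
def normalize_http_auth(http_auth):
    """Return a (user, password) tuple, mirroring the snippet's normalization.

    Hypothetical helper for illustration; opensearch-py does this inline.
    """
    if isinstance(http_auth, (tuple, list)):
        return tuple(http_auth)
    if isinstance(http_auth, str):
        # maxsplit=1: only the first ':' separates user from password
        return tuple(http_auth.split(":", 1))
    # Any other auth object is passed through unchanged, as in the snippet
    return http_auth


print(normalize_http_auth("repoops:p:w"))  # ('repoops', 'p:w')
```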

The following lines should be added to amend self.base_url when http_auth is set, so that the Basic Auth credentials are included in base_url.

if http_auth is not None:
    index = self.base_url.find('://')
    self.base_url = self.base_url[:index + 3] + ':'.join(self.session.auth) + "@" + self.base_url[index + 3:]
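
The proposed splice inserts the raw credentials into the URL. If credentials are embedded in a URL at all, reserved characters in them need percent-encoding first. Examples include `@`, `:` and `/`, and also the `\` in the password quoted later in this thread. Below is a minimal sketch using only `urllib.parse`. `embed_basic_auth` is a hypothetical helper, not part of opensearch-py.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def embed_basic_auth(base_url, auth):
    """Return base_url with a percent-encoded 'user:password@' before the host.

    Hypothetical helper for illustration only.
    """
    user, password = auth
    parts = urlsplit(base_url)
    # safe="" makes quote() escape ':', '@' and '/' too, so they cannot be
    # mistaken for URL delimiters
    userinfo = quote(user, safe="") + ":" + quote(password, safe="")
    return urlunsplit(
        (parts.scheme, userinfo + "@" + parts.netloc, parts.path, parts.query, parts.fragment)
    )


url = embed_basic_auth("https://localhost:9200/prefix", ("repoops", "p@ss:w/rd"))
print(url)                              # https://repoops:p%40ss%3Aw%2Frd@localhost:9200/prefix
print(unquote(urlsplit(url).password))  # p@ss:w/rd
```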

wwwdavid34 commented 1 year ago

After a more careful review, this is not a bug. The Basic Auth credentials are carried in the requests session (self.session.auth), not in the URL. The problem I encountered might be related to the security settings described in this post: https://repost.aws/questions/QUMnDBOQn0STCTSGPdg9D4Og/401-authenticationexception-when-trying-to-call-api-endpoints-of-opensearch-serverless-collection-from-within-a-vpc

saimedhi commented 1 year ago

Hello @wwwdavid34, closing this issue for now. If you believe the bug persists, kindly reopen it and consider contributing a PR. Thank you!

wwwdavid34 commented 1 year ago

@saimedhi Sorry to reopen this bug report. After tracking down my issue, it appears that the way the base64-encoded auth string is generated is causing the problem.

In opensearchpy, when Basic Auth is used, http_urllib3.py encodes the http_auth string by calling urllib3.make_headers. The resulting encoded basic_auth value is then placed in the request headers.

The base64-encoded string differs subtly depending on whether the trailing newline character (\n) is included. It looks like the OpenSearch instance prefers the auth string to HAVE the newline character included when it is accessed with Basic Auth.
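
The effect of a trailing newline is easy to check locally. Below is a minimal stdlib sketch that approximates what `urllib3.make_headers(basic_auth=...)` puts in the header: the base64 of the `user:password` string. The sketch encodes the string as latin-1, which is an assumption about urllib3's internals; for ASCII credentials like these, the choice of encoding makes no difference. `basic_auth_header` is a hypothetical name.

```python
from base64 import b64encode


def basic_auth_header(credentials):
    """Build a Basic auth header dict from a 'user:password' string.

    Hypothetical helper; approximates urllib3.make_headers(basic_auth=...).
    """
    token = b64encode(credentials.encode("latin-1")).decode("ascii")
    return {"authorization": "Basic " + token}


creds = "repoops:k1S$ZQ2~(\\*"  # the credential string used in this thread
print(basic_auth_header(creds))         # {'authorization': 'Basic cmVwb29wczprMVMkWlEyfihcKg=='}
print(basic_auth_header(creds + "\n"))  # {'authorization': 'Basic cmVwb29wczprMVMkWlEyfihcKgo='}
```

These two tokens match the `echo -n` and `echo` results below.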

Here is an example run that gets the response from the _cluster/health endpoint. Environment: Python 3.11.4, opensearchpy 2.3.1, Ubuntu 20.04.1.

$ python test_base64.py
urllib3 encoded credential header: {'authorization': 'Basic cmVwb29wczprMVMkWlEyfihcKg=='}
401
base64 encoded credential header with newline: {'Authorization': 'Basic cmVwb29wczozazFTJFpRMn4oXCo='}
200
# Base64 encoded credential string without newline character
$ echo -n 'repoops:k1S$ZQ2~(\*'|base64
cmVwb29wczprMVMkWlEyfihcKg==
# Base64 encoded credential string with newline character
$ echo 'repoops:k1S$ZQ2~(\*'|base64
cmVwb29wczprMVMkWlEyfihcKgo=

I am not sure whether this should be handled on the AWS OpenSearch instance side or on the Python client side. Any suggestions?

dblock commented 1 year ago

@wwwdavid34, do you have a simple way to reproduce this problem?

I'd like to see where the newline character is coming from. Is it part of your username or password?
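
One way to answer that question is to decode the header value that was actually sent and inspect the credentials byte by byte. Below is a minimal stdlib sketch. `decode_basic_auth` is a hypothetical helper. The two header values are copied from the test_base64.py output above.

```python
from base64 import b64decode


def decode_basic_auth(header_value):
    """Return the raw bytes carried by a 'Basic <token>' header value.

    Hypothetical helper for illustration only.
    """
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic auth header: %r" % header_value)
    return b64decode(token)


# Header values copied from the test_base64.py output above
sent_by_urllib3 = decode_basic_auth("Basic cmVwb29wczprMVMkWlEyfihcKg==")
accepted = decode_basic_auth("Basic cmVwb29wczozazFTJFpRMn4oXCo=")
print(sent_by_urllib3)  # b'repoops:k1S$ZQ2~(\\*'
print(accepted)         # b'repoops:3k1S$ZQ2~(\\*'
```

Neither decoded value ends in a newline. They differ in the password itself, and decoding makes that kind of discrepancy visible.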

wwwdavid34 commented 1 year ago

@dblock It turned out to be a missing character in the password string when it was passed between developers. Please disregard my comment. Thank you for your attention, and sorry for the confusion. 👍