opensearch-project / opensearch-py

Python Client for OpenSearch
https://opensearch.org/docs/latest/clients/python/
Apache License 2.0

[BUG] Bug with `AWS` if `id` has special characters #833

Open Danipulok opened 1 month ago

Danipulok commented 1 month ago

What is the bug?

AWS + special characters in 'id' result in:

opensearchpy.exceptions.AuthorizationException: AuthorizationException(403, '{"message":"The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.\\n\\nThe Canonical String for this request should have been\\n\'GET\\n/index-name/_doc/cool_bot%21-f9aa36ea-b8a5-4890-957e-29f000beec86%40localhost\\n\\nhost:search.us-west-2.es.amazonaws.com\\nx-amz-date:20241017T115545Z\\n\\nhost;x-amz-date\\ne3b0c4429\'\\n\\nThe String-to-Sign should have been\\n\'AWS4-HMAC-SHA256\\n20241017T115545Z\\n20241017/us-west-2/es/aws4_request\\n262223abc008b0d31e74a714950\'\\n"}')
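For reference, percent-decoding the last path segment of the canonical string above recovers the document ID the service expected. This is a minimal sketch using only the standard library; the path is copied from the error message, and the variable names are illustrative.

```python
from urllib.parse import unquote

# Path copied verbatim from the "Canonical String" in the error message above.
canonical_path = (
    "/index-name/_doc/"
    "cool_bot%21-f9aa36ea-b8a5-4890-957e-29f000beec86%40localhost"
)

# Take the last path segment (the document ID) and percent-decode it.
doc_id = unquote(canonical_path.rsplit("/", 1)[-1])
print(doc_id)
```

The decoded ID contains `!` and `@`, which are the characters that have to be percent-encoded in the request path.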

How can one reproduce the bug?

import asyncio

import boto3
from opensearchpy import AWSV4SignerAsyncAuth, AsyncHttpConnection, AsyncOpenSearch

OPENSEARCH_URL = "https://search.us-west-2.es.amazonaws.com"
OPENSEARCH_AWS_ACCESS_KEY_ID = "..."
OPENSEARCH_AWS_SECRET_ACCESS_KEY = "..."
OPENSEARCH_AWS_REGION = "us-west-2"

kwargs = {}
credentials = boto3.Session(
    aws_access_key_id=OPENSEARCH_AWS_ACCESS_KEY_ID,
    aws_secret_access_key=OPENSEARCH_AWS_SECRET_ACCESS_KEY,
).get_credentials()
auth = AWSV4SignerAsyncAuth(credentials, OPENSEARCH_AWS_REGION)
kwargs["http_auth"] = auth
kwargs["connection_class"] = AsyncHttpConnection

async def main() -> None:
    opensearch_client = AsyncOpenSearch(
        OPENSEARCH_URL,
        **kwargs,
    )

    REAL_ID = "cool_bot@localhost"
    await opensearch_client.get(index="index_name", id=REAL_ID)

    await opensearch_client.close()

if __name__ == "__main__":
    asyncio.run(main())

What is the expected behavior?

When using a local OpenSearch instance run via Docker, everything works. So I guess the problem is somewhere in AWSV4SignerAsyncAuth.

What is your host/environment?

Windows 10, Python 3.12.3, opensearch-py==2.7.0, boto3==1.35.2

Do you have any additional context?

Maybe it's a bug in boto3.Session? If so, please tell me and I will open an issue there. I do consider this an issue, since everything works against a non-AWS instance on localhost. With other IDs such as "foo", the same code works, so it's not a credentials problem.

dblock commented 1 month ago

Do other requests against an AWS instance work, i.e. when the ID doesn't have a special character in it?

Let's narrow it down.

  1. Try curl or awscurl and make sure that works. There are some examples in https://code.dblock.org/2022/07/11/making-sigv4-authenticated-requests-to-managed-opensearch.html.
  2. Try a known working Python demo in https://github.com/dblock/opensearch-python-client-demo. Does modifying it to insert one of these IDs reproduce the issue?
  3. Let's see what the difference between your code and (2) is.

fabiopedrosa commented 3 weeks ago

I'm seeing the same bug, but only when using AWSV4SignerAsyncAuth.

Danipulok commented 3 weeks ago

@fabiopedrosa thanks a lot for the comment. I could not find time to try the requested code. So when AsyncHttpConnection is not used, everything works?

fabiopedrosa commented 3 weeks ago

I narrowed the bug down to this line in helpers\asyncsigner.py: the URL needs to be signed before its special characters are percent-encoded.

from urllib.parse import unquote

# Sign the decoded URL, i.e. before its special characters are percent-encoded.
aws_request = AWSRequest(
    method=method,
    url=unquote(url),
    data=body,
)

fabiopedrosa commented 3 weeks ago

I've created a PR to fix this issue.
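
The effect of the unquote(url) change can be illustrated with the standard library alone. This is a minimal sketch, assuming the signer percent-encodes whatever path it is given when it builds the canonical request (an assumption about the signer's behavior, not something confirmed in this thread). `raw_id` reuses the ID from the reproduction script, and the other names are illustrative.

```python
from urllib.parse import quote, unquote

raw_id = "cool_bot@localhost"

# What the HTTP client puts in the request path: the ID percent-encoded once.
encoded_once = quote(raw_id, safe="")

# Assumption: a signer that encodes its input again would sign this value,
# which no longer matches the path the service sees.
encoded_twice = quote(encoded_once, safe="")

# Decoding first, as in the unquote(url) change, gets back to single encoding.
decoded_then_encoded = quote(unquote(encoded_once), safe="")

print(encoded_once)
print(encoded_twice)
print(decoded_then_encoded)
```

Under that assumption, an ID without special characters such as "foo" is unchanged by either encoding step, which would be consistent with the report that only IDs with special characters fail.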

dblock commented 2 weeks ago

[Catch All Triage - 1, 2]

dblock commented 4 days ago

@nathaliellenaa want to take this?

nathaliellenaa commented 4 days ago

Sure, I can take this one!