opensearch-project / opensearch-py

Python Client for OpenSearch
Apache License 2.0

[BUG] opensearch-py doesn't support chunked encoding with compression enabled with sigv4 AuthN/AuthZ #176

Open kumjiten opened 2 years ago

kumjiten commented 2 years ago

What is the bug? The OpenSearch Python client always sends a Content-Length header and does not support chunked transfer encoding when compression is enabled.

How can one reproduce the bug? Steps to reproduce the behavior:

  1. Create an OpenSearch domain in AWS that supports IAM-based AuthN/AuthZ.
  2. Send a signed request to the OpenSearch cluster using the Python REST client.
  3. Create the Python client with compression enabled:
    search = OpenSearch(
    hosts = [{'host': host, 'port': 443}],
    http_auth = awsauth,
    use_ssl = True,
    verify_certs = True,
    http_compress = True, # enables gzip compression for request bodies <---------
    connection_class = RequestsHttpConnection
    )
  4. The client sends a Content-Length header by default:
    PUT https://xxxxxxxx:443/movies/_doc/1?refresh=true
    content-type: application/json
    user-agent: opensearch-py/2.0.0 (Python 3.8.9)
    accept-encoding: gzip,deflate
    content-encoding: gzip
    Content-Length: 78 <--------------
    x-amz-date: 20220625T131237Z
    x-amz-content-sha256: 70ced8b1d2572d31b43dcf4ad0c58867d4f23bbbdb3bb24d7cb0059a87465816
    Authorization: AWS4-HMAC-SHA256 Credential=AKIAV7BDGZUCRKUTEG7B/20220625/eu-west-1/es/aws4_request, SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date, Signature=5e8d252a9bd11728ec2e3305a74f2cc2eeddb29e69ae102cc815ed90bcb27d34

repro code:

from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
import boto3
import pdb

host = '' # e.g.
region = 'eu-west-1' # e.g. us-west-1
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

# Create the client.
search = OpenSearch(
    hosts = [{'host': host, 'port': 443}],
    http_auth = awsauth,
    use_ssl = True,
    verify_certs = True,
    http_compress = True, # enables gzip compression for request bodies
    connection_class = RequestsHttpConnection
)

document = {
  "title": "Moneyball",
  "director": "Bennett Miller",
  "year": "2011"
}

# Send the request.
print(search.index(index='movies', id='1', body=document, refresh=True))

This call succeeds. But what if the content is too large and I want to use chunked transfer encoding together with compression?

What is the expected behavior? The client should support chunked transfer encoding with SigV4 so that it works with large payloads.

similar issue:

What is your host/environment?

Do you have any screenshots? If applicable, add screenshots to help explain your problem.

Do you have any additional context?

harshavamsi commented 1 year ago

@jiten1551 Are you saying that this is a bug, or a feature that you would like? The default RequestsHttpConnection does not support chunked encoding; it would need a new flag in the connection class to allow that. To separate the two things: SigV4 already works with compressed requests via http_compress. What you're asking for is compressing and chunking together, which could be a new feature.

dblock commented 1 year ago

I think it's a feature request: enable chunked transfer encoding (and ensure it works with Sigv4). A similar problem in the java client was that setting compression would also automatically turn on chunked transfer encoding, which would work, except for Sigv4.

fabioasdias commented 1 year ago

Python requests uses chunked transfer encoding automatically if a generator is passed as the body. In fact, one could arguably bypass the API and call connector.perform_request directly with a generator, as long as http_compress is disabled (so that gzip.compress doesn't run) and the input argument is passed along to requests unchanged.
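To make the generator idea above concrete, here is a minimal sketch of how a body could be gzip-compressed piece by piece, so that it never has to exist in memory as one compressed blob. It uses only the standard library. The helper name `gzip_chunks` and the 16-byte piece size are assumptions for illustration, not part of opensearch-py, and nothing in this thread establishes that such a body can be signed with SigV4.

```python
import gzip
import json
import zlib
from typing import Iterable, Iterator


def gzip_chunks(parts: Iterable[bytes], level: int = 6) -> Iterator[bytes]:
    """Yield a gzip stream incrementally (hypothetical helper, not an opensearch-py API)."""
    # wbits = 16 + MAX_WBITS asks zlib to emit a gzip header and trailer.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    for part in parts:
        data = compressor.compress(part)
        if data:  # the compressor may buffer input and return nothing yet
            yield data
    # flush() emits any buffered data plus the gzip trailer.
    yield compressor.flush()


document = {"title": "Moneyball", "director": "Bennett Miller", "year": "2011"}
raw = json.dumps(document).encode("utf-8")

# Split the serialized document into small pieces to simulate a streamed body.
pieces = [raw[i:i + 16] for i in range(0, len(raw), 16)]

# Joined here only to check the result; a real caller would pass the generator on.
compressed = b"".join(gzip_chunks(pieces))
print(gzip.decompress(compressed) == raw)
```

A generator like `gzip_chunks(pieces)` has no known length, so if it were handed to requests as the body together with a `content-encoding: gzip` header, requests would be expected to fall back to chunked transfer encoding, as described in the comment above. Whether the signing step accepts a body whose hash is not known up front is the open question of this issue.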