tedder / requests-aws4auth

Amazon Web Services version 4 authentication for the Python Requests module
MIT License

UnicodeEncodeError when passing a non-ascii string in "data" #29

Open jamshid opened 7 years ago

jamshid commented 7 years ago

Sending a non-ASCII request body with Python 2.7 fails when using requests-aws4auth. At first I thought it was a general requests bug (https://github.com/kennethreitz/requests/issues/3875), but it only happens with requests-aws4auth. I'm seeing this with Python 2.7.5 on CentOS 7.2 and on macOS.

After some debugging, it seems to be triggered by this line in /usr/lib/python2.7/site-packages/requests_aws4auth/aws4auth.py, which forces string literals to be "unicode":

from __future__ import unicode_literals

FIX/WORKAROUND: comment out that line.


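The problem is that requests doesn't seem to expect the HTTP request headers to contain unicode strings. In Python 2.7, concatenating a unicode string with a str implicitly decodes the str using the default ASCII codec, so request_headers + request_body fails: request_headers is unicode, while request_body is already an encoded byte string (str) containing bytes above 0x7F.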
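The hidden decode can be made explicit. This is a minimal sketch, not code from requests or httplib: the helper names py2_concat() and concat_error() are hypothetical, and the helper simply spells out the ASCII decode that Python 2 performs silently for unicode + str. (Python 3 refuses str + bytes with a TypeError instead of decoding.) It also shows why only bodies that aren't 7-bit clean fail.

```python
def py2_concat(header_text, body_bytes, default_encoding='ascii'):
    """Hypothetical stand-in for Python 2's implicit `unicode + str` coercion.

    Python 2 decodes the str operand with sys.getdefaultencoding(), which is
    'ascii' by default; this helper makes that hidden decode explicit.
    """
    return header_text + body_bytes.decode(default_encoding)


HEADERS = u'PUT / HTTP/1.1\r\nHost: example.com\r\n\r\n'


def concat_error(body_bytes):
    """Return the UnicodeDecodeError message, or None if the join succeeds."""
    try:
        py2_concat(HEADERS, body_bytes)
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# A 7-bit-clean body survives the implicit ASCII decode.
print(concat_error(b'hello'))  # prints None

# A UTF-8 body fails on its first byte (0xe2 starts U+24B6 in UTF-8).
print(concat_error(u'\u24B6\u24B7\u24B8\u24B9'.encode('utf-8')))
# prints 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)
```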

By the way, I don't think aws4auth should be calling .encode('utf-8') on the body; it should already be bytes, right? HTTPBasicAuth and S3Auth, at least, expect the client calling requests.put() to pass data already encoded as UTF-8 bytes.
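AWS4Auth has to hash the body to fill in the SigV4 x-amz-content-sha256 header (visible in the debug output below), and that hash is only well defined over bytes. A minimal sketch of the "bytes in, bytes hashed" approach, using a hypothetical helper payload_hash() that is not the library's actual code: bytes are hashed exactly as they will be sent, and only text is encoded, as a fallback.

```python
import hashlib

# unicode on Python 2, str on Python 3
TEXT_TYPE = type(u'')


def payload_hash(body):
    """Hypothetical helper: hex SHA-256 of a request body.

    Byte strings are hashed untouched, so an already-encoded body is never
    re-encoded or implicitly decoded; only text is encoded (assumed UTF-8).
    """
    if body is None:
        body = b''
    elif isinstance(body, TEXT_TYPE):
        body = body.encode('utf-8')
    return hashlib.sha256(body).hexdigest()


body = u'\u24B6\u24B7\u24B8\u24B9'.encode('utf-8')
print(payload_hash(body))
```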

Finally, maybe this is still a bug in requests or in Python's httplib.py? Should httplib allow unicode header strings that contain only ASCII (or ISO-8859-1?) characters, with _send_output() in /usr/lib64/python2.7/httplib.py forcing msg to str before appending the request body?
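The httplib-side change being asked about could look roughly like this. It is a sketch under stated assumptions, not a patch to httplib: build_message() is a hypothetical function, and encoding header text as ISO-8859-1 is an assumption (similar in spirit to how Python 3's http.client encodes header values as latin-1). Headers that can't be represented fail loudly with UnicodeEncodeError instead of corrupting the body.

```python
# unicode on Python 2, str on Python 3
TEXT_TYPE = type(u'')


def build_message(header_block, body=b''):
    """Hypothetical sketch: force the header block to bytes, then append the body.

    ISO-8859-1 maps code points 0-255 to single bytes, so pure-ASCII headers
    encode unchanged; the body is assumed to be bytes already.
    """
    if isinstance(header_block, TEXT_TYPE):
        header_block = header_block.encode('iso-8859-1')
    return header_block + body


msg = build_message(u'PUT / HTTP/1.1\r\nHost: example.com\r\n\r\n',
                    u'\u24B6\u24B7'.encode('utf-8'))
print(repr(msg))
```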


Reproduction:

>>> import requests
>>> requests.__version__
'2.13.0'
>>> import requests_aws4auth
>>> requests_aws4auth.__version__
'0.9'
>>> AUTH=requests_aws4auth.AWS4Auth('testkey', 'secret', 'eu-west-1', 's3')
>>> requests.put('http://example.com/',headers={'Content-type':'text/plain; charset="UTF-8"'}, data=u'\u24B6\u24B7\u24B8\u24B9'.encode('utf-8'),auth=AUTH)

That should work, and it does when using requests.auth.HTTPBasicAuth or awsauth.S3Auth from the S3 V2 signature package. With requests-aws4auth, however, it raises an exception:

>>> requests.put('http://example.com/',headers={'Content-type':'text/plain; charset="UTF-8"'}, data=u'\u24B6\u24B7\u24B8\u24B9'.encode('utf-8'),auth=AUTH)
!!!1 u'PUT / HTTP/1.1\r\nHost: example.com\r\nConnection: keep-alive\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nUser-Agent: python-requests/2.13.0\r\nContent-type: text/plain; charset="UTF-8"\r\nContent-Length: 12\r\nx-amz-date: 20170215T040027Z\r\nx-amz-content-sha256: 7ec37a06579472c0743b58bd45af589cca817f65bbd8c6e528bc5e3092166396\r\nAuthorization: AWS4-HMAC-SHA256 Credential=john/20170215/eu-west-1/s3/aws4_request, SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date, Signature=833120dd7cbe023d12c8bd24c6a746ba8ebcf8279346c0e58485e56c1a9ab5a5\r\n\r\n'
!!!2 '\xe2\x92\xb6\xe2\x92\xb7\xe2\x92\xb8\xe2\x92\xb9'
!!!3 u'PUT / HTTP/1.1\r\nHost: example.com\r\nConnection: keep-alive\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nUser-Agent: python-requests/2.13.0\r\nContent-type: text/plain; charset="UTF-8"\r\nContent-Length: 12\r\nx-amz-date: 20170215T040027Z\r\nx-amz-content-sha256: 7ec37a06579472c0743b58bd45af589cca817f65bbd8c6e528bc5e3092166396\r\nAuthorization: AWS4-HMAC-SHA256 Credential=john/20170215/eu-west-1/s3/aws4_request, SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date, Signature=833120dd7cbe023d12c8bd24c6a746ba8ebcf8279346c0e58485e56c1a9ab5a5\r\n\r\n\u24b6\u24b7\u24b8\u24b9'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/requests/api.py", line 124, in put
    return request('put', url, data=data, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 423, in send
    timeout=timeout
  File "/usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 356, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib64/python2.7/httplib.py", line 1020, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib64/python2.7/httplib.py", line 1054, in _send_request
    self.endheaders(body)
  File "/usr/lib64/python2.7/httplib.py", line 1016, in endheaders
    self._send_output(message_body)
  File "/usr/lib64/python2.7/httplib.py", line 865, in _send_output
    msg += message_body
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

The "!!!" lines are debugging output I added to _send_output() in /usr/lib64/python2.7/httplib.py:

        if isinstance(message_body, str):
            print('!!!1 '+repr(msg))
            print('!!!2 '+repr(message_body))
            print('!!!3 '+repr(msg + message_body.decode('utf-8')))
            msg += message_body
reywood commented 7 years ago

Here's a workaround that doesn't involve modifying the requests-aws4auth source code. Use the following wrapper class in place of AWS4Auth. It encodes the headers created by AWS4Auth into byte strings, thus avoiding the UnicodeDecodeError downstream.

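from requests_aws4auth import AWS4Auth


class AWS4AuthEncodingFix(AWS4Auth):
    """AWS4Auth that converts unicode header names and values to UTF-8 str.

    Python 2 only: it relies on the built-in `unicode` type.
    """

    def __call__(self, request):
        # Let AWS4Auth sign the request first; the headers it adds may be
        # unicode because aws4auth.py uses unicode_literals.
        request = super(AWS4AuthEncodingFix, self).__call__(request)

        # Iterate over a snapshot of the names, because
        # _encode_header_to_utf8() deletes and re-adds entries.
        for header_name in list(request.headers):
            self._encode_header_to_utf8(request, header_name)

        return request

    def _encode_header_to_utf8(self, request, header_name):
        value = request.headers[header_name]

        if isinstance(value, unicode):
            value = value.encode('utf-8')

        if isinstance(header_name, unicode):
            # Drop the unicode-keyed entry before storing it under a str key.
            del request.headers[header_name]
            header_name = header_name.encode('utf-8')

        request.headers[header_name] = value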
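An alternative to mutating the headers in place is to build a new mapping. This is a sketch, not part of requests or requests-aws4auth: encode_headers() is a hypothetical helper, and it uses type(u'') so it also runs on Python 3, where the bug does not occur.

```python
# unicode on Python 2, str on Python 3
TEXT_TYPE = type(u'')


def encode_headers(headers, encoding='utf-8'):
    """Hypothetical helper: return a new dict whose text header names and
    values are encoded to byte strings; byte strings pass through unchanged."""
    def to_bytes(s):
        return s.encode(encoding) if isinstance(s, TEXT_TYPE) else s

    return dict((to_bytes(name), to_bytes(value))
                for name, value in headers.items())


print(encode_headers({u'Host': u'example.com', b'X-Raw': b'1'}))
```

Inside the wrapper's __call__, one could then (untested) write request.headers = CaseInsensitiveDict(encode_headers(request.headers)), importing CaseInsensitiveDict from requests.structures. Building a fresh mapping avoids deleting and re-adding keys while walking the original one.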
akuchling commented 7 years ago

I'm also seeing this bug with requests 2.18.4 (the latest as of today) and requests-aws4auth 0.9 on Python 2.7 when the body of the HTTP request isn't 7-bit-clean ASCII. It looks like requests doesn't expect header names to be Unicode; at some point it combines the Unicode headers with a UTF-8-encoded body and fails to decode the body with the default 'ascii' codec.

Another fix would be to remove the from __future__ import unicode_literals declaration, but that's farther-reaching than just encoding the header keys and values.