tedder / requests-aws4auth

Amazon Web Services version 4 authentication for the Python Requests module
MIT License
178 stars 62 forks source link

Function amz_cano_querystring() does not work correctly. #45

Open napat1412 opened 3 years ago

napat1412 commented 3 years ago

I found problem about oder of canonicail querystring. It is not sorted correctly

>>> from requests_aws4auth import AWS4Auth
>>> qs = 'key=&key-type=s3&format=json'
>>> AWS4Auth.amz_cano_querystring(qs)
'format=json&key-type=s3&key='

Expected behavior

>>> AWS4Auth.amz_cano_querystring(qs)
'format=json&key=&key-type=s3'

Querystring keys consist of format, key-type and key. After sorted, querystring keys should be format, key, key-type. (key must sorted before key-type). I think you need to rewrite the sort method.

willieconrad commented 2 years ago

If it helps, I resolved this issue by using the following extension. I can probably make a PR with these changes some time soon ... I tested this pretty thoroughly with all kinds of crazy query string characters, but the more eyes on it the better.


class FixedAWS4Auth(AWS4Auth):
    """The current implementation of AWS4Auth is badly broken.

    When constructing the canonical query string, the first thing it
    does is decode the entire URL.  This is wrong right out of the gate.
    Consider, for example, that someone needs to send a query parameter
    that contains a value of '=' or '&'.  The properly encoded query string
    will come in to amz_cano_querystring as

        a=%26%3D

    However, as soon as it is decoded, we'll get a=&=, which will completely
    mess up the parsing and give 'a' a blank value.  The correct solution is to
    use parse_qs, which properly parses and decodes the query string.

    There are also some issues with how '+' is interpreted.  Sigv4 stipulates
    that spaces must always be encoded as %20, and that a '+' must be encoded
    as %2B, so the canonical query string should never end up with '+'
    character.  The current implementation is not guaranteeing this, as
    it is using urllib.parse.quote, which encodes spaces as '+' by default.
    """

    @staticmethod
    def amz_cano_querystring(qs):
        """
        Parse and format querystring as per AWS4 auth requirements.

        Perform percent quoting as needed.

        qs -- querystring

        """
        # If Python 2, switch to working entirely in str
        # as urlencode() has problems with Unicode
        if PY2:
            qs = qs.encode('utf-8')

        # URL is already encoded at this point, parse to get individual params.
        # This will interpret '+' as a space, which is not configurable in Python 2.
        # This is fine, since requests does the same thing.

        # We must supply keep_blank_values as true here, or else they will be
        # omitted entirely. Sigv4 expects blank values to appear in the canonical
        # query string as 'var='.
        params = parse_qs(qs, keep_blank_values=True)
        params = sorted(list(params.items()), key=lambda item: item[0])

        # Multi values are handled by doseq, but they need to be lex sorted within themselves.
        # This is not explicit in the sigv4 doc, but determined via experimentation.
        # doseq will preserve the order we set here.
        params = [(k, list(sorted(v))) for k, v in params]
        qs = urlencode(params, doseq=True)

        # Sigv4 requires we never see '+' in the query string.  Spaces must
        # be %20, and an actual '+' must be encoded as '%2B'.  urlencode can be
        # configured to quote without using '+', but only in Python 3.  This makes
        # it a bit more explicit and loosens the dependency on Python version.
        qs = qs.replace('+', '%20')

        if PY2:
            qs = unicode(qs)
        return qs
tedder commented 2 years ago

feel free to PR it 👍 , it's okay to drop py2 support too.