okigan / awscurl

curl-like access to AWS resources with AWS Signature Version 4 request signing.
MIT License
737 stars 91 forks source link

PR #77 returns raw bytes as output, resulting output that is not parsable as json in current(0.19) release when using python 3 #85

Closed p5k6 closed 3 years ago

p5k6 commented 4 years ago

When using 0.19 awscurl with python 3, encoding response.text results in a byte object being returned. Printing that directly results in the output being prefixed with b' (see screenshots below), and thus json output cannot be parsed directly from the response. In Python 2 this is not an issue. I could also see this being an issue for (say) an xml based response as well (though I have not tested that).

My use case: I'm currently invoking awscurl in a unix pipe, where the result of awscurl is piped to jq for further processing (in CI); updating to the latest version broke the pipe. For the moment I've pinned the version in CI at 0.17

Python 3 screenshot

Screen Shot 2019-12-18 at 4 50 09 PM

Python 2 screenshot

Screen Shot 2019-12-18 at 4 50 47 PM

okigan commented 4 years ago

hat 'b' would be annoying - thanks for identifying the location of the problem

okigan commented 4 years ago

@p5k6 I this topic requires more info, horror stories about mercurial's journey concerns me (https://news.ycombinator.com/item?id=22037905) and I am not versed enough on pythons handling of this -- so need some help on what is more holistic approach (vs squashing the b"")

marklap commented 4 years ago

IMO, this change should be reverted. I can't speak to what is/was really going on with the contributor, but it seems that if the Content-Encoding header in the response was set correctly, that change would not be necessary because the requests.Response.text method is already trying to do the right thing. It states specifically in the docstring of the requests.Response.text method, that implementers should set the encoding value to something specific if the HTTP Content-Encoding header is not determining the correct encoding.

Also, this change is a little odd in general because you're taking a string that was converted from bytes (by requests.Response.text) and then converting it to bytes to send to the print function that converts everything to a string. This accidentally works in Python 2 (read up on Python 2/3 bytes and string differences) but breaks Python 3. The result is that this change breaks Python 3 to support Python 2 which is likely not intended.

FWIW, I don't think you have the same problems as they had with Mercurial. You're simply trying to print a string. Mercurial was trying to solve the problem for myriad interfaces. I think you can safely revert the change; ask the contributor for more information and if the Content-Encoding header is set properly. Also, an additional argument to awscurl that could be used to explicitly set the response encoding.

okigan commented 4 years ago

@p5k6 reading @marklap comment and reviewing your snapshot -- could it be the issue is with your content encoding that returned from your api gateway?

Also i've setup a small experiment here: https://repl.it/@okigan/py-string-test

@p5k6 please review/advise or we'll start moving to revert this

marklap commented 4 years ago

Just to clarify, @p5k6 identified the bug and @muziyoshiz is the contributor of the MR that is likely the root cause.

cyrfer commented 4 years ago

I bumped into this today. It would be great if the leading 'b' was removed from the output for curl compatibility and ease of use.

I can open a different ticket if needed, but is there a way to tell the version of awscurl installed? Today I ran brew install awscurl and awscurl -v:

$ awscurl -v
usage: awscurl [-h] [-v] [-i] [-X REQUEST] [-d DATA] [-H HEADER] [-k] [--data-binary] [--region REGION] [--profile PROFILE] [--service SERVICE]
               [--access_key ACCESS_KEY] [--secret_key SECRET_KEY] [--security_token SECURITY_TOKEN] [--session_token SESSION_TOKEN]
               uri
awscurl: error: the following arguments are required: uri

While I'm at it, my AWS CLI is broken after installing awscurl with brew. Kind of a bummer. I think it required a different version of python than I had on my system. Noted over here https://github.com/okigan/awscurl/issues/87

okigan commented 4 years ago

@cyrfer did you run into that while calling a labmda/api gateway? what is the content type your endpoint is returning?

cyrfer commented 4 years ago

@okigan I was calling AppSync (like API GW, but for GraphQL).

In Postman I sign requests with service = appsync

Content-Type: application/json;charset=UTF-8
marklap commented 4 years ago

FYI - If you're following some other troubleshooting path then ignore this. If not, I'll offer this correction: it's Content-Encoding that changes the way the response body is handled with respect to this issue. Of course, Content-Type may also but not for the purposes of this discussion.

mathewpeterson commented 4 years ago

I am also running into this issue. Any plans on resolving this?

okigan commented 4 years ago

Hi Mathew, could you capture more details how do you reproduce this (which services, and relevant configuration)

On May 4, 2020, at 2:34 PM, Mathew Peterson notifications@github.com wrote:

 I am also running into this issue. Any plans on resolving this?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

mathewpeterson commented 4 years ago

Sure,

We have a pretty straight forward setup: Setup using Serverless Framework using API Gateway to invoke a Lambda that has aws_iam authentication set up. The lambda returns a json document.

okigan commented 4 years ago

Could you capture the what is the content type of the doc (you can see it logs when incoming lambda in aws console)

On Tue, May 5, 2020 at 6:03 AM Mathew Peterson notifications@github.com wrote:

Sure,

We have a pretty straight forward setup: Setup using Serverless Framework using API Gateway to invoke a Lambda that has aws_iam authentication set up. The lambda returns a json document.

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/okigan/awscurl/issues/85#issuecomment-624041799, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADUYXX5CDRFOP2AM7DWRPTRQAFBBANCNFSM4J5I7X7Q .

AngusDavis commented 3 years ago

In case it helps, my repro:

  1. Running on Debian testing w/ python2 and 3 installed:
    $ python3 --version
    Python 3.8.5
  2. Installing awscurl:
    $ pip3 install --user awscurl
    WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
    Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
    To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
    Collecting awscurl
    Downloading awscurl-0.20-py3-none-any.whl (8.3 kB)
    Requirement already satisfied: urllib3[secure] in /usr/lib/python3/dist-packages (from awscurl) (1.25.9)
    Requirement already satisfied: requests in /usr/lib/python3/dist-packages (from awscurl) (2.23.0)
    Collecting configparser
    Downloading configparser-5.0.0-py3-none-any.whl (22 kB) 
    Collecting configargparse
    Downloading ConfigArgParse-1.2.3.tar.gz (42 kB)
     |████████████████████████████████| 42 kB 765 kB/s 
    Requirement already satisfied: certifi in /usr/lib/python3/dist-packages (from urllib3[secure]->awscurl) (2020.4.5.1)
    Requirement already satisfied: cryptography>=1.3.4 in /usr/lib/python3/dist-packages (from urllib3[secure]->awscurl) (3.0)
    Requirement already satisfied: idna>=2.0.0 in /usr/lib/python3/dist-packages (from urllib3[secure]->awscurl) (2.10) 
    Requirement already satisfied: pyOpenSSL>=0.14 in /usr/lib/python3/dist-packages (from urllib3[secure]->awscurl) (19.1.0)
    Building wheels for collected packages: configargparse
    Building wheel for configargparse (setup.py) ... done
    Created wheel for configargparse: filename=ConfigArgParse-1.2.3-py3-none-any.whl size=19326 sha256=f2b555b0615b925af1e2f599a0919d73fe791fe88e77fa092a2fd6ca65d8085f
    Stored in directory: /home/[REDACTED]/.cache/pip/wheels/b4/12/fe/1e4628f6ec22a9580d9e8fbef23df7a03948a1edde0821d7f5
    Successfully built configargparse
    Installing collected packages: configparser, configargparse, awscurl
    Successfully installed awscurl-0.20 configargparse-1.2.3 configparser-5.0.0

I'm then attempting to make a request to ElasticSearch using signed requests:

$ awscurl --service es --region us-east-2 -v https://[REDACTED].us-east-2.es.amazonaws.com                                                                          
{'access_key': None,                                                                                                                                                                                                                  
 'data': '',                                                                                                                                                                                                                          
 'data_binary': False,                                                                                                                                                                                                                
 'header': None,                                                                                                                                                                                                                      
 'include': False,                                                                                                                                                                                                                    
 'insecure': True,                                                                                                                                                                                                                    
 'profile': 'default',                                                                                                                                                                                                                
 'region': 'us-east-2',                                                                                                                                                                                                               
 'request': 'GET',                                                                                                                                                                                                                    
 'secret_key': None,                                                                                                                                                                                                                  
 'security_token': None,                                                                                                                                                                                                              
 'service': 'es',                                                                                                                                                                                                                     
 'session_token': None,                                                                                                                                                                                                               
 'uri': 'https://[REDACTED].us-east-2.es.amazonaws.com',                                                                                                                                               
 'verbose': True}                                                                                                                                                                                                                     
"Credentials file '/home/[REDACTED]/.aws/credentials' exists 'True'"                                                                                                                                                                      
''                                                                                                                                                                                                                                    
('\n'
 'CANONICAL REQUEST = GET\n'
 '/\n'
 '\n'
 'host:[REDACTED].us-east-2.es.amazonaws.com\n'
 'x-amz-date:20200902T143515Z\n'
 '\n'
 'host;x-amz-date\n'
 '[REDACTED]')
('\n'
 'STRING_TO_SIGN = AWS4-HMAC-SHA256\n'
 '20200902T143515Z\n'
 '20200902/us-east-2/es/aws4_request\n'
 '[REDACTED]')
'\nHEADERS++++++++++++++++++++++++++++++++++++'
{'Accept': 'application/xml',
 'Authorization': 'AWS4-HMAC-SHA256 '
                  'Credential=[REDACTED]/us-east-2/es/aws4_request, '
                  'SignedHeaders=host;x-amz-date, '
                  'Signature=[REDACTED]',
 'Content-Type': 'application/json',
 'x-amz-content-sha256': '[REDACTED]',
 'x-amz-date': '20200902T143515Z',
 'x-amz-security-token': None}
'\nBEGIN REQUEST++++++++++++++++++++++++++++++++++++'
('Request URL = '
 'https://[REDACTED].us-east-2.es.amazonaws.com')
'\nRESPONSE++++++++++++++++++++++++++++++++++++'
'Response code: 200\n'
{'Date': 'Wed, 02 Sep 2020 14:35:15 GMT', 'Content-Type': 'application/json; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Access-Control-Allow-Origin': '*', 'Content-Encoding': 'gzip', 'Vary': 'Acce
pt-Encoding, User-Agent'}

b'{\n  "name" : "78e111028e3351ec006b1bd84d38c02b",\n  "cluster_name" : "[REDACTED]]",\n  "cluster_uuid" : "icdoqUEBQAiDYiXhzpvWXw",\n  "version" : {\n    "number" : "7.7.0",\n    "build_flavor" : "oss",\n    "build_ty
pe" : "tar",\n    "build_hash" : "unknown",\n    "build_date" : "2020-07-06T09:41:09.176473Z",\n    "build_snapshot" : false,\n    "lucene_version" : "8.5.1",\n    "minimum_wire_compatibility_version" : "6.8.0",\n    "minimum_inde
x_compatibility_version" : "6.0.0-beta1"\n  },\n  "tagline" : "You Know, for Search"\n}\n'

As you can see, a charset is specified in the Content-Type header. I tend to agree with the above, taking the string output of response.text, converting it to a bytes object (.encode()) and then printing the bytes object seems to be suspect.

While I can see the output of the request, I was hoping to use jq to pretty print the output of awscurl.

curtis-downing commented 3 years ago
awscurl --service es https://XXXXXXXX.us-east-1.es.amazonaws.com/_cluster/health | cut -d\' -f2 | jq
p5k6 commented 3 years ago

FYI - I've added a PR (#92) to address this issue, including a unit test which addresses this issue.

Note that - the response should have the info needed re: the encoding via the api request (response.encoding should be set per the data included in the headers in the response object). By adding .encode(), awscurl forces us to have a bytes output, instead of relying upon how python would natively interpret the encoding provided by the response encoding.

I believe that my PR includes the correct way to handle this issue. The issue raised in #77 was specific to python 2, which is no longer maintained (in fact, I couldn't get a development environment using python 2 to spin up).

p5k6 commented 3 years ago

Fixed in version 0.21 - thank you @okigan!