Open ghost opened 2 years ago
Okay so I've been diving into this one a little more and found part of a root cause.
As skew iterates through the required calls to find each bucket, if it encounters a region which the user has not enabled (such as eu-south-1 which comes disabled by default) then it begins to endlessly loop. This is probably a failure mode that should be caught and handled.
Log sample. Once it hits the regions which are not enabled by default, this log peice will loop endlessly:
DEBUG:botocore.hooks:Event needs-retry.s3.ListBuckets: calling handler <botocore.retryhandler.RetryHandler object at 0x7fedbc4545e0>
DEBUG:botocore.retryhandler:No retry needed.
DEBUG:botocore.hooks:Event needs-retry.s3.ListBuckets: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7fedbc454040>>
DEBUG:botocore.hooks:Event before-parameter-build.s3.ListBuckets: calling handler <function validate_bucket_name at 0x7fedc124d4c0>
DEBUG:botocore.hooks:Event before-parameter-build.s3.ListBuckets: calling handler <bound method S3RegionRedirector.redirect_from_cache of <botocore.utils.S3RegionRedirector object at 0x7fedbc454040>>
DEBUG:botocore.hooks:Event before-parameter-build.s3.ListBuckets: calling handler <bound method S3ArnParamHandler.handle_arn of <botocore.utils.S3ArnParamHandler object at 0x7fedbc454130>>
DEBUG:botocore.hooks:Event before-parameter-build.s3.ListBuckets: calling handler <function generate_idempotent_uuid at 0x7fedc124d310>
DEBUG:botocore.hooks:Event before-call.s3.ListBuckets: calling handler <function add_expect_header at 0x7fedc124d820>
DEBUG:botocore.hooks:Event before-call.s3.ListBuckets: calling handler <bound method S3RegionRedirector.set_request_url of <botocore.utils.S3RegionRedirector object at 0x7fedbc454040>>
DEBUG:botocore.hooks:Event before-call.s3.ListBuckets: calling handler <function inject_api_version_header_if_needed at 0x7fedc124bb80>
DEBUG:botocore.endpoint:Making request for OperationModel(name=ListBuckets) with params: {'url_path': '/', 'query_string': '', 'method': 'GET', 'headers': {'User-Agent': 'Boto3/1.18.51 Python/3.8.10 Linux/4.19.128-microsoft-standard Botocore/1.21.54'}, 'body': b'', 'url': 'https://s3.ap-east-1.amazonaws.com/', 'context': {'client_region': 'ap-east-1', 'client_config': <botocore.config.Config object at 0x7fedbf1ee580>, 'has_streaming_input': False, 'auth_type': None, 'signing': {'bucket': None}}}
DEBUG:botocore.hooks:Event request-created.s3.ListBuckets: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x7fedbf1ee160>>
DEBUG:botocore.hooks:Event choose-signer.s3.ListBuckets: calling handler <bound method S3EndpointSetter.set_signer of <botocore.utils.S3EndpointSetter object at 0x7fedbc454370>>
DEBUG:botocore.hooks:Event choose-signer.s3.ListBuckets: calling handler <function set_operation_specific_signer at 0x7fedc124d1f0>
DEBUG:botocore.hooks:Event before-sign.s3.ListBuckets: calling handler <bound method S3EndpointSetter.set_endpoint of <botocore.utils.S3EndpointSetter object at 0x7fedbc454370>>
DEBUG:botocore.auth:Calculating signature using v4 auth.
DEBUG:botocore.auth:CanonicalRequest:
GET
/
host:s3.ap-east-1.amazonaws.com
x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-date:20211021T013437Z
host;x-amz-content-sha256;x-amz-date
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
DEBUG:botocore.auth:StringToSign:
AWS4-HMAC-SHA256
20211021T013437Z
20211021/ap-east-1/s3/aws4_request
7bd4e093a22164ff1ce64436fab33b99c85fe70d05dd806716821d481178b3f3
DEBUG:botocore.auth:Signature:
88c3fe15890b7be66d756b72cfe21e6f93d286e786aa4323e92ac94ca0318d25
DEBUG:botocore.endpoint:Sending http request: <AWSPreparedRequest stream_output=False, method=GET, url=https://s3.ap-east-1.amazonaws.com/, headers={'User-Agent': b'Boto3/1.18.51 Python/3.8.10 Linux/4.19.128-microsoft-standard Botocore/1.21.54', 'X-Amz-Date': b'20211021T013437Z', 'X-Amz-Content-SHA256': b'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855', 'Authorization': b'AWS4-HMAC-SHA256 Credential=AKIAZTZ53RGMUZAGRQVN/20211021/ap-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=88c3fe15890b7be66d756b72cfe21e6f93d286e786aa4323e92ac94ca0318d25'}>
DEBUG:botocore.httpsession:Certificate path: /etc/ssl/certs/ca-certificates.crt
DEBUG:urllib3.connectionpool:https://s3.ap-east-1.amazonaws.com:443 "GET / HTTP/1.1" 403 None
DEBUG:botocore.parsers:Response headers: {'x-amz-request-id': '4CAFB390YR4VHYSA', 'x-amz-id-2': 'IvRMi0g1/8tKgcbl2NN2s9LNKJZ3rTpckhtsa9QQUOuZFiXo2wk4MHx9f2SEsE0G3URzSfbLWvY=', 'Content-Type': 'application/xml', 'Transfer-Encoding': 'chunked', 'Date': 'Thu, 21 Oct 2021 01:34:37 GMT', 'Server': 'AmazonS3'}
DEBUG:botocore.parsers:Response body:
b'<?xml version="1.0" encoding="UTF-8"?>\n<Error><Code>InvalidAccessKeyId</Code><Message>The AWS Access Key Id you provided does not exist in our records.</Message><AWSAccessKeyId>AKIAZTZ53RGMUZAGRQVN</AWSAccessKeyId><RequestId>4CAFB390YR4VHYSA</RequestId><HostId>IvRMi0g1/8tKgcbl2NN2s9LNKJZ3rTpckhtsa9QQUOuZFiXo2wk4MHx9f2SEsE0G3URzSfbLWvY=</HostId></Error>'
DEBUG:botocore.hooks:Event needs-retry.s3.ListBuckets: calling handler <botocore.retryhandler.RetryHandler object at 0x7fedbc4545e0>
DEBUG:botocore.retryhandler:No retry needed.
DEBUG:botocore.hooks:Event needs-retry.s3.ListBuckets: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7fedbc454040>>
DEBUG:botocore.hooks:Event before-parameter-build.s3.ListBuckets: calling handler <function validate_bucket_name at 0x7fedc124d4c0>
DEBUG:botocore.hooks:Event before-parameter-build.s3.ListBuckets: calling handler <bound method S3RegionRedirector.redirect_from_cache of <botocore.utils.S3RegionRedirector object at 0x7fedbc454040>>
DEBUG:botocore.hooks:Event before-parameter-build.s3.ListBuckets: calling handler <bound method S3ArnParamHandler.handle_arn of <botocore.utils.S3ArnParamHandler object at 0x7fedbc454130>>
For those that need it, you can monkeypatch out the bad regions with the following snippet:
from skew.arn import Region
# the keys we use don't propagate to all regions which breaks boto3, so we need to patch which regions we look up
bad_regions = ['eu-south-1', 'ap-east-1', 'af-south-1', 'me-south-1', 'cn-north-1', 'cn-northwest-1']
Region._all_region_names = filter(lambda v: v not in bad_regions, Region._all_region_names)
Summary
So I've been playing with skew for a week or so now and whilst it principally works for what I want it to do at a functional level, I've found that something is causing it to handle blocking IO very poorly (not at all in fact)
The tests I've been running are to scan my own AWS account for S3 buckets. Skew works perfectly for finding the 3 buckets I expected it to, but then never returns.
At first I thought I was being impatient, so I decided to leave it run overnight and through the morning. The function still never returned. I understand that scanning all of AWS is a potentially non-trivial task, but more than 16 hours? Something must be up.
Investigation
The first thing I noticed trying to work out what was going on here was (when it worked) the dump from the keyboard interrupt when running in the console:
Diving into the codebase skew wraps around boto3 which in turn uses (to no massive surprise)
urllib3
to manage its requests to AWS itself.So cool, it's just
urllib3
under the hood and it's timing out. Wellurllib3
doesn't have a default timeout for requests, but you can certainly set one. So usingsocket.setdefaulttimeout(60)
set the timeout to 60 seconds (pretty fair imo) and suddenly skew (or more specifically boto3) doesn't work at all. Even the original buckets that I expect to see listed there are not returned. What makes this slightly more confusing is that it doesn't actually return a timeout exception; I'm not even convinced the requests are made.With messing with the timeout not working I thought, what if I can just force the thread to exit by force? So using various forms of inspiration from this thread, I tried forcing skew to exit early.
Context Manager
This context manager didn't work at all, skew just didn't stop
Thread Decorator
This thread decorator saw better results in that the
quit_function
call is actually made (the call to log.info is made) but again, skew doesn't exit gracefully and just hangs.I'm honestly running out of ideas as to where to go looking for this one. My use case is actually to be able to run skew on a serverless function endpoint that I can just call, but these have well defined timeouts that I just can't seem to get skew to obey.
Any ideas?