nccgroup / ScoutSuite

Multi-Cloud Security Auditing Tool
GNU General Public License v2.0

AWS Rate Limiting apparently doesn't limit rates #1396

Open rbailey-godaddy opened 2 years ago

rbailey-godaddy commented 2 years ago

Describe the bug

I was just reviewing a run log for one of our nontrivial AWS accounts, and got pages and pages of this:

2022-01-25T07:11:16.883-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Hitting API rate limiting, will retry in 15s
2022-01-25T07:11:17.312-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Hitting API rate limiting, will retry in 15s
2022-01-25T07:11:20.228-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Hitting API rate limiting, will retry in 15s
2022-01-25T07:11:21.765-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Hitting API rate limiting, will retry in 15s
2022-01-25T07:11:22.258-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Hitting API rate limiting, will retry in 15s
2022-01-25T07:11:25.533-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Hitting API rate limiting, will retry in 15s
2022-01-25T07:11:26.318-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Hitting API rate limiting, will retry in 15s
2022-01-25T07:11:27.151-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Hitting API rate limiting, will retry in 15s
2022-01-25T07:11:27.363-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Hitting API rate limiting, will retry in 15s
2022-01-25T07:11:27.879-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Hitting API rate limiting, will retry in 15s
2022-01-25T07:11:28.928-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Hitting API rate limiting, will retry in 15s
2022-01-25T07:11:29.710-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Hitting API rate limiting, will retry in 15s
2022-01-25T07:11:32.016-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Hitting API rate limiting, will retry in 15s
2022-01-25T07:11:32.421-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Hitting API rate limiting, will retry in 15s

For context, here is what we're probing:

2022-01-25T07:02:01.315-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Fetching resources for the CloudFormation service
2022-01-25T07:02:01.350-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Fetching resources for the CloudTrail service
2022-01-25T07:02:01.390-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Fetching resources for the Config service
2022-01-25T07:02:01.431-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Fetching resources for the EC2 service
2022-01-25T07:02:01.473-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Fetching resources for the ELB service
2022-01-25T07:02:01.510-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Fetching resources for the ELBv2 service
2022-01-25T07:02:01.565-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Fetching resources for the IAM service
2022-01-25T07:02:01.566-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Fetching resources for the RDS service
2022-01-25T07:02:01.597-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Fetching resources for the RedShift service
2022-01-25T07:02:01.634-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Fetching resources for the S3 service
2022-01-25T07:02:01.721-05:00 ip-10-126-53-191.ec2.internal scout[25] INFO Fetching resources for the VPC service

Even if I presume that each of those services is scanned in parallel and independently hits its own rate limit, the frequency of these messages suggests the delay actually being applied is far shorter than the claimed 15 seconds. (Or, many pages of output later, 30 seconds, and then 45 seconds...)

The scan eventually succeeds (at timestamp 2022-01-25T07:48:42.797-05:00), but this feels both abusive and inefficient.

To Reproduce

This appears to be an intermittent condition. FWIW, the command is:

/usr/local/bin/scout aws  $SS_OPTS  \
    --access-keys \
    --access-key-id $SS_AWS_ACCESS_KEY_ID \
    --secret-access-key $SS_AWS_SECRET_ACCESS_KEY \
    --no-browser --ruleset godaddy.json

Here SS_OPTS is --services cloudformation cloudtrail config ec2 elb elbv2 iam rds redshift s3 vpc, and the other variables are self-explanatory. godaddy.json is a tweaked version of defaults.json that turns some checks on and others off.

This is being produced by ScoutSuite version 5.10.2.

x4v13r64 commented 2 years ago

> Even if I presume that each of those services is scanned in parallel and independently hits its own rate limit, the frequency of these messages suggests the delay actually being applied is far shorter than the claimed 15 seconds. (Or, many pages of output later, 30 seconds, and then 45 seconds...)

The 15s is per API call, so if many services are hitting the API rate limit, you'll indeed get a significant number of messages. This is mostly due to AWS's horrible rate-limiting implementation (per service, with different quotas per service and per endpoint) and SS's architecture not fitting that per-service model.
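To illustrate (a hypothetical sketch, not ScoutSuite's actual code): each concurrent fetch effectively wraps its boto3 calls in a retry loop along these lines, so N throttled callers produce N independent streams of "will retry" messages.

import logging
import time

from botocore.exceptions import ClientError

logger = logging.getLogger("scout")

# Error codes AWS uses to signal throttling (illustrative, not exhaustive).
THROTTLE_CODES = {"Throttling", "ThrottlingException", "RequestLimitExceeded"}

def call_with_backoff(api_call, *args, base_delay=15, **kwargs):
    # Hypothetical per-call retry: wait 15s, then 30s, then 45s...,
    # matching the progression seen in the logs above.
    attempt = 1
    while True:
        try:
            return api_call(*args, **kwargs)
        except ClientError as e:
            if e.response.get("Error", {}).get("Code") not in THROTTLE_CODES:
                raise
            delay = base_delay * attempt
            # Every concurrent caller sleeps and logs independently, which
            # is why messages arrive far more often than once per 15s.
            logger.info("Hitting API rate limiting, will retry in %ds", delay)
            time.sleep(delay)
            attempt += 1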

> this feels both abusive and inefficient.

100%, but AWS has been quite unresponsive to proposed strategies/remediations (e.g. https://github.com/nccgroup/ScoutSuite/issues/91 and https://github.com/boto/boto3/pull/2086), and SS's architecture would require significant rewrites to handle this more gracefully. If they bothered implementing progressive backoff in boto3, we wouldn't have to worry about it.
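For reference, "progressive backoff" here means exponential growth with jitter rather than a fixed 15s step. A minimal sketch (botocore's later "standard" retry mode behaves roughly like this, capped at 20s):

import random

def backoff_delays(max_attempts=5, base=1.0, cap=20.0):
    # Exponential backoff with "full jitter": the ceiling doubles on each
    # attempt, and the actual wait is randomized to avoid thundering herds.
    for attempt in range(max_attempts):
        yield random.uniform(0, min(cap, base * 2 ** attempt))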

danielnbalasoiu commented 2 years ago
2022-06-29 12:29:48 09e76605204d scout[154] INFO Fetching resources for the Secrets Manager service
2022-06-29 12:31:48 09e76605204d scout[154] INFO Hitting API rate limiting (facade/ec2.py L152), will retry in 15s
2022-06-29 12:32:06 09e76605204d scout[154] INFO Hitting API rate limiting (facade/ec2.py L152), will retry in 15s
2022-06-29 12:32:24 09e76605204d scout[154] INFO Hitting API rate limiting (facade/ec2.py L152), will retry in 15s
2022-06-29 12:32:25 09e76605204d scout[154] INFO Hitting API rate limiting (facade/ec2.py L152), will retry in 15s
2022-06-29 12:32:26 09e76605204d scout[154] INFO Hitting API rate limiting (facade/ec2.py L152), will retry in 15s

Are there any workarounds?

rbailey-godaddy commented 2 years ago

> Are there any workarounds?

This is not a complete workaround but has a beneficial effect in nearly all of our environments:

# Enable API backoff and throttling
# "adaptive" mode is labeled "experimental and subject to change", but in testing
# it results in significantly better behavior than either of the other modes.
# Reducing the number of workers (with limited testing) seemed to result in
# either no or negative impact (*MORE* API warnings).
export AWS_RETRY_MODE=adaptive  # or "legacy" (default) or "standard"

Do this before running scout; the underlying boto3 will pick up the environment variable and adapt accordingly. If you RTFM, Amazon claims "standard" is the default behavior, but that appears to apply to the awscli, not to API calls made via boto3 (where the default is still "legacy").
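For anyone driving boto3 directly rather than through scout, the same retry behavior can be requested programmatically via botocore's documented retry configuration (nothing ScoutSuite-specific here):

import boto3
from botocore.config import Config

# Ask botocore for adaptive client-side rate limiting plus retries.
config = Config(retries={"mode": "adaptive", "max_attempts": 10})
ec2 = boto3.client("ec2", config=config)

The same settings can also go in ~/.aws/config as retry_mode = adaptive and max_attempts = 10.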

danielnbalasoiu commented 2 years ago

That was a quick reply 👍

I ran export AWS_RETRY_MODE=adaptive before executing scout aws to scan three different environments, and I got only one error, which passed on the second retry.

Thank you!