nccgroup / PMapper

A tool for quickly evaluating IAM permissions in AWS.
GNU Affero General Public License v3.0

graph --create command hanging #55

Closed: beacomni closed this issue 2 years ago

beacomni commented 4 years ago

graph --create churns along for about an hour, printing progress output the whole time, and then hangs. The last line of output before the hang is:

Found new edge: role/aws-service-role/robomaker.amazonaws.com/AWSServiceRoleForRoboMaker can use Lambda to create a new function with arbitrary code, then pass and access role/service-role/[one of our roles]

Also, when I try to run pmapper repl, I get: Did not find file at: /Users/beacomni/Library/Application Support/com.nccgroup.principalmapper/947682355454

There are many lines of "Found new edge". Is there some upper bound on edges that pmapper can handle? Is there a way to get log output that might suggest why it is hanging?

Thanks.

ncc-erik-steringer commented 4 years ago

Hi @beacomni ,

A couple questions to start:

I suspect, based on the last line of output, that it's taking a while to chew through the Lambda functions in the account. I have a few suggestions in the meantime:

ncc-erik-steringer commented 3 years ago

Hi @beacomni : wanted to check in. Do you have any follow-up questions? Were the previous suggestions helpful?

Clete2 commented 3 years ago

I'm having the same issue. It hangs while generating edges for Lambda:

2021-08-24 11:37:36+0000 | Found new edge: role/<snip> can update the trust document to access role/<snip>

2021-08-24 11:37:36+0000 | Pulling data on Lambda functions
2021-08-24 11:37:51+0000 | Generating Edges based on Lambda data.

It's been on that line since 8/24. The date is now 8/30. We have a big account, currently with 1780 Lambdas. On a smaller account this step took a while, but extrapolating from that run, analyzing this many Lambdas should have taken only a few hours. Yet a single CPU has been pegged for 6 days.

We are on version 1.1.3 from pip. Adding debug doesn't give us any additional information.

I am running pmapper without Lambda now, but that removes a large benefit of pmapper. @ncc-erik-steringer - Do you have any advice?

ncc-erik-steringer commented 3 years ago

Reopening this issue in light of @Clete2 's comment.

Current code in master branch: https://github.com/nccgroup/PMapper/blob/722efec4f7c89a5a440facd8a5bd055c88db7194/principalmapper/graphing/lambda_edges.py#L72

Current algorithm is:

The time-consuming stuff is probably the simulations. The number of those we currently do (worst-case) is:

LR = number of roles that Lambda can assume
LF = number of Lambda functions
N = number of IAM Users and Roles in the account

LR * (2N + (2N * LF))

That means an account with 100 users and roles, 8 of which are roles that Lambda can assume, and 1000 Lambda functions would need ~1.6M simulation calls. If our simulator can process 2 simulations/sec (I need to benchmark this to see what it actually looks like; I could see the ReadOnlyAccess policy being slow for PMapper to process), that would take about 9.26 days to finish.
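The worst-case estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function names are made up for illustration, and the default rate of 2 simulations/sec is the unbenchmarked guess from this comment, not a measured figure.

```python
def worst_case_simulations(lambda_roles: int, lambda_functions: int, principals: int) -> int:
    """Worst-case simulation count: LR * (2N + (2N * LF))."""
    per_role = 2 * principals + 2 * principals * lambda_functions
    return lambda_roles * per_role


def estimated_days(simulations: int, sims_per_second: float = 2.0) -> float:
    """Convert a simulation count to days at an assumed (unbenchmarked) rate."""
    seconds_per_day = 86_400
    return simulations / sims_per_second / seconds_per_day


if __name__ == "__main__":
    # Example account from the comment: LR = 8, LF = 1000, N = 100.
    total = worst_case_simulations(lambda_roles=8, lambda_functions=1000, principals=100)
    print(f"{total:,} simulations, roughly {estimated_days(total):.1f} days")
```

Because LF multiplies both LR and N, the count grows quickly once an account has roughly one role per function.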

A couple things I'll look at:

Clete2 commented 3 years ago

@ncc-erik-steringer thanks for the comment. I'll adapt your math to my account.

LR * (2N + (2N * LF))
1780 * (2 * 2872 + (2 * 2872 * 1780)) <-- I assume the number of roles Lambda can assume ~= the number of Lambdas, since we generally create one role per Lambda
1780 * (5744 + (5744 * 1780))
1780 * (5744 + 10224320)
1780 * 10230064
18,209,513,920

18,209,513,920 / 2 simulations per second = 9,104,756,960 seconds
9,104,756,960 / 60 = 151,745,949.33333334 minutes
151,745,949.33333334 / 60 = 2,529,099.1555555556 hours
2,529,099.1555555556 / 24 = 105,379.1314814815 days
105,379.1314814815 / 365.2425 = 288.5182624735 years

Yes, I know our account is way too big.

One thing I am worried about: my access token expires every hour. Once it's done calculating Lambda edges (and we're all 6 feet under), it'll move on to the next step, try to call AWS APIs, and fail due to an expired token.

ncc-erik-steringer commented 3 years ago

Just pushed 725c05d6aa331c880338b910ad5ed64df29092f2 to the v1.1.4-dev branch with some of the work to help cut down on Lambda authorization simulation calls. I suspect it isn't going to help your case out very much but it's worth a try in the meantime.

Looking at the caching options, it turns out it'll take some breaking changes to add some LRU caches like I wanted. Maybe for v2.0.0 some day.
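For readers unfamiliar with the idea, an LRU cache over authorization checks could look roughly like the following. This is a generic sketch and not PMapper code: is_authorized is a hypothetical stand-in for a simulation call, and its body is placeholder logic. Note that functools.lru_cache needs hashable arguments, so the sketch keys on plain strings.

```python
from functools import lru_cache


@lru_cache(maxsize=4096)
def is_authorized(principal_arn: str, action: str, resource_arn: str) -> bool:
    """Hypothetical stand-in for an expensive authorization simulation."""
    # Placeholder decision logic; a real check would evaluate IAM policies.
    return action.startswith("lambda:")


if __name__ == "__main__":
    args = ("arn:aws:iam::000000000000:role/example", "lambda:CreateFunction", "*")
    is_authorized(*args)  # computed
    is_authorized(*args)  # served from the cache
    print(is_authorized.cache_info())
```

Repeated checks with identical arguments are answered from the cache, which is where the savings would come from if many principals share the same questions.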

For IAM Roles, you may be able to configure an AWS CLI profile that uses credential_process (https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-sourcing-external.html), which might refresh the credentials for you. Otherwise, you may need to write some code that uses the botocore library's RefreshableCredentials class: https://github.com/boto/botocore/blob/4927d3e3baa6a00226a1f017b638807beeb613a0/botocore/credentials.py#L368 .
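A credential_process helper is just a program that prints a JSON document to stdout. A minimal sketch in Python follows. fetch_temporary_credentials is a hypothetical placeholder that returns dummy values; replace it with however your organization mints short-lived credentials. Whether a long-running PMapper session actually picks up refreshed credentials this way is untested here.

```python
import json
import sys
from datetime import datetime, timedelta, timezone


def fetch_temporary_credentials() -> dict:
    """Hypothetical placeholder returning dummy values; swap in a real credential source."""
    expires = datetime.now(timezone.utc) + timedelta(hours=1)
    return {
        "access_key": "ASIAEXAMPLEKEY",
        "secret_key": "example-secret",
        "token": "example-session-token",
        "expiry": expires,
    }


def credential_process_payload(creds: dict) -> str:
    """Serialize credentials in the JSON shape documented for credential_process."""
    return json.dumps({
        "Version": 1,
        "AccessKeyId": creds["access_key"],
        "SecretAccessKey": creds["secret_key"],
        "SessionToken": creds["token"],
        # ISO 8601 timestamp; callers use it to decide when to re-run the process.
        "Expiration": creds["expiry"].strftime("%Y-%m-%dT%H:%M:%SZ"),
    })


if __name__ == "__main__":
    sys.stdout.write(credential_process_payload(fetch_temporary_credentials()))
```

The script would be referenced from a profile in ~/.aws/config with a line such as credential_process = /path/to/script, and that profile would then be selected when running pmapper.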

Clete2 commented 3 years ago

Thanks for the new code. I have pulled it down and started running it on a c5.large instance. I'll leave this running for a week or so and let you know if it finishes or if I kill it.

For credentials, I broke down and created a read-only user with static creds.

Clete2 commented 3 years ago

@ncc-erik-steringer it finished! On Sept 17th.

Since 9/17 I have been running the visualize command and it's still chugging.

ncc-erik-steringer commented 3 years ago

So roughly 16 days to complete? That's actually faster than I anticipated even with the changes. Thank you for being able to put PMapper through the stress test.

For visualization, PMapper basically sends the data to graphviz and waits for it to do the job. However, you may be able to output to a different filetype (.graphml) and use another renderer that works faster (Gephi or Cytoscape). That's controlled with the --filetype param.
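Before loading a large .graphml export into another renderer, it can help to check how big the graph is. The sketch below counts nodes and edges using only the Python standard library. It assumes the file uses the standard GraphML XML namespace; if the export omits that namespace, the counts will come back as zero. graphml_size is a name made up for this example.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace (assumed to be what the export uses).
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def graphml_size(source) -> tuple:
    """Return (node_count, edge_count) for a GraphML file path or file object."""
    root = ET.parse(source).getroot()
    nodes = root.findall(".//g:node", GRAPHML_NS)
    edges = root.findall(".//g:edge", GRAPHML_NS)
    return len(nodes), len(edges)


if __name__ == "__main__":
    import sys

    # Usage: python graphml_size.py path/to/output.graphml
    node_count, edge_count = graphml_size(sys.argv[1])
    print(f"{node_count} nodes, {edge_count} edges")
```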

Clete2 commented 3 years ago

Thanks for the tip. I just started the following command:

pmapper --account <myacct> visualize --filetype graphml --only-privesc

The png command has been running since the 17th and I have not killed it. Will update once I make some progress.

Clete2 commented 3 years ago

All done!

The png command failed because dot was not on the path.
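A quick pre-flight check can catch this before a long run. The sketch below uses Python's shutil.which to test whether an executable is on the PATH; graphviz_available is a helper name made up for this example and is not part of PMapper.

```python
import shutil


def graphviz_available(executable: str = "dot") -> bool:
    """Return True if the given executable can be found on PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if graphviz_available():
        print("dot found on PATH")
    else:
        print("dot not found on PATH; install Graphviz or use --filetype graphml")
```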

The graphml one worked and I'm playing around with it in Gephi. Thanks very much for your support.

Yashvendra commented 1 year ago

Hey @ncc-erik-steringer,

  • Skipping edit/reconfigure edges if the caller can do the "create a function and pass the role" approach

I'm having trouble understanding why we skip creating this edge, since it can indicate another possible privilege escalation path. I get that we are trying to reduce time complexity, but skipping checks will reduce detection of the attack surface, don't you think? Or am I missing something here? Could you please explain?