QUESTION: Terraform plan output may contain sensitive data.

caquino commented 6 years ago

Hi,

We are evaluating adoption of Atlantis and one of our concerns regarding security is that while doing 'atlantis plan' secret data can be displayed on the plan output.

Is there any recommendation on how to overcome this issue?

Thanks

lkysow commented 6 years ago

Hi Cassiano, right now there's no built-in way to Atlantis to overcome that issue however there are some mechanisms within Terraform itself and some options I'd like to evaluate for atlantis:

Within Terraform

As per https://github.com/hashicorp/terraform/issues/16554#issuecomment-341849007, output secrets can be hidden by setting sensitive = true:

output "out" {
  value     = "$secret"
  sensitive = true
}

Resulting in an apply output of:

Outputs:

out = <sensitive>

Within resources, it's provider dependent to annotate arguments as secret. For example, the aws_db_instance resource marks its password argument as secret, so:

resource "aws_db_instance" "default" {
  # ...
  username = "foo"
  password = "foobarbaz"
}

Results in:

Terraform will perform the following actions:

  + aws_db_instance.default
     ...
      password:                   <sensitive>
      username:                   "foo"

Plan: 1 to add, 0 to change, 0 to destroy.

Just for my knowledge is there a particular resource you're managing that you're worried about?

For Atlantis

I'm going to try out https://github.com/opencredo/terrahelp which has a mask function and I'll get back to you if that works and if so how to use it within Atlantis.

caquino commented 6 years ago

@lkysow Thanks for such fast and comprehensive answer!

First of all please read this with the "brainstorm hat", I'm not demanding or telling what should be done.

We are aware that this is not really an Atlantis issue, its a terraform issue, but it turned out to be even more of a nuisance in our case because of having this content on GH comments.

Sometimes, data like URIs, IP addresses and such, that are not really "sensitive data" gets exposed which gives more information about our infrastructure than needed, I can imagine some possible options to solve the issue:

1 - To have options on atlantis.yml to specify regular expressions to filter out the plan output before submitting it to GH, which IMHO is not the best option because it may mask data that is useful for the user to determine if the plan should/should not be applied.

2 - In conjunction to 1, instead of masking the data as sensitive let the regular expression specify filters to encrypt parts of the plan output, which lets the user decrypt the data if it's something necessary for them to analyze the plan output.

Having an output similar to :

Terraform will perform the following actions:

  + aws_db_instance.default
     ...
      password:                   ENCRYPTED:Hd8ajksa89alashdhkasd==
      username:                   "foo"

Plan: 1 to add, 0 to change, 0 to destroy.

Which would allow users if necessary to decrypt the data if they have the key and even if they want to be fancy to build a browser extension to do it automatically.

Lets imagine that I could do something like: atlantis plan pgp_encrypt:(my pgp key id)

3 - For the plan an option to have the plan to be hosted on the Atlantis web interface itself, which I also don't like as it makes Atlantis a lot more complex as it will need to start having state to be able to fulfill compliance auditing reports.

Being completely honest, after writing the options down, none feels like a good option they all make me feel bad for different reasons, but most of all none feels like a good user experience. (I suspect this is what makes things more secure?)

BTW the idea of the encryption on the state output was inspired by https://github.com/mozilla/sops

lkysow commented 6 years ago

Yes security tends to do that :) but I think this is definitely something that Atlantis should aim to tackle since it's the one posting the plan output on github! I think building this solution is best tackled with a well-defined use-case that I can fully understand. If you'd be interested, I'd love to have a quick call with you to more fully understand your use-case and how you deal with secrets. You can email me at lkysow [at] gmail if you'd like to set that up or message me in the slack channel. I think it would be dangerous to build this in a vacuum without a real-world use case.

Also brainstorming hat on...

A fully authed and rich Atlantis UI is a direction I want to go but it's not a short-term solution. I could do something short-term where you can mark a plan output as sensitive, causing it to only be available on the atlantis server as just a hosted file and then rely on users to add their own auth system on top of the UI (see https://github.com/runatlantis/atlantis/issues/49#issuecomment-398265180). For this I have a couple questions:

is this a project-wide thing where you mark the plan output as sensitive or is it a per individual plan thing in which case you'd need to disable autoplanning and then manually plan with new flag to atlantis, ex. atlantis plan --sensitive
do people want to run atlantis behind their own auth proxy?

Without the UI, I think the solution is some form of:

A config that allows users to specify which variables or resource arguments are sensitive
A mechanism within Atlantis for detecting those being output from terraform plan and masking them
A mechanism for the Atlantis user to view those masked variables.

I think 1 and 2 are easier than 3. And in terms of order of operations, I might want to implement 1 and 2, say for 3 that you just need to run plan locally (for now) and then build out 3 later. I know codecov.io has a simple UI-based secret mechanism where you paste in your string and it gives you the encrypted version back, Atlantis could have the same thing but the opposite. Questions I have about that:

Is it variables or resource attributes that are sensitive or both? The config format would need to be capable of defining both and I'm not sure a regex would be flexible enough?
Is it obvious beforehand what is going to be sensitive? I'm wondering what the workflow for an engineer would be. Do they see the plan has something sensitive, delete the comment and add another line to the config file. Or are whole repos/directories known to be sensitive.

lkysow commented 6 years ago

I've got Atlantis working with terrahelp mask which masks sensitive variables from a plan. For example:

terraform plan | terrahelp mask

terrahelp figures out which variables are sensitive simply by assuming that all variables in a terraform.tfvars file (which Terraform includes automatically) are sensitive. Obviously if you put sensitive variables into that file and then checked it in, it would have no purpose so you would have to create that file at runtime, maybe by downloading from Vault?

Either way, maybe this helps. You can see the Atlantis config here: https://github.com/runatlantis/atlantis/issues/46#issuecomment-403841683 and you'll need the terrahelp binary to be available where Atlantis is running.

You can read more about terrahelp here: https://github.com/opencredo/terrahelp

lkysow commented 6 years ago

If #121 were implemented (store plan output on server) then this would also be a possible solution. I don't think you'd need to keep the plan output around for auditing reasons because it's not the plan that matters but the apply log. If the apply log could still be printed to the pull request (in which any sensitive outputs could be marked as such) then this would be another solution.

sterfield commented 5 years ago

Unfortunately, sensitive outputs is not covering all cases.

For example, if you use random_string to generate passwords, the ID of the random_string is... the content of the string generated. So in plan or apply, you would see random_string.database_password (ID:<the password>)...

And relying on terrahelp or string regexs is tricky, as a single change in output format in Terraform could break it and leak your secrets in Github comments.

IslamAzab commented 4 years ago

Is there a way do mark variables as sensitive so that they don't show up in the plan output?

For example

source

resource "aws_sns_topic_subscription" "subscription" {
  endpoint = "${var.url}${var.token}"

output


resource "aws_sns_topic_subscription" "subscription" {
  endpoint  = "https://www.url-example.com/path?access_token=SENSITIVE_DATA"

lkysow commented 4 years ago

Not built in to Atlantis

h3lo commented 4 years ago

Is there a way do mark variables as sensitive so that they don't show up in the plan output?

For example

source
resource "aws_sns_topic_subscription" "subscription" {
  endpoint = "${var.url}${var.token}"
output
resource "aws_sns_topic_subscription" "subscription" {
  endpoint  = "https://www.url-example.com/path?access_token=SENSITIVE_DATA"

I know this issue is old, but I am running into this exact situation (API key in aws_sns_topic_subscription resource) being exposed by TF. Could we perhaps have a way to mark certain attributes like this as sensitive so they get masked by atlantis during plan/apply?

I know the issue is in the provider, but as mentioned above, it is Atlantis that makes this an issue since it posts comments to git. Disabling auto-planning and requiring people to look at the Atlantis server kind of defeats the purpose of Atlantis imo...

IslamAzab commented 4 years ago

@h3lo I am not sure if it will be addressed in Atlantis.

You can try Terraform sensitive output

output "db_password" {
  value       = aws_db_instance.db.password
  description = "The password for logging in to the database."
  sensitive   = true
}

h3lo commented 4 years ago

That attribute is only for outputs. I'm talking about the aws_sns_topic_subscription resource, and more specifically, the endpoint attribute, which is likely to be sensitive for anyone using some 3rd party's API over HTTPS.

I even tried creating a module with nothing in it but my sensitive data, and outputting it with the sensitive attribute set, then using it in my aws_sns_topic_subscription resource's endpoint attribute by reference, but TF still outputs its interpolated value.

This isn't really a big risk when doing terraform from the command line, it's specifically a function of Atlantis making the plan and apply output semi-public that causes risk. Of course, this aspect of Atlantis is also one of the main reasons to use it (tracking of what's been done).

Atlantis generally works great, thank you for your hard work. I hope my feedback can help improvements be made that will be useful to everyone.

wlonkly commented 4 years ago

I think

TF still outputs its interpolated value

is the key bit here, and would encourage opening up an issue on the provider, since I think that's the right place to mark sensitivity rather than having special cases in various providers inside Atlantis. The same problem would happen in eg. Terraform Cloud, and can be avoided by setting the sensitive flag in the provider schema.

Heck, it's probably a one-line PR, so you could be a Terraform contributor! 😄

(FWIW, I agree that endpoint is sensitive for the reasons you described and would be happy to upvote the issue there.)

h3lo commented 4 years ago

@wlonkly I feel quite strongly that this doesn't belong in the provider. The provider outputting some sensitive data isn't inherently wrong because from TF perspective, the host you run plans and applies on is secure. It is Atlantis that makes this output public and therefore it is specifically Atlantis that should handle masking it.

jasonrberk commented 3 years ago

@h3lo I think it would be nice if the provider provided a way for you to "mark" that something is sensitive, or even denoted sensitive data in some standardized way. Then, Atlantis could use that standard accordingly, instead of a potentially non-deterministic regex. This would still allow TF to spit out the real info, that you want to see when you run locally, but also allow Atlantis to scrub it when posting it in a PR.

Another thought..... after the PR merges, could / should the Atlantis comments be deleted? Just getting started with TF and Atlantis, so I'm no expert....but wouldn't that at least limit exposure. I know our TF PRs don't generally stay open for very long. By the time a dev is ready to PR, they should have done the plan/apply and even a destroy in a lower env.

I don't have enough history to know if I'd ever need to look back on those plan / apply PRs later to debug something.

nitrocode commented 1 year ago

For example, if you use random_string to generate passwords, the ID of the random_string is... the content of the string generated. So in plan or apply, you would see random_string.database_password (ID:)...

@sterfield I know this is an old issue. I have a small addition here which may not have been available at the time.

https://registry.terraform.io/providers/hashicorp/random/latest/docs/resources/password

Identical to random_string with the exception that the result is treated as sensitive and, thus, not displayed in console output. Read more about sensitive data handling in the Terraform documentation.

GMartinez-Sisti commented 6 months ago

We're having a similar issue with this masking values on the plan output. Due to some architecture decisions, we have some variables that are considered secrets, created on terraform and passed into templates variables to templatefile functions. Since they are passed as text, terraform loses the context of them being sensitive and outputs them as clear text on the plan. Before having automation on a PR this was not a concern, as the terraform executor would also have access to the terraform state file, so it would be acceptable to have it on an output, but now the value is available to anyone with access to the repository where Atlantis comments.

I understand this might be considered a hack, since the real solution is to address the way the secret is passed, but as we all know, sometimes we don't have the cycles to fix this when it's already working across multiple production environments and the business has other requirements.

My proposal is to add a feature to allow masking the plan output based on a configurable regex. The regex argument can be a list to make it easier to achieve multiple masks without increasing the complexity of the regex pattern. I've tested this locally with a simple pipe that sends terraform plan output to sed and it works as long as we can figure out a valid regex for the desired outcome, I believe this would be somewhat easy to achieve with Atlantis. Afaik Atlantis doesn't really interfere with terraform, just calls the binary and interprets results so I can't think of another way to achieve this.

I'm happy to develop this feature. Thoughts?

PS: Another option is to allow a custom post processing command on https://github.com/runatlantis/atlantis/blob/57aef30416f3e6e10d8c87222952f92deef61f3b/server/core/config/valid/repo_cfg.go#L176-L183

runatlantis / atlantis

QUESTION: Terraform plan output may contain sensitive data. #163

Within Terraform

For Atlantis