hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io
Other
42.76k stars 9.56k forks source link

Variable with sensitive fields makes the full object sensitive #28222

Open horjulf opened 3 years ago

horjulf commented 3 years ago

Hi,

When using sensitive values in a variable map/object with the defaults function, a single key will render the full variable sensitive without being possible to use nonsensitive as the values are "known only after apply".

Terraform Version

Terraform v0.15.0-dev @ b3b6099
on darwin_amd64

Terraform Configuration Files

terraform {
  experiments = [module_variable_optional_attrs]
}

variable "myvar" {
  type = object({
    test_bool    = optional(bool)
    test_string1 = optional(string)
    test_string2 = optional(string)
  })
  default = {}
}

locals {
  myvar = defaults(var.myvar, {
    test_bool    = false
    test_string1 = sensitive("mystring")
    test_string2 = "mystring"
  })
}

output "test_string2" {
  value = local.myvar.test_string2
}
╷
│ Error: Output refers to sensitive values
│
│   on main.tf line 22:
│   22: output "test_string2" {
│
│ Expressions used in outputs can only refer to sensitive values if the sensitive attribute is true.
╵

Changing the output to nonsensitive

output "test_string2" {
  value = nonsensitive(local.myvar.test_string2)
}
│ Error: Invalid function argument
│
│   on main.tf line 23, in output "test_string2":
│   23:   value = nonsensitive(local.myvar.test_string2)
│     ├────────────────
│     │ local.myvar.test_string2 is a string, known only after apply
│
│ Invalid value for "value" parameter: the given value is not sensitive, so this call is redundant.

Set field to nonsensitive

locals {
  myvar = defaults(var.myvar, {
    test_bool    = false
    test_string1 = sensitive("mystring")
    test_string2 = nonsensitive("mystring2")
  })
}
│ Error: Invalid function argument
│
│   on main.tf line 18, in locals:
│   18:     test_string2 = nonsensitive("mystring2")
│
│ Invalid value for "value" parameter: the given value is not sensitive, so this call is redundant.

Debug Output

https://gist.github.com/horjulf/d8c5ca9c234c8342c82e8f4ed83f1259

Expected Behavior

Only sensitive fields should be marked as sensitive.

Actual Behavior

The full map/object is marked as sensitive without the possibility of unmarking.

Steps to Reproduce

  1. terraform init
  2. terraform plan
apparentlymart commented 3 years ago

Hi @horjulf,

I believe the design intent here matches what you are seeing: any value that is in part sensitive must be marked as fully sensitive when passing through an output value. That was an intentional rule, and so I think we ought to treat this issue the way we would typically treat a feature request, discussing the underlying use-cases to see if a change to the design is warranted.

The use-case I understand from your examples here is that you want to return a complex object which contains a sensitive value, but that also contains various data that isn't sensitive. One simple way I could imagine that happening is if you set an output value to an entire resource instance where the resource type schema contains a sensitive attribute, although we can see in your examples that this can also arise through intentionally constructing an object with a sensitive value inside of it.

I believe the intent of the current design is that having an output value marked statically as sensitive means that the sensitivity can be recognized in situations where Terraform isn't evaluating the entire configuration, which might include some special situations within Terraform itself (although I must admit I'm not thinking of any off the top of my head right now) or it might involve external tools which rely on the static annotations of sensitivity that predated Terraform having the current dynamic sensitivity feature.

With all of this said, it does seem reasonable to me that we might loosen the requirement to mark an output value as fully sensitive in order to return something partially-sensitive, but before we can do that we'll need to do some investigation about whether anything is relying on the static declarations of sensitivity and, if so, devise a new solution for that situation.

Since we'll be working with this issue as discussion about the tradeoffs of changing Terraform's design, rather than merely changing the implementation to better meet the design, I'm going to relabel this as an enhancement so that it'll go through our enhancement request process rather than our bug report process. Thanks!

Nuru commented 3 years ago

I have a similar situation/use case for you to consider.

Simplified:

Module "user" creates user with username, password, other info, and has outputs

output "user" {
  value       = local.user
  description = "user name"
}

output "password" {
  value       = local.password
  description = "user's password"
  sensitive   = true
}

output "db_user_password_ssm_key" {
  value       = local.password_key
  description = "SSM key under which user password is stored"
}

Root module creates a bunch of users, wants to output the non-sensitive information

module "user" {
  for_each = var.user_names
  source = "./modules/user"

  username = each.key
}

locals {
  # Create a map of non-sensitive outputs
  sanitized_users = { for k, v in module.user : k => { for kk, vv in v : kk => vv if kk != "password" } }
}

output "user_info" {
  value = local.sanitized_users
  description = "Information about users created"
}

Unfortunately, with Terraform 0.14.9 this outputs

  on read-example.tf line 31:
 31: output "user_info" {

Expressions used in outputs can only refer to sensitive values if the
sensitive attribute is true.
apparentlymart commented 3 years ago

Having dug a little deeper since my comment the other day, I can now say a little more about what this "the entire output value must be sensitive" rule is dealing with, or at least one thing it's aimed at. (there may be others)

When we save root module output values into the state, we're limited by the schema of the state to only have a single sensitive flag for the entire value. For example:

output "example" {
  value = {
    a = "hello"
    b = sensitive("world")
  }
}

Terraform will reject that with the error we saw in other comments here:

╷
│ Error: Output refers to sensitive values
│ 
│   on sensitive-in-state.tf line 2:
│    2: output "example" {
│ 
│ Expressions used in outputs can only refer to sensitive values
│ if the sensitive attribute is true.
╵

So I add sensitive = true as directed, and then of course we get the whole thing marked as sensitive as we've seen:

Changes to Outputs:
  + example = (sensitive value)

If we look in the now we can see how Terraform encodes that:

{
  "version": 4,
  "terraform_version": "0.15.0",
  "serial": 1,
  "lineage": "c04b4155-3b9d-3c39-4152-eb48bedbc5f5",
  "outputs": {
    "example": {
      "value": {
        "a": "hello",
        "b": "world"
      },
      "type": [
        "object",
        {
          "a": "string",
          "b": "string"
        }
      ],
      "sensitive": true
    }
  },
  "resources": []
}

There is no native idea of "sensitive" in JSON, and Terraform is already using the entire range of possible JSON values for other meanings and so there's no room for overloading something to mean "sensitive" here. And so, as a compromise, there's just that "sensitive": true annotation belonging to the entire output value, separate from it's value and type.

In theory this single sensitive property could be replaced with a more complex data structure that tries to encode which parts of the data structure are sensitive. Maybe like this:

{
  "sensitive_parts": {
    "b": true
  }
}

...but that's a significantly more complex design, and may not be fully sufficient to represent all cases (I've not looked deeply to prove or disprove that).

However, the better news is that if we can design something like that which is sufficient to represent the details then in principle we could also retain the separate "sensitive": true as a sort of "summary" of that more complex property, which would allow keeping the state format backward-compatible with earlier versions which might be consuming this data using terraform_remote_state, or similar.

I think we do still need to research this a little more to see what this might look like in the details, and so I appreciate the additional use-case here. Having more use-cases is helpful because they can prompt other ideas/compromises that we might not consider, or show us that the problem isn't as general as we feared, etc.

horjulf commented 3 years ago

Hi @apparentlymart,

Thank you for taking the time to explain the design logic.

My use case is more targeted at local objects and variables, I used the output example to keep it simple and demonstrate the limitation.

Lets take a look at:

terraform {
  experiments = [module_variable_optional_attrs]
}

locals {
  user = {
    username    = ""
    password    = sensitive("")
    description = ""
  }
  my_user = defaults(local.user, {
    username    = ""
    password    = ""
    description = ""
  })
}

resource "null_resource" "test" {
  triggers = {
    "username"    = local.user.username
    "password"    = local.user.password
    "description" = local.user.description
  }
}

resource "null_resource" "test2" {
  triggers = {
    "username"    = local.my_user.username
    "password"    = local.my_user.password
    "description" = local.my_user.description
  }
}
Terraform will perform the following actions:

  # null_resource.test will be created
  + resource "null_resource" "test" {
      + id       = (known after apply)
      + triggers = {
          + "description" = ""
          + "password"    = (sensitive)
          + "username"    = ""
        }
    }

  # null_resource.test2 will be created
  + resource "null_resource" "test2" {
      + id       = (known after apply)
      + triggers = {
          + "description" = (sensitive)
          + "password"    = (sensitive)
          + "username"    = (sensitive)
        }
    }

Plan: 2 to add, 0 to change, 0 to destroy.

We modified the object and it was marked as sensitive, not ideal but understandable security default, to work around it we can do:

resource "null_resource" "test2" {
  triggers = {
    "username"    = nonsensitive(local.my_user.username)
    "password"    = local.my_user.password
    "description" = nonsensitive(local.my_user.description)
  }
}

We get:

  # null_resource.test2 will be created
  + resource "null_resource" "test2" {
      + id       = (known after apply)
      + triggers = {
          + "description" = ""
          + "password"    = (sensitive)
          + "username"    = ""
        }
    }

So far all good, sensitive propagates on data manipulation which is a good security default, and we can disable it if necessary. However it all breaks once we introduce variables and modules:

module "test" {
  source = "./module"

  my_user = {
    username = "test"
    password = sensitive("random")
    description = "my user"
  }
}

# module/main.tf
terraform {
  experiments = [module_variable_optional_attrs]
}

variable "my_user" {
  type = object({
    username    = optional(string)
    password    = optional(string)
    description = optional(string)
  })
  default = {}
}

locals {
  my_user = defaults(var.my_user, {
    username    = ""
    password    = ""
    description = ""
  })
}

resource "null_resource" "test" {
  triggers = {
    "username"    = var.my_user.username
    "password"    = var.my_user.password
    "description" = var.my_user.description
  }
}

resource "null_resource" "test2" {
  triggers = {
    "username"    = local.my_user.username
    "password"    = local.my_user.password
    "description" = local.my_user.description
  }
}

Same as before, we get the object marked as sensitive due to the manipulation.

  # module.test.null_resource.test will be created
  + resource "null_resource" "test" {
      + id       = (known after apply)
      + triggers = {
          + "description" = "my user"
          + "password"    = (sensitive)
          + "username"    = "test"
        }
    }

  # module.test.null_resource.test2 will be created
  + resource "null_resource" "test2" {
      + id       = (known after apply)
      + triggers = {
          + "description" = (sensitive)
          + "password"    = (sensitive)
          + "username"    = (sensitive)
        }
    }

Lets try marking it as nonsensitive:

resource "null_resource" "test2" {
  triggers = {
    "username"    = nonsensitive(local.my_user.username)
    "password"    = local.my_user.password
    "description" = nonsensitive(local.my_user.description)
  }
}

We get:

│ Error: Invalid function argument
│
│   on module/main.tf line 32, in resource "null_resource" "test2":
│   32:     "username"    = nonsensitive(local.my_user.username)
│     ├────────────────
│     │ local.my_user.username is a string, known only after apply
│
│ Invalid value for "value" parameter: the given value is not sensitive, so this call is redundant.
╵
╷
│ Error: Invalid function argument
│
│   on module/main.tf line 34, in resource "null_resource" "test2":
│   34:     "description" = nonsensitive(local.my_user.description)
│     ├────────────────
│     │ local.my_user.description is a string, known only after apply
│
│ Invalid value for "value" parameter: the given value is not sensitive, so this call is redundant.

I guess the perfect solution would be to expand the sensitive attribute to individual fields as you mentioned by using something like sensitive_parts.

However since that brings a lot of complexity to the state and overall TF logic, we can start by addressing the uniformity of the nonsensitive function, so that we can use it to workaround these cases.

rafabu commented 3 years ago

I have hit the same issue with our existing, rather complex code base. It is making much use of mapping the output exactly as @apparentlymart described:

One simple way I could imagine that happening is if you set an output value to an entire resource instance where the resource type schema contains a sensitive attribute...

In our case it will be a collection of resource instances generated by dynamic for_each-ed resource and/or modules.

Long term we might have to consider to refactor the entire IaC engine to make it work cleanly with attributes having set the Sensitive label; unless something likesensitive_parts as suggested by @horjulf would be implemented in the future.

Short term it would be helpful if we could indeed loosen the restriction on the nonsensitive function such that it allowed 'cleaning' an entire collection of objects that had (potentially) secret attributes down inside its hierarchy. Or maybe a general flag that loosened the entire terraform treatment of sensitivity back to to pre-0.15 behaviour (with a warning message obviously). Setting a flag on a specific provider to suppress/filter all sensitivity flags could also be an option.

krowlandson commented 3 years ago

We are also facing this issue and feel that despite the good intent of this feature, the resultant behaviour is not ideal.

In our case, we are developing a module which deploys many different resource types all using for_each loops to allow multiple instances of each resource to be created as needed.

To ensure the full configuration of each resource created by the module is available within the root module, we output the configuration of each resource, grouped by type and name. This makes the output of the module look like a native resource object would be handled.

Whilst testing the module against release v0.15.0, we found that simply generating an output equivalent to the following results in us have to mark the entire object sensitive:

output "azurerm_log_analytics_workspace" {
  value = {
    resource_name = azurerm_log_analytics_workspace.resource_name
  }
  description = "Returns the configuration data for all Log Analytics workspaces created by this module."
  sensitive   = true
}

This is far from ideal, as 2 out of the 4 attributes returned by this resource type are non-sensitive but we can no longer view them in the output.

Due to the number of resource types we handle, combined with number of instances for each, it's not practical to decompose the outputs into sensitive and non-sensitive values. This would also break our design principle for handling resource outputs.

This problem is also made worse if we try to combine all outputs from the module into a single object within the root module (to simplify parsing the child-module output to root-module output) as this then also needs to be marked as sensitive, rendering it useless for debugging.

opub commented 3 years ago

I ran into this in case where I wanted to output a sensitive azurerm_storage_account.primary_access_key during apply from a set of resources that were "known only after apply".

With Terraform 0.14 I could just do this.

output "azure_access_keys" {
  description = "The primary access keys for the storage accounts."
  value       = azurerm_storage_account.accounts[*].primary_access_key
}

With Terraform 0.15 I couldn't mark it as nonsensitive because that failed as redundant but I also couldn't just output it without marking sensitive = true. My hack to get around this to get back to the original functionality was to convert the value to string and then back. While it is a horrible long term solution I just needed it for this one case.

output "azure_access_keys" {
  description = "The primary access keys for the storage accounts."
  value       = nonsensitive(jsondecode(jsonencode(azurerm_storage_account.accounts[*].primary_access_key)))
}
enver commented 2 years ago

Any update on this issue?

apparentlymart commented 2 years ago

This issue started about the defaults function in particular, and that function was experimental at the time and has now been removed as part of changing the design of that experiment in response to feedback.

So at least the original issue is in a sense resolved now -- the new mechanism for dealing with defaults is better integrated and so should handle sensitive values better.

There is no change to the rule that a root module output value must be marked wholly as sensitive, though. Since my earlier comments I have also learned an additional requirement that motivated the current design, which me must therefore also either preserve or refine: because root module output values are a point of explicit data sharing between a Terraform configuration and external systems, customer feedback during the design phase suggested that some users were worried about accidentally exposing a sensitive value in this way, since the value would still be available for use elsewhere even though Terraform hides it in its own UI. This requirement is interesting because it assumes Terraform is acting as a part of a larger system that is able to "keep secrets" inside the state but still publish the output values separately on the assumption that they are explicitly intended for sharing. In that sense, Terraform itself is really only meeting half of the overall requirement, and delegating to whatever is wrapping it to implement some kind of execution isolation. Terraform's contribution to that need is covered by our v1.0 compatibility promises for "running Terraform in Automation" purposes, so we can only change it in ways that don't break the guarantee that a sensitive value cannot end up in the output values portion of the state without some sort of explicit opt-in by the author of the root module. However, we can consider different formulations of that opt-in as long as the annotation is always attached to the output value configuration itself so that it's clearly visible at the interface boundary between the Terraform configuration and the external consumers of its output values.

So far there isn't a clear design for how to meet all of these needs beyond what Terraform is already doing, so the next step would be to iterate towards a viable design and evaluate it against the constraints I've described in my various comments in this issue. Nobody is working directly on this at the moment, as far as I know.

DeSmetMaarten commented 2 years ago

when using outputs - depends_on you cannot use this anymore as it's not able to output sensitive data. my (inherited) project) does output the created module but this is not possible anymore. will it ever be possible again? What is best practice on this setup?

depends_on — Explicit Output Dependencies Since output values are just a means for passing data out of a module, it is usually not necessary to worry about their relationships with other nodes in the dependency graph.

However, when a parent module accesses an output value exported by one of its child modules, the dependencies of that output value allow Terraform to correctly determine the dependencies between resources defined in different modules.

Just as with resource dependencies, Terraform analyzes the value expression for an output value and automatically determines a set of dependencies, but in less-common cases there are dependencies that cannot be recognized implicitly. In these rare cases, the depends_on argument can be used to create additional explicit dependencies:

Nuru commented 2 years ago

@apparentlymart Without delving into the pros and cons of marking the whole object sensitive if any of the members are sensitive, can we at least get nonsensitive() to operate on an object, so we can output the values if we want to? See #31693 for details.

sachinchemmala commented 1 year ago

Unfortunately, with Terraform 0.14.9 this outputs

  on read-example.tf line 31:
 31: output "user_info" {

Expressions used in outputs can only refer to sensitive values if the
sensitive attribute is true.

I had a similar scenario where I could not print the RDS module output because of the master_username sensitive value. I tweaked your logic like below and it worked for me.

output "cluster" {
  value       = { for k, v in module.rds_cluster_aurora_postgres.* : k => { for kk, vv in v : kk => vv if kk != "master_username" } }[0]
  description = "RDS cluster outputs"
}
ccq1994 commented 1 year ago

Wanted to share that I worked around this. The issue is the mix of sensitive and nonsensitive values being grouped together. Note in the root module, the key is plain, but the value has the nonsensitive() function around it.

From the module that's creating the resources:

output "map" {
  value       = tomap({
    for p in aws_ssm_parameter.default : p.name => p.value
  })
}

In the root module:

output "parameters" {
  description = "List of parameters"
  value       = {for k, v in module.module_name.map : k => nonsensitive(v)}
}
michaelruigrok commented 1 year ago

Upon upgrading to Terraform 1.4, our modules are triggering the error sensitive values, or values derived from sensitive values, cannot be used as for_each arguments.. The values in for_each are not sensitive, but they are provided by an attribute of a sensitive output object. This output is far upstream and updating all modules using that output is a hairy proposition.

It would be very useful to have a simple mechanism to mark any value as non-sensitive in order to override this error when the given for_each variables are clearly not sensitive. It seems necessary if we're also checking if values are even derived from a sensitive object. #31693 may be relevant here.