hashicorp / terraform


A method to mock data sources for testing #33975

Open mpkuth opened 1 year ago

mpkuth commented 1 year ago

Terraform Version

Terraform v1.4.6
on windows_amd64
+ provider registry.terraform.io/hashicorp/azurerm v3.69.0
+ provider registry.terraform.io/hashicorp/helm v2.8.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.16.1
+ provider registry.terraform.io/hashicorp/random v3.5.1

Your version of Terraform is out of date! The latest version
is 1.5.7. You can update by downloading from https://www.terraform.io/downloads.html

Use Cases

We would like to be able to optimistically validate and plan modules that use the terraform_remote_state data source when the remote state may not contain some outputs yet. Once the plans look good we then want to apply the stacks in order without using any default values.

1. Add new_output to A, then validate and plan A.
2. Add usage of A.new_output to B, then validate and plan B (using mock/default values for A.new_output).
3. If all looks good, apply A, then apply B (NOT using mock/default values for A.new_output).

Attempted Solutions

We've tried specifying defaults, but haven't found a way to tell terraform that they should only be used for validate and plan operations and NOT be used for apply operations.

Proposal

Add an optional argument to the terraform_remote_state data source called something like defaults_operations.

Defaults are only used when terraform is running the operations specified by that field.

References

jbardin commented 1 year ago

Hi @mpkuth,

Can you give an example configuration and steps showing what you are trying to do? It's not clear what it would mean for a data source to do something only during plan and not during apply. What happens in apply is just the execution of what was recorded in the plan, and in most cases a data source is not read at all during apply.

mpkuth commented 1 year ago

Hello @jbardin!

We'd like to write something like:

Module A

output "a" {
  value = "real"
}

output "b" {
  value = "real"
}

output "c" {
  value = 1
}

output "d" {
  value = ["real"]
}

Module B

data "terraform_remote_state" "module_a" {
  ...

  defaults = {
    a = "mock"
    b = "mock"
    c = -1
    d = ["mock"]
  }

  defaults_operations = ["validate", "plan"]
}

Example 1: We want to deploy a new environment that includes both Module A and Module B. We'd like to be able to validate and plan all of the modules in that environment before starting to apply anything, but we cannot right now because validation and planning of Module B will fail because there is no remote state for Module A yet.

Example 2: We want to deploy a change to an existing environment that adds a new output "e" to Module A and uses it in Module B. We'd like to be able to validate and plan the changes for both modules before starting to apply anything, but we cannot right now because validation and planning of Module B will fail because the new output "e" is not in the remote state for Module A yet.

We think this is the only missing piece preventing us from moving away from a similar feature in terragrunt (generally preferring built-in capabilities of terraform where possible): https://terragrunt.gruntwork.io/docs/features/execute-terraform-commands-on-multiple-modules-at-once/#unapplied-dependency-and-mock-outputs. That also explains the use case in detail.

As I think about this more, the defaults field could remain a separate concept, so the proposal would be to add a new mocks field and a correlated mock_operations field that controls when the mocks are used (defaulting to none of the operations).

At the risk of complicating the request, it would be interesting to explore the options for just having a mock_outputs field that takes a list of operations (which defaults to none of them) and then includes a well-known indicator like "(mock remote state value)" in the plan output (similar to the existing "(known after apply)" indicator). Or something like that which achieves the same thing but doesn't require setting mock outputs for every output manually, which is one of the main gripes we have with the terragrunt implementation of this feature.
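A minimal sketch of that second shape (both the mock_outputs argument and the whole mechanism are hypothetical here, not existing Terraform syntax):

```hcl
# Hypothetical syntax sketch -- this argument does not exist in Terraform today.
data "terraform_remote_state" "module_a" {
  backend = "s3"
  config = {
    bucket = "kuth"
    key    = "remote-state-test/module-a"
    region = "us-west-2"
  }

  # During the listed operations, any output missing from the remote
  # state would be filled with an auto-generated placeholder and shown
  # in the plan as "(mock remote state value)".
  mock_outputs = ["validate", "plan"]
}
```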

What happens in apply is just the execution of what was recorded in the plan, and in most cases a data source is not read at all during apply.

There could be a flag in the plan that says "I was generated using mock values and cannot be applied", and the apply operation would check that flag and error out with a helpful message if it is set. Or, if the feature is implemented that way, apply could check for the well-known "(mock remote state value)" marker in the plan and fail if it finds one. I think that would handle the case of applying a pre-saved plan as well.

jbardin commented 1 year ago

Thanks for the extra information @mpkuth! I think there's some confusion in terminology here, which makes the scope harder to pin down. In Terraform, "validation" is entirely offline, so validation cannot fail because of the data in the remote state, because the remote state is never read. It may also not be clear that terraform_remote_state is a data source just like any other, and follows the same lifecycle rules as every other data source. Other than validating its configuration, the only action it has is ReadDataSource, which usually happens during plan, but can also be deferred to apply if necessitated by the configuration.

Since there doesn't appear to be anything which terraform_remote_state could do to solve the problem here on its own, I think we could better classify this as a request for testing or mocking of resources in general.

As for a workaround, I'd have to take some time to study what terragrunt is doing there, but it seems like you could test your plans with some configuration overrides and fake remote state staged specifically for the tests.
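One way that staging could look, sketched here as an assumption rather than a tested recipe, uses Terraform's override-file merging: a `*_override.tf` file present only in the test environment can repoint the data source at a fake local state file (the fixtures path below is illustrative).

```hcl
# main_override.tf -- present only where tests run. Terraform merges
# *_override.tf files over the base configuration, so the arguments
# here replace the matching arguments of the data block in main.tf.
data "terraform_remote_state" "module_a" {
  backend = "local"
  config = {
    path = "./fixtures/module-a.tfstate" # fake state staged for tests
  }
}
```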

mpkuth commented 1 year ago

Thanks for the quick responses and the information about validation. I must have misconfigured something else when trying this out in our project. After your comment I implemented an independent set of modules with the minimum configuration to demonstrate the use case and confirmed that validation does work in these cases.

module-a

config.tf

terraform {
  backend "s3" {
    bucket = "kuth"
    key    = "remote-state-test/module-a"
    region = "us-west-2"
  }
}

outputs.tf

output "foo" {
  value = "bar"
}

module-b

config.tf

terraform {
  backend "s3" {
    bucket = "kuth"
    key    = "remote-state-test/module-b"
    region = "us-west-2"
  }

  required_providers {
    local = {
      source  = "hashicorp/local"
      version = "2.4.0"
    }
  }
}

main.tf

data "terraform_remote_state" "module_a" {
  backend = "s3"
  config = {
    bucket = "kuth"
    key    = "remote-state-test/module-a"
    region = "us-west-2"
  }
}

resource "local_file" "test" {
  filename = "test.txt"
  content  = "TEST"
}

resource "local_file" "foo" {
  filename = "foo.txt"
  content  = data.terraform_remote_state.module_a.outputs.foo
}

outputs.tf

output "foo" {
  value = data.terraform_remote_state.module_a.outputs.foo
}

Case 1: Planning before the module that we depend on has been deployed at all

$ terraform plan
data.terraform_remote_state.module_a: Reading...

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform planned the following actions, but then encountered a problem:

  # local_file.test will be created
  + resource "local_file" "test" {
      + content              = "TEST"
      + content_base64sha256 = (known after apply)
      + content_base64sha512 = (known after apply)
      + content_md5          = (known after apply)
      + content_sha1         = (known after apply)
      + content_sha256       = (known after apply)
      + content_sha512       = (known after apply)
      + directory_permission = "0777"
      + file_permission      = "0777"
      + filename             = "test.txt"
      + id                   = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.
╷
│ Error: Unable to find remote state
│
│   with data.terraform_remote_state.module_a,
│   on main.tf line 1, in data "terraform_remote_state" "module_a":
│    1: data "terraform_remote_state" "module_a" {
│
│ No stored state was found for the given workspace in the given backend.

We'd like to be able to see what the plan would look like if the missing values were known.

Even if we add the following to the terraform_remote_state data source we still see the same error.

  defaults = {
    foo = "missing"
  }

Case 2: The dependency has already been deployed but we're adding a new output to it for use in the dependent module.

$ terraform plan
data.terraform_remote_state.module_a: Reading...
data.terraform_remote_state.module_a: Read complete after 1s

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform planned the following actions, but then encountered a problem:

  # local_file.test will be created
  + resource "local_file" "test" {
      + content              = "TEST"
      + content_base64sha256 = (known after apply)
      + content_base64sha512 = (known after apply)
      + content_md5          = (known after apply)
      + content_sha1         = (known after apply)
      + content_sha256       = (known after apply)
      + content_sha512       = (known after apply)
      + directory_permission = "0777"
      + file_permission      = "0777"
      + filename             = "test.txt"
      + id                   = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.
╷
│ Error: Unsupported attribute
│
│   on main.tf line 17, in resource "local_file" "foo":
│   17:   content  = data.terraform_remote_state.module_a.outputs.foo
│     ├────────────────
│     │ data.terraform_remote_state.module_a.outputs is object with no attributes
│
│ This object does not have an attribute named "foo".
╵
╷
│ Error: Unsupported attribute
│
│   on outputs.tf line 2, in output "foo":
│    2:   value = data.terraform_remote_state.module_a.outputs.foo
│     ├────────────────
│     │ data.terraform_remote_state.module_a.outputs is object with no attributes
│
│ This object does not have an attribute named "foo".
╵

In this case, adding the following to the terraform_remote_state data source will allow us to see what we want (a complete plan with dummy values for missing remote state outputs).

  defaults = {
    foo = "missing"
  }
$ terraform plan
data.terraform_remote_state.module_a: Reading...
data.terraform_remote_state.module_a: Read complete after 1s

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # local_file.foo will be created
  + resource "local_file" "foo" {
      + content              = "missing"
      + content_base64sha256 = (known after apply)
      + content_base64sha512 = (known after apply)
      + content_md5          = (known after apply)
      + content_sha1         = (known after apply)
      + content_sha256       = (known after apply)
      + content_sha512       = (known after apply)
      + directory_permission = "0777"
      + file_permission      = "0777"
      + filename             = "foo.txt"
      + id                   = (known after apply)
    }

  # local_file.test will be created
  + resource "local_file" "test" {
      + content              = "TEST"
      + content_base64sha256 = (known after apply)
      + content_base64sha512 = (known after apply)
      + content_md5          = (known after apply)
      + content_sha1         = (known after apply)
      + content_sha256       = (known after apply)
      + content_sha512       = (known after apply)
      + directory_permission = "0777"
      + file_permission      = "0777"
      + filename             = "test.txt"
      + id                   = (known after apply)
    }

Plan: 2 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + foo = "missing"

However, as far as I know we cannot prevent applying a plan that was generated using any missing values. We'd prefer an option that results in an error if that is attempted. If this pre-planning/dry-run ability were implemented as part of a "test" plan and terraform wouldn't let you apply "test" plans, I think that would meet the need. But I do still wonder if it could just be a flag on the terraform_remote_state data source that lets the plan complete with the default values but then raises an error that makes the plan invalid and uses the "Terraform planned the following actions, but then encountered a problem:" output?

Anyway, I appreciate the time and consideration. I just wanted to confirm that I do see the validation behavior you mentioned and to provide a better example of what we're hoping to do natively. We can continue to use terragrunt for this until something similar makes its way into terraform.

omarismail commented 9 months ago

@mpkuth have you looked at the new test framework that includes mocking? Does that capability satisfy your need here?
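For readers landing here later: the test framework's mocking can cover this use case by overriding a data source's result from a test file. A rough sketch, assuming the override_data block as documented for Terraform 1.7+ (the file name and mock values are illustrative):

```hcl
# tests/remote_state.tftest.hcl -- assumes Terraform >= 1.7 mocking support.
override_data {
  target = data.terraform_remote_state.module_a
  values = {
    outputs = {
      foo = "mock"
    }
  }
}

run "plan_with_mocked_remote_state" {
  # The override applies here, so planning succeeds even if module-a's
  # remote state does not exist yet, and `terraform test` never applies
  # this mocked plan to real infrastructure.
  command = plan
}
```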