Inconsistent plan when introducing dependent object to output attribute while updating it

rucciva commented 3 years ago

consider this initial resource

$ terraform apply

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # shell_script.script will be created
  + resource "shell_script" "script" {
      + dirty             = false
      + environment       = {
          + "CONTENT"      = jsonencode(
                {
                  + test = true
                }
            )
          + "CONTENT_PATH" = "./test-shell-provider"
        }
      + id                = (known after apply)
      + interpreter       = [
          + "/bin/bash",
          + "-c",
        ]
      + output            = (known after apply)
      + working_directory = "."

      + lifecycle_commands {
          + create = "echo -n $CONTENT > $CONTENT_PATH"
          + delete = "rm $CONTENT_PATH"
          + read   = "cat $CONTENT_PATH"
          + update = "echo -n $CONTENT > $CONTENT_PATH"
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

shell_script.script: Creating...
shell_script.script: Creation complete after 0s [id=bun0t2i3k1k6j83s7tqg]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

If we were to introduce dependent object which depend on the output value while at the same time updating the shell_script resource that affect output value only after apply, it will produce inconsistent plan

$ terraform apply
shell_script.script: Refreshing state... [id=bun0t2i3k1k6j83s7tqg]

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create
  ~ update in-place

Terraform will perform the following actions:

  # null_resource.shell_output will be created
  + resource "null_resource" "shell_output" {
      + id       = (known after apply)
      + triggers = {
          + "output" = "true"
        }
    }

  # shell_script.script will be updated in-place
  ~ resource "shell_script" "script" {
        dirty             = false
      ~ environment       = {
          ~ "CONTENT"      = jsonencode(
              ~ {
                  ~ test = true -> false
                }
            )
            "CONTENT_PATH" = "./test-shell-provider"
        }
        id                = "bun0t2i3k1k6j83s7tqg"
        interpreter       = [
            "/bin/bash",
            "-c",
        ]
        output            = {
            "test" = "true"
        }
        working_directory = "."

        lifecycle_commands {
            create = "echo -n $CONTENT > $CONTENT_PATH"
            delete = "rm $CONTENT_PATH"
            read   = "cat $CONTENT_PATH"
            update = "echo -n $CONTENT > $CONTENT_PATH"
        }
    }

Plan: 1 to add, 1 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

shell_script.script: Modifying... [id=bun0t2i3k1k6j83s7tqg]
shell_script.script: Modifications complete after 0s [id=bun0t2i3k1k6j83s7tqg]

Error: Provider produced inconsistent final plan

When expanding the plan for null_resource.shell_output to include new values
learned so far during apply, provider "registry.terraform.io/hashicorp/null"
produced an invalid new value for .triggers["output"]: was
cty.StringVal("true"), but now cty.StringVal("false").

This is a bug in the provider, which should be reported in the provider's own
issue tracker.

Though this is error is retriable and next terraform apply will produce no issue

$ terraform apply
shell_script.script: Refreshing state... [id=bun0t2i3k1k6j83s7tqg]

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # null_resource.shell_output will be created
  + resource "null_resource" "shell_output" {
      + id       = (known after apply)
      + triggers = {
          + "output" = "false"
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

null_resource.shell_output: Creating...
null_resource.shell_output: Creation complete after 0s [id=5647154692069527402]

This is fixable if we invoke SetNewComputed whenever input changes or read commands change (and we also need to run the read commands instead of not executing any commands when its updated). But this will always trigger recreation/update to dependent objects even tough after apply the output was not modified

What do you think? should this considered as bug?

scottwinkler commented 3 years ago

Yes this should be considered a bug. I've known about this error message for a while but I didn't know how to fix it.

arielbeckjit commented 1 year ago

@rucciva Hi, i don't know if you still remember this, but it look like this PR added a bug - the simple case of updating any of the scripts (let's say i need to update for any reason) - it doesn't trigger the update itself. this is because there's a change in lifecycle_commands

    if d.HasChanges("lifecycle_commands", "interpreter") {
        return
    }

Why? How should one trigger an update then?

rucciva commented 1 year ago

Hi @arielbeckjit , IIRC, you need to apply twice. The first apply should only update the "logic" of the script (lifecycle_commands, interpreter). The second apply will use the new logic against the other attributes and terraform state.

And IIRC, i added conditions to prevent update of the logic part and other part at the same time.

arielbeckjit commented 1 year ago

Hi @rucciva i've tested it and it doesn't work (applying twice). you can try it yourself: https://github.com/scottwinkler/terraform-provider-shell/issues/124

there's another big issue (atleast for me), is that i'm using sensitive_variables that have a temporary password that i'm using, so then if the temp password change, i cannot do changes in the script aswell (Because it clashes with this PR). https://github.com/scottwinkler/terraform-provider-shell/blob/ef2552de102f6afa959b2e6d448d9aec86fcf1a0/shell/resource_shell_script.go#L145

resource "shell_script" "configure-data-api" {
  environment = {
    PROJECT_ID                = var.project_id
  }
  sensitive_environment = {
    AUTH = var.mongo_creds
  }
  lifecycle_commands {
    create = file("${path.module}/configure_data_api_scripts/create_update.sh")
    update = file("${path.module}/configure_data_api_scripts/create_update.sh")
    read   = file("${path.module}/configure_data_api_scripts/get.sh")
    delete = file("${path.module}/configure_data_api_scripts/delete.sh")
  }
  interpreter = ["/bin/bash", "-c"]

  lifecycle {
    ignore_changes = [
      sensitive_environment
    ]
  }

  working_directory = path.module
}

arielbeckjit commented 1 year ago

@rucciva i'll add why the double apply doesn't really work. at first apply - the create does nothing, but the read part updates the state with the expected result (then new updated result). so in further applies the script think it's aligned, even though nothing was applied and only the state was changed.

to combat that, i need to manually do change in the env, so the state will be not aligned, then do another apply.

rucciva commented 1 year ago

Hi @arielbeckjit

to combat that, i need to manually do change in the env, so the state will be not aligned, then do another apply.

This is what i mean by apply twice. In this PR, i made it so that change to the script "logic" (lifecycle_commands or interpreter IIRC) can't be done at the same time with changes to another attribute that is considered as "data" that will be processed by said "logic".

So when you need to have updates to the logic part and the data part, you need to first update the logic part only, and then apply, and then update the data part, and then apply again.

Its actually the same when you need to update a terraform provider version and the resources at the same time. Terraform will first update the provider version which will update the "logic" part, then it will use those new logic to read/create/update/delete the resource.

So in the end, changes to "logic" should not be done at the same time with the "data". Or else it will produce inconsistent state as i pointed in this issue, which is what considered as a bug by terraform. I just created a guard against this.

arielbeckjit commented 1 year ago

What do you mean by "data" part? those are the environment variables for example? if so, then the behavior is expected indeed, though i'm not familiar with many resources that require double apply..

so - if i got you correct, the right logic is to:

change the scripts (logic) - perform apply. expected: environment will not change, but state will change
either add some kind of dummy environment variable so it will trigger the actual change, or go to the real resource in the environment and change it manually, then apply?

@rucciva

rucciva commented 1 year ago

What do you mean by "data" part? those are the environment variables for example?

Attributes other than lifecycle_commands and interpreter

though i'm not familiar with many resources that require double apply..

Because terraform resources is normally only contains state, so data only. This provider meanwhile includes both data and logic on the state.

The right steps when you need to change both the logic and the data part is

Change only the logic part
Apply
Change anything other than than logic part, in other words anything on the data part
Apply

scottwinkler / terraform-provider-shell

Inconsistent plan when introducing dependent object to output attribute while updating it #81