gravitational / teleport


Terraform provider resource import should import metadata #42451

Open · programmerq opened this issue 2 years ago

programmerq commented 2 years ago

When importing a role into the terraform state, only the bare minimum (the ID, kind, and version) actually makes it into the terraform.tfstate file. This means that after import, terraform plan believes it needs to recreate the role, because the state file doesn't even contain the role's name. Recreating the role doesn't always work either: Teleport can refuse to delete a role that a user still references, which prevents terraform from completing altogether.
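
End to end, the failure mode looks roughly like this (a sketch; it assumes a role named example already exists in Teleport, with the matching resource block shown below):

$ terraform import teleport_role.example example   # succeeds, but only the bare minimum lands in state
$ terraform plan                                   # wrongly proposes destroy-and-recreate
$ terraform apply                                  # can fail outright if Teleport refuses to delete
                                                   # a role that a user still references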

Basically, the ask is for terraform import to fetch the associated metadata for every resource type in this provider, so that the state file accurately reflects the state of the imported resource.

The following example is for a role, but terraform import should pull the remote state for any Teleport resource that is imported:


Considering the following existing role and terraform resource block:

version: v4
kind: role
metadata:
  labels:
    example: "yes"
  name: example
spec:
  allow:
    logins:
    - example
  deny:
    logins:
    - anonymous
resource "teleport_role" "example" {
  metadata {
    name        = "example"
    description = "Example Teleport Role"
    labels = {
      example  = "yes"
    }
  }

  spec {
    options {
      forward_agent           = false
      max_session_ttl         = "7m"
      port_forwarding         = false
      client_idle_timeout     = "1h"
      disconnect_expired_cert = true
      permit_x11_forwarding   = false
      request_access          = "denied"
    }

    allow {
      logins = ["example"]

      rules {
        resources = ["user", "role"]
        verbs = ["list"]
      }

      request {
        roles = ["example"]
        claims_to_roles {
          claim = "example"
          value = "example"
          roles = ["example"]
        }
      }

      app_labels {
         key = "*"
         value = ["*"]
      }

      app_labels {
         key = "a"
         value = ["a"]
      }

      node_labels {
         key = "example"
         value = ["yes"]
      }

    }

    deny {
      logins = ["anonymous"]
    }
  }
}
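
For completeness: assuming the YAML above is saved as role.yaml, the pre-existing role can be created with tctl before running the import:

$ tctl create role.yaml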

Running terraform import teleport_role.example example results in the following tfstate json:

{
  "version": 4,
  "terraform_version": "1.1.0",
  "serial": 1,
  "lineage": "d23102ce-2969-5c09-624c-754e1052ee4e",
  "outputs": {},
  "resources": [
    {
      "mode": "managed",
      "type": "teleport_role",
      "name": "example",
      "provider": "provider[\"gravitational.com/teleport/teleport\"]",
      "instances": [
        {
          "schema_version": 4,
          "attributes": {
            "id": "example",
            "kind": "role",
            "metadata": [],
            "spec": [],
            "sub_kind": null,
            "version": "v4"
          },
          "sensitive_attributes": [],
          "private": "eyJzY2hlbWFfdmVyc2lvbiI6IjQifQ=="
        }
      ]
    }
  ]
}
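
The gap is easy to confirm directly against the state file (assuming jq is available):

$ jq '.resources[0].instances[0].attributes.metadata' terraform.tfstate
[]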

The .resources[0].instances[0].attributes.metadata value is simply an empty list. This means that terraform plan indicates that terraform must recreate the remote resource, because it believes it's going from no name to a name:

$ terraform plan
...
  # teleport_role.example must be replaced
-/+ resource "teleport_role" "example" {
      ~ id      = "example" -> (known after apply)
      ~ kind    = "role" -> (known after apply)
      ~ version = "v4" -> (known after apply)

      + metadata {
          + description = "Example Teleport Role"
          + labels      = {
              + "example" = "yes"
            }
          + name        = "example" # forces replacement
          + namespace   = (known after apply)
        }
...

This is incorrect and unnecessary. If I manually edit terraform.tfstate to include at least metadata.name = "example" for the existing role, then terraform plan properly recognizes that it doesn't need to delete and recreate the role, but only needs to update it in place:

terraform.tfstate manually updated:

{
  "version": 4,
  "terraform_version": "1.1.0",
  "serial": 1,
  "lineage": "d23102ce-2969-5c09-624c-754e1052ee4e",
  "outputs": {},
  "resources": [
    {
      "mode": "managed",
      "type": "teleport_role",
      "name": "example",
      "provider": "provider[\"gravitational.com/teleport/teleport\"]",
      "instances": [
        {
          "schema_version": 4,
          "attributes": {
            "id": "example",
            "kind": "role",
            "metadata": [
              {
                "name": "example"
              }
            ],
            "spec": [],
            "sub_kind": null,
            "version": "v4"
          },
          "sensitive_attributes": [],
          "private": "eyJzY2hlbWFfdmVyc2lvbiI6IjQifQ=="
        }
      ]
    }
  ]
}
$ terraform plan
...
  # teleport_role.example will be updated in-place
  ~ resource "teleport_role" "example" {
        id      = "example"
        # (2 unchanged attributes hidden)

      ~ metadata {
          + description = "Example Teleport Role"
          ~ labels      = {
              + "example" = "yes"
            }
            name        = "example"
            # (1 unchanged attribute hidden)
        }
...
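
As a stopgap, the same fix can be applied without hand-editing terraform.tfstate on disk, using Terraform's own state commands plus jq (a sketch; the serial bump is needed because terraform state push refuses to overwrite a state with a different state that has the same serial):

$ terraform state pull > state.json
$ jq '.serial += 1
    | (.resources[] | select(.type == "teleport_role") | .instances[0].attributes.metadata)
      = [{"name": "example"}]' state.json > patched.json
$ terraform state push patched.json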

Terraform still isn't aware of the other attributes on the existing resource, but at least it no longer believes it must recreate the role. The desired behavior would be for the import to include every remote value that can be read.

gz#3726

gz#4946

gz#5013

programmerq commented 2 years ago

Additionally, when terraform does the delete and recreate during apply after such an import, I would occasionally see terraform report that it successfully deleted the old role, and then immediately fail to create one with the same name because it already exists. I was able to make this happen slightly more often by increasing the latency between the Teleport auth server and my etcd backend (tc qdisc add dev eth0 root netem delay 4000ms in the pod with my single etcd node in my test lab); the higher latency seemed to make it more likely to hit the race condition.
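
For anyone trying to reproduce, the latency injection and its cleanup look like this (run inside the etcd pod; assumes the netem queueing discipline is available):

$ tc qdisc add dev eth0 root netem delay 4000ms   # inject 4s of latency
# ...run the import / apply cycle...
$ tc qdisc del dev eth0 root netem                # remove the delay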

I wasn't able to reproduce the same failure when an existing resource is renamed (which also forces a recreate), so it seems likely that the race is specific to the delete/recreate cycle triggered by this kind of incomplete resource import.

deusxanima commented 2 years ago

Another customer is hitting this issue with applications: the application gets added on the Teleport side but isn't propagated to the state file, so terraform believes the deploy failed and goes through the same recreate process @programmerq described above.

russjones commented 2 years ago

@Aharic This should be fixed in 9.0.2, can you retry and get back to us?