hashicorp / terraform-provider-kubernetes

Terraform Kubernetes provider
https://www.terraform.io/docs/providers/kubernetes/
Mozilla Public License 2.0

Daemonset wait_for_rollout does not appear to have any impact. #2092

Open pidydx opened 1 year ago

pidydx commented 1 year ago

Terraform Version, Provider Version and Kubernetes Version

Terraform version: Terraform v1.4.5
Kubernetes provider version: 2.20.0
Kubernetes version: 1.24.11

Affected Resource(s)

Steps to Reproduce

  1. terraform apply

Expected Behavior

What should have happened? A kubernetes_daemonset should wait on creation/update until the pods are ready before allowing Terraform to proceed.

Actual Behavior

What actually happened? Create/update completes instantly, and Terraform moves on to any dependent resources without waiting.

References

These issues: https://github.com/hashicorp/terraform-provider-kubernetes/issues/919 https://github.com/hashicorp/terraform-provider-kubernetes/pull/1053

And this documentation: https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/daemonset#wait_for_rollout

All indicate that kubernetes_daemonset wait_for_rollout should do something.

matthi-g commented 1 year ago

I noticed the same problem. My current workaround is to use a manifest instead and define the wait block with wait { rollout = true }

The problem with that approach is that you need permissions to list custom resource definitions. I hope that either the need for CRD permissions in the manifest resource or the wait_for_rollout bug gets fixed soon.

For reference on the manifest CRD permissions, see #1665.
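The manifest workaround described above might look roughly like the following. This is only a sketch under stated assumptions: the resource name, labels, namespace, image, and timeout values are placeholders and are not taken from the attached manifest.txt.

```hcl
# Hypothetical sketch of the kubernetes_manifest workaround.
# All names, labels, the image, and the timeouts are placeholders.
resource "kubernetes_manifest" "image_prepull" {
  manifest = {
    apiVersion = "apps/v1"
    kind       = "DaemonSet"
    metadata = {
      name      = "image-prepull"
      namespace = "default"
    }
    spec = {
      selector = {
        matchLabels = {
          name = "image-prepull"
        }
      }
      template = {
        metadata = {
          labels = {
            name = "image-prepull"
          }
        }
        spec = {
          containers = [
            {
              name  = "pause"
              image = "registry.k8s.io/pause:3.9"
            }
          ]
        }
      }
    }
  }

  # Ask the provider to block until the rollout has completed.
  wait {
    rollout = true
  }

  # Assumed upper bounds for the wait; adjust to image size and bandwidth.
  timeouts {
    create = "5m"
    update = "5m"
  }
}
```

Note that the credentials used by the provider still need permission to list custom resource definitions for this resource type to plan and apply.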

BBBmau commented 1 year ago

Hello, thank you for opening this issue @matthi-g. Could you provide us with a tfconfig that reproduces this issue?

matthi-g commented 1 year ago

Hi @BBBmau, the goal of my DaemonSet is to pre-pull images and keep them at least until the DaemonSet is deleted. The first (buggy) version of my implementation uses the daemon_set_v1 resource: daemon_set_v1.txt

The second version does the same thing but instead makes use of the manifest resource: manifest.txt

Creation of the daemon_set_v1 resource finishes after 0s, but observing the created pods on the k8s node shows that they are not yet up and running. Creation of the manifest finishes after 28s (depending on the images used and the available bandwidth), which lines up with the state of the pods.

pidydx commented 1 year ago

@BBBmau I don't have an example on hand, but as noted above, DaemonSets complete instantly and don't wait for the nodes to be ready. Without diving into the code to see how the check is supposed to happen, my guess is that the DaemonSet check does not validate that the number of running pods equals the number of current nodes, and instead only checks whether the cluster is ready to deploy the pods to any node that comes online.

sbocinec commented 8 months ago

@BBBmau I confirm, the issue is valid and still present. Here is a short reproducer:

terraform {
  required_version = "~> 1.7"
  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "2.25.2"
    }
  }
}

# This should be configured individually depending on your setup;
# the locals used here are omitted in the reproducer
provider "kubernetes" {
  host                   = local.cluster_config.cluster_endpoint
  cluster_ca_certificate = base64decode(local.cluster_config.cluster_ca)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    args        = ["eks", "get-token", "--cluster-name", local.cluster_config.cluster_name]
    command     = "awscli2"
  }
}

resource "kubernetes_daemon_set_v1" "i-am-not-waiting-for-rollout" {
  metadata {
    name      = "i-am-not-waiting-for-rollout"
    namespace = "default"
  }

  spec {
    selector {
      match_labels = {
        name = "i-am-not-waiting-for-rollout"
      }
    }
    template {
      metadata {
        labels = {
          name = "i-am-not-waiting-for-rollout"
        }
      }
      spec {
        container {
          name              = "pause"
          image             = "public.ecr.aws/eks-distro/kubernetes/pause:3.9@sha256:4668ea92d0ce9b5e5d8e84a4d3875d99aea7892e136a873425095f60bc22c49a"
          image_pull_policy = "IfNotPresent"
        }
        host_pid = true
        init_container {
          name              = "tuner"
          args = [
            "sysctl fs.inotify.max_user_instances=518"
          ]
          command = [
            "/bin/chroot",
            "/host",
            "/bin/bash",
            "-c",
            "--",
          ]
          image             = "public.ecr.aws/docker/library/busybox:1.36-glibc@sha256:e046063223f7eaafbfbc026aa3954d9a31b9f1053ba5db04a4f1fdc97abd8963"
          image_pull_policy = "IfNotPresent"
          security_context {
            capabilities {
              add = [
                "SYS_CHROOT",
              ]
              drop = [
                "all",
              ]
            }
            privileged = true
          }
          volume_mount {
            mount_path        = "/host"
            mount_propagation = "Bidirectional"
            name              = "hostfs"
          }
        }
        volume {
          host_path {
            path = "/"
          }
          name = "hostfs"
        }
      }
    }
  }
  wait_for_rollout = true
}

Initial apply is instant:

  Enter a value: yes                                                                                                           

kubernetes_daemon_set_v1.i-am-not-waiting-for-rollout: Creating...                                                             
kubernetes_daemon_set_v1.i-am-not-waiting-for-rollout: Creation complete after 1s [id=default/i-am-not-waiting-for-rollout]
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.                                                                    

Any further change is applied instantly, without waiting for the DaemonSet to roll out new pods.

  Enter a value: yes                                                                                                           

kubernetes_daemon_set_v1.i-am-not-waiting-for-rollout: Modifying... [id=default/i-am-not-waiting-for-rollout]                  
kubernetes_daemon_set_v1.i-am-not-waiting-for-rollout: Modifications complete after 2s [id=default/i-am-not-waiting-for-rollout]                                                                                                                              

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.  

Warning: as others mentioned, the workaround is to use the kubernetes_manifest resource, but if you define a DaemonSet that way and trigger any change, it also fails because of a bug where computed fields are not respected: https://github.com/hashicorp/terraform-provider-kubernetes/issues/1591. Currently, there is no way to manage DaemonSets with a working wait_for_rollout = true in this provider.

sbocinec commented 8 months ago

I have identified the issue and am preparing a fix: https://github.com/hashicorp/terraform-provider-kubernetes/pull/2419. I still need to test it.

sbocinec commented 8 months ago

In the meantime I have tested the PR and updated the acceptance tests; from my point of view the PR is ready. Could you please have a look, @BBBmau?

However, the fix is going to break many existing TF configs, because the currently ineffective wait_for_rollout is set to true by default. I have written more about this in the PR description.
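For configs that depend on the current non-blocking behaviour, one way to stay unaffected by such a fix would presumably be to set wait_for_rollout = false explicitly instead of relying on the default. A minimal sketch, assuming placeholder names, labels, and image (none of this is taken from the PR):

```hcl
# Hypothetical sketch: opt out of the rollout wait explicitly so that
# applies keep returning immediately even once the wait takes effect.
resource "kubernetes_daemon_set_v1" "no_rollout_wait" {
  metadata {
    name      = "no-rollout-wait"
    namespace = "default"
  }

  spec {
    selector {
      match_labels = {
        name = "no-rollout-wait"
      }
    }

    template {
      metadata {
        labels = {
          name = "no-rollout-wait"
        }
      }

      spec {
        container {
          name  = "pause"
          image = "registry.k8s.io/pause:3.9"
        }
      }
    }
  }

  # The default is true; false keeps today's observed behaviour.
  wait_for_rollout = false
}
```

Configs that do want to block on the rollout can leave the attribute at its default, but should then expect applies to take as long as the pods need to become ready.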