Open pidydx opened 1 year ago
I noticed the same problem. My current workaround is to use a manifest instead and define the wait block with
wait { rollout = true }
The problem with that approach is that you need permissions to list custom resource definitions. I hope either the necessity for CRDs in the manifest or the bug with the wait_for_rollout get fixed soon.
For reference of the manaifest CRD permissions:
Hello, thank you for opening this issue @matthi-g. Could you provide us with a tfconfig that reproduces this issue?
Hi @BBBmau, the goal of my daemonset is to pre pull images and keep them at least until the daemon set is deleted. The first (buggy) version of my implementation uses the daemon_set_v1 resource: daemon_set_v1.txt
The second version does the same thing but instead makes use of the manifest resource: manifest.txt
Creation of the daemonset_v1 finishes after 0s but observing the state of the created pods on the k8s node indicates that the pods are not up and in running state. The creation of manifest finishes after 28 s (depends on the used images and bandwidth) and this seems to line up with the state of the pods.
@BBBmau I don't have an example on hand, but as noted above daemonsets complete instantly and don't wait for the nodes to be ready. Without diving into the code to see how the check is supposed to be happening, my guess is that the Daemonset check is not validating that the number pod pods running is equal to the number of current nodes rather than checking if the cluster is simply ready to deploy the pods to any node that comes online.
@BBBmau I confirm, the issue is valid and still present. Here is a short reproducer:
terraform {
required_version = "~> 1.7"
kubernetes = {
source = "hashicorp/kubernetes"
version = "2.25.2"
}
}
}
# This should be configured individually depending in your setup
# the locals used here are omitted in the reproducer
provider "kubernetes" {
host = local.cluster_config.cluster_endpoint
cluster_ca_certificate = base64decode(local.cluster_config.cluster_ca)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
args = ["eks", "get-token", "--cluster-name", local.cluster_config.cluster_name]
command = "awscli2"
}
}
resource "kubernetes_daemon_set_v1" "i-am-not-waiting-for-rollout" {
metadata {
name = "i-am-not-waiting-for-rollout"
namespace = "default"
}
spec {
selector {
match_labels = {
name = "i-am-not-waiting-for-rollout"
}
}
template {
metadata {
labels = {
name = "i-am-not-waiting-for-rollout"
}
}
spec {
container {
name = "pause"
image = "public.ecr.aws/eks-distro/kubernetes/pause:3.9@sha256:4668ea92d0ce9b5e5d8e84a4d3875d99aea7892e136a873425095f60bc22c49a"
image_pull_policy = "IfNotPresent"
}
host_pid = true
init_container {
name = "tuner"
args = [
"sysctl fs.inotify.max_user_instances=518"
]
command = [
"/bin/chroot",
"/host",
"/bin/bash",
"-c",
"--",
]
image = "public.ecr.aws/docker/library/busybox:1.36-glibc@sha256:e046063223f7eaafbfbc026aa3954d9a31b9f1053ba5db04a4f1fdc97abd8963"
image_pull_policy = "IfNotPresent"
security_context {
capabilities {
add = [
"SYS_CHROOT",
]
drop = [
"all",
]
}
privileged = true
}
volume_mount {
mount_path = "/host"
mount_propagation = "Bidirectional"
name = "hostfs"
}
}
volume {
host_path {
path = "/"
}
name = "hostfs"
}
}
}
}
wait_for_rollout = true
}
Initial apply is instant:
Enter a value: yes
kubernetes_daemon_set_v1.i-am-not-waiting-for-rollout: Creating...
kubernetes_daemon_set_v1.i-am-not-waiting-for-rollout: Creation complete after 1s [id=default/i-am-not-waiting-for-rollout]
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
Any further change is applied instantly, without waiting for the DS to rollout new pods.
Enter a value: yes
kubernetes_daemon_set_v1.i-am-not-waiting-for-rollout: Modifying... [id=default/i-am-not-waiting-for-rollout]
kubernetes_daemon_set_v1.i-am-not-waiting-for-rollout: Modifications complete after 2s [id=default/i-am-not-waiting-for-rollout]
Apply complete! Resources: 0 added, 1 changed, 0 destroyed.
:warning: As others mentioned, workaround is to use kubernetes_manifest
resource, but if you define a DaemonSet and trigger any change, it also fails as there is a bug with computed fields not being respected https://github.com/hashicorp/terraform-provider-kubernetes/issues/1591 . Currently, there is no way to manage DaemonSets with wait_for_rollout = true
in this provider.
I have identified the issue and trying to prepare a fix https://github.com/hashicorp/terraform-provider-kubernetes/pull/2419 - need to test the fix yet.
I have meanwhile tested the PR, updated acceptance tests and the PR is from my POV ready. Could you please have a look @BBBmau?
Though, the fix is going to break many existing TF configs as the ineffective wait_for_rollout
is set to true
by default... I have wrote more in the PR description.
Terraform Version, Provider Version and Kubernetes Version
Affected Resource(s)
Steps to Reproduce
terraform apply
Expected Behavior
What should have happened? A kubernetes_daemonset should wait on creation/update until the pods are ready before allowing terraform to procede
Actual Behavior
What actually happened? Create/Update is instant and terraform moves on to any dependent resources without waiting.
Important Factoids
References
These issues: https://github.com/hashicorp/terraform-provider-kubernetes/issues/919 https://github.com/hashicorp/terraform-provider-kubernetes/pull/1053
And this documentation: https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/daemonset#wait_for_rollout
All indicate that kubernetes_daemonset wait_for_rollout should do something.
Community Note