hashicorp / terraform-provider-kubernetes

Terraform Kubernetes provider
https://www.terraform.io/docs/providers/kubernetes/
Mozilla Public License 2.0

Deployment tolerations using keys matching k8s node taints are being ignored for the tfstate #2376

Open Restless-ET opened 10 months ago

Restless-ET commented 10 months ago

Overriding the default tolerationSeconds value for the pods of any given Deployment is a totally valid necessity and should be a scenario supported by this provider, IMO.

By not properly supporting this we get into this "perpetual diff" situation for a resource/object configuration that's completely acceptable within Kubernetes.

Terraform Version, Provider Version and Kubernetes Version

Terraform version: 1.4.6
Kubernetes provider version: 2.24.0
Kubernetes version: 1.26.8

Affected Resource(s)

Steps to Reproduce

  1. Add a toleration inside spec.template.spec on your kubernetes_deployment resource, using as its key one of the k8s "node.kubernetes.io/*" taints (e.g. "node.kubernetes.io/unreachable").
  2. terraform apply
  3. terraform plan (after the successful apply)
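
For illustration, a minimal configuration along the lines of step 1 might look like the sketch below. The toleration values are taken from the plan output in this report; the resource name, namespace, labels, and container image are placeholders I made up, not the actual configuration that triggered the issue.

```hcl
# Hypothetical minimal reproduction. Names, labels, and image are placeholders.
resource "kubernetes_deployment_v1" "example" {
  metadata {
    name      = "example"
    namespace = "default"
  }

  spec {
    replicas = 1

    selector {
      match_labels = {
        app = "example"
      }
    }

    template {
      metadata {
        labels = {
          app = "example"
        }
      }

      spec {
        container {
          name  = "example"
          image = "nginx:1.25"
        }

        # Well-known node taint keys with custom toleration_seconds,
        # matching the values shown in the plan output of this issue.
        toleration {
          key                = "node.kubernetes.io/not-ready"
          operator           = "Exists"
          effect             = "NoExecute"
          toleration_seconds = "150"
        }

        toleration {
          key                = "node.kubernetes.io/unreachable"
          operator           = "Exists"
          effect             = "NoExecute"
          toleration_seconds = "90"
        }
      }
    }
  }
}
```

With a configuration like this, the expectation is that a second terraform plan run right after a successful apply reports no changes.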

Expected Behavior

What should have happened?

The terraform plan should not show any changes to be performed after a successful apply has executed.

Actual Behavior

What actually happened?

The terraform plan keeps showing the tolerations as needing to be added to the deployment:

Terraform will perform the following actions:

  # module.envoy-global.kubernetes_deployment_v1.envoy will be updated in-place
  ~ resource "kubernetes_deployment_v1" "envoy" {
        id               = "egress/envoy"
        # (1 unchanged attribute hidden)

      ~ spec {
            # (5 unchanged attributes hidden)

          ~ template {
              ~ spec {
                    # (12 unchanged attributes hidden)

                  + toleration {
                      + effect             = "NoExecute"
                      + key                = "node.kubernetes.io/not-ready"
                      + operator           = "Exists"
                      + toleration_seconds = "150"
                    }
                  + toleration {
                      + effect             = "NoExecute"
                      + key                = "node.kubernetes.io/unreachable"
                      + operator           = "Exists"
                      + toleration_seconds = "90"
                    }

                    # (5 unchanged blocks hidden)
                }

                # (1 unchanged block hidden)
            }

            # (2 unchanged blocks hidden)
        }

        # (1 unchanged block hidden)
    }

Important Factoids

A similar issue was raised in the past (1) and got a fix (2), but that fix only addressed node.kubernetes.io-prefixed toleration keys that are not part of the well-known node taints.

I believe the provider should properly manage any toleration key on the Deployment resource (I understand that Pod may need to be left as is).

Actually, a fair point about this was already made here: https://github.com/hashicorp/terraform-provider-kubernetes/issues/955#issuecomment-681022027.

Also, (3) seems to be a fair attempt at sorting this out, but it appears to have gone stale and was eventually (auto-)closed. I think we should revive it.

References


arybolovlev commented 10 months ago

Hi @Restless-ET,

Thank you for raising this issue. I think it makes sense to strip well-known tolerations only when we flatten a pod object spec; in all other cases, where the spec is a template, we could keep them. This is the same approach we use for labels and annotations. You are right, it looks like #1012 introduced a similar fix but for whatever reason didn't get much attention. I will raise this proposal during our next triage session and let you know about the decision here.

Happy New Year! 🎉

Restless-ET commented 10 months ago

Hello @arybolovlev and HNY 😄

Thank you for the feedback. I can see there's already an open PR for the fix so thanks again for the quick action.