projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

Inconsistent terminationGracePeriodSeconds set in different versions of calico-node daemonset #8691

Open BenjaminHuang opened 5 months ago

BenjaminHuang commented 5 months ago

The calico-node daemonset has terminationGracePeriodSeconds set.

In the manifest version, it's coded as 0: terminationGracePeriodSeconds: 0

But in the version generated by tigera-operator, it's coded as 5: terminationGracePeriodSeconds: 5

However, both versions specify a preStop hook:

        lifecycle:
          preStop:
            exec:
              command:
              - /bin/calico-node
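              - -shutdown

If terminationGracePeriodSeconds is set to a non-zero value, the preStop hook will cause the NetworkUnavailable status to be set:

  conditions: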
  - lastHeartbeatTime: "2024-04-03T08:18:42Z"
    lastTransitionTime: "2024-04-03T08:18:42Z"
    message: Calico is running on this node
    reason: CalicoIsUp
    status: "False"
    type: NetworkUnavailable

This eventually causes a NoSchedule taint to be added by kube-controller-manager:

  taints:
  - effect: NoSchedule
    key: node.kubernetes.io/network-unavailable
    timeAdded: "2024-04-03T07:17:46Z"

However, I'm not sure which is the desired behavior.

Expected Behavior

terminationGracePeriodSeconds should be consistent in the calico-node daemonset, in both the manifest and the tigera-operator-generated version.

Current Behavior

terminationGracePeriodSeconds is inconsistent in the calico-node daemonset between the manifest and the tigera-operator-generated version.

Possible Solution

Set terminationGracePeriodSeconds to 0 in both versions of the calico-node daemonset.

Steps to Reproduce (for bugs)

  1. Go to the installation guide, e.g. https://docs.tigera.io/calico/latest/getting-started/kubernetes/self-managed-onprem/onpremises
  2. Choose an installation method, manifest or operator, in a k8s cluster (optionally using kind)
  3. Follow the instructions to complete the installation
  4. Try deleting a calico-node instance and inspect the corresponding node's status/taints during deletion.
  5. Compare the behavior between the two installations, focusing on the difference in terminationGracePeriodSeconds
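
For step 2, a kind cluster created without its default CNI is a convenient place to try either installation method. A minimal sketch of such a cluster config, assuming kind's v1alpha4 config format; the node layout is arbitrary, and the pod subnet is an assumption chosen to match Calico's usual default pool, so adjust it if your installation uses a different CIDR:

```yaml
# Hypothetical kind cluster config for reproducing this issue.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  # Skip kind's built-in CNI so Calico can be installed instead.
  disableDefaultCNI: true
  # Assumed to match the Calico IP pool used by the installation.
  podSubnet: 192.168.0.0/16
nodes:
  - role: control-plane
  - role: worker
```

Creating one cluster per installation method from the same config keeps everything else equal, so any difference in node status during calico-node deletion can be attributed to the daemonset itself.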

Context

I want a Calico installation from the manifest and one from the tigera operator to behave the same way when calico-node is deleted.

Your Environment

caseydavenport commented 5 months ago

I suspect the manifest value needs to be increased to match what the operator is setting, and to enable the preStop hook to run.

cyclinder commented 5 months ago

If terminationGracePeriodSeconds is set to a non-zero value, the preStop hook will cause the NetworkUnavailable status to be set:

Does this look like it's expected? If so, we need to adjust the manifest value to 5.

BenjaminHuang commented 5 months ago

If terminationGracePeriodSeconds is set to a non-zero value, the preStop hook will cause the NetworkUnavailable status to be set:

Does this look like it's expected? If so, we need to adjust the manifest value to 5.

That depends on your situation.

BenjaminHuang commented 5 months ago

I suspect the manifest value needs to be increased to match what the operator is setting, and to enable the preStop hook to run.

I'm not sure whether setting both to a positive value would be better.

I guess adding a comment above this parameter, describing its impact on node status, would be good enough.

Without that, it's hard to imagine what happens when you change it; you have to dig the details out of the source code.
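
A sketch of what such a comment might look like in the calico-node daemonset pod spec. The comment wording is illustrative only and summarizes the behavior reported in this issue, not confirmed upstream documentation. The value 5 is the one the operator sets, and the surrounding field nesting is assumed from a standard DaemonSet layout:

```yaml
spec:
  template:
    spec:
      # Illustrative comment, not upstream text:
      # With a non-zero value, the preStop hook below gets time to run when
      # the pod is deleted. As reported in this issue, the node's
      # NetworkUnavailable condition is then updated, and
      # kube-controller-manager may add the
      # node.kubernetes.io/network-unavailable:NoSchedule taint.
      # With 0, the pod is killed immediately and the hook is reported
      # to have no effect.
      terminationGracePeriodSeconds: 5
      containers:
        - name: calico-node
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/calico-node
                  - -shutdown
```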

caseydavenport commented 5 months ago

If you have all pods on the host network and just want to delete calico-node from the cluster, leaving it as zero would be nice.

Agreed, although I would classify this as an exceptional case and far from the expected scenario in 90% of Kubernetes clusters using Calico.

I think we should:

Martin-Luther commented 5 days ago

I am having the same issue. preStop is ignored when I set the terminationGracePeriodSeconds value to 0. I have increased the value to 10, and it seems to work now.

Calico version: 3.28.0
Orchestrator version: kubernetes v1.30.5 + containerd://1.6.33
Operating System and version: Debian Buster