projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

After uninstalling calico the nodes stuck in NetworkUnavailable because of CalicoIsDown #7816

Open vasiliev-a-a opened 1 year ago

vasiliev-a-a commented 1 year ago

Hello. I am new to Kubernetes, so I am exploring various CNI plugins. I tried out Calico and it works perfectly, but it seems like it cannot be removed cleanly. After I remove the Installation resource and the tigera-operator resources, all my nodes are stuck with the condition NetworkUnavailable and the reason CalicoIsDown. I tried to reset and re-join the nodes - no result.

Expected Behavior

I believe the nodes should return to the "no CNI plugin" state and, once a valid configuration is provided, become "Ready" again.

Current Behavior

When the Calico and operator manifests are deleted with kubectl, the nodes remain stuck in the NetworkUnavailable condition. Installing a different CNI plugin does not fix the problem.

Steps to Reproduce (for bugs)

1. Deploying

kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/tigera-operator.yaml
kubectl apply -f - << EOF
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
    - blockSize: 28
      cidr: 10.10.0.0/22
      encapsulation: None
      natOutgoing: Disabled
      nodeSelector: all()
EOF

2. Observing success

kubectl get node -o jsonpath --template='{range .items[*]}{.metadata.name}{"\n"}{range .status.conditions[*]}{"\t"}{.type}{" : "}{.status}{" -> "}{.reason}{" ("}{.message}{")"}{"\n"}{end}{end}'

[screenshot: node conditions output]

kubectl get -n kube-system deployment

[screenshot: kube-system deployments output]

3. Uninstalling

kubectl delete installation default
kubectl delete -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/tigera-operator.yaml

4. Observing problem

Provide a valid CNI config of your choice, then recreate coredns:

kubectl delete -n kube-system pod -l k8s-app=kube-dns
kubectl get -n kube-system pod -l k8s-app=kube-dns -o custom-columns=name:.metadata.name,status:.status.phase,message:.status.conditions[*].message

[screenshot: coredns pod status output]

kubectl get node -o jsonpath --template='{range .items[*]}{.metadata.name}{"\n"}{range .status.conditions[*]}{"\t"}{.type}{" : "}{.status}{" -> "}{.reason}{" ("}{.message}{")"}{"\n"}{end}{end}'

[screenshot: node conditions output showing NetworkUnavailable]

Note

If a toleration for 'node.kubernetes.io/network-unavailable' is applied to the deployment, its pods are scheduled and get IPs, so the CNI itself works fine. But Calico leaves behind a node condition of NetworkUnavailable, and the corresponding taint prevents scheduling.
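
For reference, this is roughly what such a toleration looks like on a workload (a minimal sketch; the "example" Deployment name and pause image are placeholders, not taken from this issue):

kubectl apply -f - << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      tolerations:
      # Tolerate the taint derived from the NetworkUnavailable node condition
      - key: node.kubernetes.io/network-unavailable
        operator: Exists
        effect: NoSchedule
      containers:
      - name: example
        image: registry.k8s.io/pause:3.9
EOF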

Your Environment

Calico: v3.26.1
Kubernetes: v1.27.3
OS: Debian GNU/Linux 11 (bullseye)
Kernel: 5.10.0-23-amd64
Containerd: 1.6.21

caseydavenport commented 1 year ago

This is a tricky one - Calico needs to know the difference between "Calico is being restarted / upgraded" and "Calico is being removed from the cluster", and of course once Calico is removed from the cluster, there is no longer any component present in the cluster to actually remove the condition!

In general, uninstalling a CNI plugin from a cluster is a destructive operation - there will be other issues you hit after this as well, such as artifacts left on the node (iptables rules, routes, CNI plugin files and configuration, etc.) and any existing pods in the cluster will fail to function. The cleanup will vary from plugin to plugin, so I think the normal advice is to spin up a new cluster rather than try to re-use an existing one after uninstalling networking from that cluster (or to delete, reboot, and reinstall your nodes).
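
For example, the kinds of leftovers being described can be checked for directly on a node after an uninstall (a rough, read-only sketch, not an exhaustive cleanup):

# Look for Calico leftovers on a node (read-only checks)
iptables-save | grep -i cali | head   # Calico iptables chains/rules (cali-* prefixes)
ip route show proto bird              # routes programmed via bird/BGP
ip link show | grep cali              # leftover cali* veth interfaces
ls /etc/cni/net.d/                    # CNI config files left behind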

That said, there may still be a way to make this particular behavior slightly nicer.

vasiliev-a-a commented 1 year ago

I went down the path "weavenet -> kube-router -> calico", and I was going to switch back to kube-router since its Service implementation worked best for me. Before Calico, cleanup was pretty manageable - disruptive, of course, but doable. But with Calico it's like "not mine? then nobody gets it" :) Right now I am using kube-router, but I have to apply a node.kubernetes.io/network-unavailable toleration to all Pods. It is not critical since it is an experimental cluster, but I think this should at least be mentioned somewhere in the docs.

and of course once Calico is removed from the cluster, there is no longer any component present in the cluster to actually remove the condition!

Is it possible to introduce something like a calicoctl clean option to reset the conditions and taints?

caseydavenport commented 1 year ago

Is it possible to introduce something like a calicoctl clean option to reset the conditions and taints?

This could work, yeah. It wouldn't clean up any of the programming on the nodes themselves - that would still require a node reboot at a minimum.

boedy commented 1 year ago

I was in the same boat. In case anyone else is interested, this is how I cleaned up the nodes on k3s:

Run on one of the control plane nodes. This script will remove the NetworkUnavailable condition from all nodes that have it. Requires jq to be installed.

#!/bin/bash

# Variables
K3S_CONFIG="/etc/rancher/k3s/k3s.yaml"
APISERVER="https://127.0.0.1:6443"  # Adjust if your API server is on a different address

# Extract client certificate and key from k3s config
CLIENT_CERT_DATA=$(grep "client-certificate-data" $K3S_CONFIG | awk '{print $2}')
CLIENT_KEY_DATA=$(grep "client-key-data" $K3S_CONFIG | awk '{print $2}')

# Decode and save to temporary files
echo "$CLIENT_CERT_DATA" | base64 -d > /tmp/client.crt
echo "$CLIENT_KEY_DATA" | base64 -d > /tmp/client.key

# Get list of all nodes
NODES=$(kubectl get nodes -o=jsonpath='{.items[*].metadata.name}')

# Loop through each node
for NODE_NAME in $NODES; do
    # Get the conditions of the node in JSON format
    CONDITIONS=$(kubectl get node $NODE_NAME -o=jsonpath='{.status.conditions}')

    # Check if the node has the NetworkUnavailable condition
    if echo "$CONDITIONS" | grep -q "NetworkUnavailable"; then
        # Determine the index of the NetworkUnavailable condition in the conditions array
        CONDITION_INDEX=$(echo "$CONDITIONS" | jq '. | map(.type) | index("NetworkUnavailable")')

        echo "Patching node: $NODE_NAME at index $CONDITION_INDEX"
        curl --cacert /var/lib/rancher/k3s/server/tls/server-ca.crt \
             --cert /tmp/client.crt \
             --key /tmp/client.key \
             -H "Content-Type: application/json-patch+json" \
             -X PATCH $APISERVER/api/v1/nodes/$NODE_NAME/status \
             --data "[{ \"op\": \"remove\", \"path\": \"/status/conditions/$CONDITION_INDEX\"}]"
    else
        echo "Skipping node: $NODE_NAME as it doesn't have the NodeNetworkUnavailable condition"
    fi
done

# Cleanup temporary files
rm /tmp/client.crt /tmp/client.key

echo "Done patching nodes!"

Run on all master nodes:

ip route flush proto bird
ip link list | grep cali | awk '{print $2}' | cut -c 1-15 | xargs -I {} ip link delete {}
modprobe -r ipip
rm /etc/cni/net.d/10-calico.conflist && rm /etc/cni/net.d/calico-kubeconfig
systemctl restart k3s

Run on all worker nodes:

ip route flush proto bird
ip link list | grep cali | awk '{print $2}' | cut -c 1-15 | xargs -I {} ip link delete {}
modprobe -r ipip
rm /etc/cni/net.d/10-calico.conflist && rm /etc/cni/net.d/calico-kubeconfig
systemctl restart k3s-agent

dberardo-com commented 1 year ago

I get an error with the script @boedy: curl: (58) unable to load client key: -8178 (SEC_ERROR_BAD_KEY)

Any hints?

boedy commented 1 year ago

@dberardo-com Did you run the command on one of the control plane nodes? This would only work on k3s clusters.

Check if CLIENT_CERT_DATA and CLIENT_KEY_DATA contain any data.

dberardo-com commented 1 year ago

They do ... so strange ...

Also, I seem to have found a not-so-nice workaround, which is basically to disable the Calico operator and destroy all workloads related to Calico apart from the "calico-node" DaemonSet. The calico-node pods then get stuck permanently in a pending/restart loop because they can't find the other components they need, but the nodes do not get tainted as unschedulable ...
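
For reference, the first steps of that workaround might look something like this (resource names assume a default operator-based install into the tigera-operator and calico-system namespaces; untested, and it leaves Calico half-removed):

# Stop the operator so it cannot reconcile anything back
kubectl scale deployment -n tigera-operator tigera-operator --replicas=0
# Remove Calico workloads except the calico-node DaemonSet
kubectl delete deployment -n calico-system calico-typha calico-kube-controllers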

qpanpony commented 1 year ago

Successfully cleaned up the nodes in K8s using @boedy's script. I just replaced the env variable K3S_CONFIG with K8S_ADMIN_CONFIG="/etc/kubernetes/admin.conf" and changed the curl command's --cacert parameter to "/etc/kubernetes/pki/ca.crt".
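
In shell terms, that adaptation amounts to something like the following (a sketch of the changes described above; the paths are the standard kubeadm ones and may differ on other setups):

# Use the kubeadm admin kubeconfig instead of the k3s one
K8S_ADMIN_CONFIG="/etc/kubernetes/admin.conf"
CLIENT_CERT_DATA=$(grep "client-certificate-data" $K8S_ADMIN_CONFIG | awk '{print $2}')
CLIENT_KEY_DATA=$(grep "client-key-data" $K8S_ADMIN_CONFIG | awk '{print $2}')

# ...and in the curl call, use the cluster CA from the kubeadm PKI directory:
#   --cacert /etc/kubernetes/pki/ca.crt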

Though, as @caseydavenport said, uninstalling a CNI plugin from a cluster is a destructive operation, it's still a bit awkward that Calico cannot be removed gracefully from the cluster.

dberardo-com commented 9 months ago

@boedy I can confirm that I am running the script on the control plane of a k3s cluster and that those vars have content: https://github.com/projectcalico/calico/issues/7816#issuecomment-1714202942

I think this is related to the private key format: https://superuser.com/questions/1482345/how-to-use-an-ec-private-key-with-curllibnss

This might be an issue with curl: https://github.com/k3s-io/k3s/issues/4872

Do you know if it is possible to run the PATCH command using kubectl? Perhaps using: kubectl patch node $NODE_NAME --type="json" -p '[{ "op": "remove", "path": "/status/conditions/'$CONDITION_INDEX'" }]'
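
One possible caveat: node conditions live under the status subresource, which a plain kubectl patch may not modify. On recent kubectl versions (v1.24+ added a --subresource flag), something along these lines might work instead (an untested sketch, reusing $NODE_NAME and $CONDITION_INDEX from the script above):

kubectl patch node $NODE_NAME --subresource=status --type=json \
  -p '[{ "op": "remove", "path": "/status/conditions/'$CONDITION_INDEX'" }]'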


UPDATE: I have figured out how to run the request using the httpie package.

echo '[{ "op": "remove", "path": "/status/conditions/'$CONDITION_INDEX'" }]' | http --verify=/var/lib/rancher/k3s/server/tls/server-ca.crt \
     --cert=/tmp/client.crt \
     --cert-key=/tmp/client.key \
     PATCH $APISERVER/api/v1/nodes/$NODE_NAME/status \
     'Content-Type:application/json-patch+json'

Now I have all Calico pods down and the nodes are all ready. My question is:

caseydavenport commented 8 months ago

Right now I am using kube-router, but I have to apply a node.kubernetes.io/network-unavailable toleration to all Pods

FWIW, I am not sure why kube-router isn't tolerating this already. Doesn't it need to tolerate an unavailable network if it is itself providing the network for the cluster? Calico certainly does.

I would sort of expect any other implementation to tolerate this, and then to set network-unavailable to "false" once it is installed.

vasiliev-a-a commented 8 months ago

I would sort of expect any other implementation to tolerate this, and then to set network-unavailable to "false" once it is installed.

That is a good point. Maybe I should open an issue there too. I am also curious how other CNI plugins, like Cilium or Weave Net, would behave 🤔

Nevertheless, I think that any solution, when it is gracefully removed, should revert the system to a clean state, just as it was before installation.

caseydavenport commented 8 months ago

Nevertheless, I think that any solution, when it is gracefully removed, should revert the system to a clean state, just as it was before installation.

I agree with this completely in principle. In practice, for this particular case, it's hard to do so without sacrificing the correct behavior in the very real mainline use-case of upgrading Calico on a cluster.

One thing that is a remote possibility is adding code to our helm chart to handle this. It would only work when installing/uninstalling using helm, but integration with purpose-built tools for managing software gives us a hook to tell whether Calico is being uninstalled or just upgraded, which we don't otherwise have visibility into. I still think that, regardless of any workarounds we can add to Calico, best practice is going to be to kubectl delete your nodes and reboot them to ensure a clean state, both in the Kubernetes API and on the nodes themselves, if you want a fresh slate.

rshiva777 commented 3 months ago

For me, deleting the Calico Installation got stuck and the resource was never removed. I tried to describe it and check the logs but am not getting any clues. Any lead would be helpful.

kubectl delete installation default

caseydavenport commented 3 months ago

@rshiva777 please open a separate issue. That sounds like a different problem.