siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

`talosctl health` extra flags #7967

Open mircea-pavel-anton opened 9 months ago

mircea-pavel-anton commented 9 months ago

Feature Request

Ability to specify whether or not to wait for nodes to be ready.

Description

When deploying Talos, I noticed that a lot of people disable the CNI and install one manually later on, mainly when doing GitOps.

Currently, `talosctl health` checks the health of the cluster end to end, i.e. both Talos and Kubernetes. I think there should be a flag, something like `talosctl health --kubernetes=false`, that validates health up to and including the kubelet without checking whether the nodes are in a Ready state, since without a CNI they will never reach that state.

The current behavior makes it harder to automate installs that follow a sequence like bootstrap -> wait -> apply CNI.

mircea-pavel-anton commented 9 months ago

For some context, I am currently using a bash script to wait until the kubelet becomes healthy on my nodes:

# Poll the node's kernel log until the kubelet health check reports success.
while true; do
    output=$(talosctl dmesg -n "$NODE_IP" 2>&1)

    if echo "$output" | grep -Fq "service[kubelet](Running): Health check successful"; then
        echo ""
        echo "Kubelet is Healthy on node $NODE_IP!"
        break
    else
        printf "."
        sleep 1
    fi
done

But I feel like there should be a more elegant way to handle this, since disabling the CNI is not an uncommon scenario.
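
Until something like that exists, the loop above can at least be given an upper bound so automation does not hang forever. This is only a minimal sketch: `wait_for_kubelet` is a hypothetical helper name, the 300-second default timeout is an arbitrary assumption, and it still relies on the same `talosctl dmesg` log line as the script above.

```shell
# Sketch: poll for the kubelet health message, but give up after a timeout.
# wait_for_kubelet is a hypothetical helper, not part of talosctl.
# Usage: wait_for_kubelet NODE_IP [TIMEOUT_SECONDS] [POLL_INTERVAL_SECONDS]
wait_for_kubelet() {
    local node_ip="$1"
    local timeout="${2:-300}"   # assumed default, adjust as needed
    local interval="${3:-1}"
    local marker='service[kubelet](Running): Health check successful'
    local deadline
    local output

    deadline=$(( $(date +%s) + timeout ))

    while [ "$(date +%s)" -lt "$deadline" ]; do
        output=$(talosctl dmesg -n "$node_ip" 2>&1)

        # The quoted expansion makes the brackets and parentheses match literally.
        case "$output" in
            *"$marker"*) return 0 ;;
        esac

        sleep "$interval"
    done

    # Timed out without ever seeing the health message.
    return 1
}
```

In a bootstrap script this could be called as `wait_for_kubelet "$NODE_IP" 600 || exit 1` right before the step that installs the CNI, so a node that never comes up fails the run instead of blocking it.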

mrclrchtr commented 5 months ago

That would be amazing. I have exactly the same problem with the CNI. Especially in Terraform, talos_cluster_health runs indefinitely when no CNI is installed. This also makes it impossible to re-apply after an abort, because it wants to read first before you can apply.