siderolabs / terraform-provider-talos


talos_cluster_health fails, while talosctl health is fine #153

Open stelb opened 5 months ago

stelb commented 5 months ago

Hi,

First problem:

│ waiting for etcd members to be control plane nodes: etcd member ips ["10.1.0.6" "XX.75.176.68" "10.1.0.2"] are not subset of control plane node ips ["10.1.0.2" "10.1.0.6" "10.1.0.7"]

To address this, I set advertisedSubnets to the internal CIDR.
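To make the first error concrete, here is a minimal Go sketch of a subset check of the kind the message describes. It is not taken from the provider or from Talos; `notInSet` is a hypothetical helper, and the IPs are the ones quoted in the error message.

```go
package main

import "fmt"

// notInSet returns the entries of members that do not appear in allowed.
// Hypothetical helper for illustration only; not the provider's actual code.
func notInSet(members, allowed []string) []string {
	set := make(map[string]struct{}, len(allowed))
	for _, ip := range allowed {
		set[ip] = struct{}{}
	}
	var extra []string
	for _, ip := range members {
		if _, ok := set[ip]; !ok {
			extra = append(extra, ip)
		}
	}
	return extra
}

func main() {
	// IPs copied from the error message in this report.
	etcdMembers := []string{"10.1.0.6", "XX.75.176.68", "10.1.0.2"}
	controlPlane := []string{"10.1.0.2", "10.1.0.6", "10.1.0.7"}

	// Any entry returned here would make a subset check fail.
	fmt.Println(notInSet(etcdMembers, controlPlane))
}
```

With these inputs, the only etcd member outside the control plane list is the public address, which would be consistent with etcd advertising a public IP before advertisedSubnets was set.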

Now etcd is OK, but there is an unexpected k8s node:

│ waiting for all k8s nodes to report: can't find expected node with IPs ["10.1.0.3"]
│ waiting for all k8s nodes to report: unexpected nodes with IPs ["XX.75.176.68"]

(I reduced the number of nodes.)
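The pair of messages above reads like a comparison between an expected node list and the list reported by Kubernetes. A rough Go sketch of such a comparison follows. `diffNodes` is a hypothetical helper, not the provider's code, and the two input lists are assumptions chosen to match the messages; the actual lists passed to talos_cluster_health are not shown in this report.

```go
package main

import "fmt"

// diffNodes compares the node IPs a check expects with the IPs actually
// reported, returning the expected IPs that are missing and the reported
// IPs that were not expected. Hypothetical helper for illustration only.
func diffNodes(expected, reported []string) (missing, unexpected []string) {
	rep := make(map[string]bool, len(reported))
	for _, ip := range reported {
		rep[ip] = true
	}
	exp := make(map[string]bool, len(expected))
	for _, ip := range expected {
		exp[ip] = true
		if !rep[ip] {
			missing = append(missing, ip)
		}
	}
	for _, ip := range reported {
		if !exp[ip] {
			unexpected = append(unexpected, ip)
		}
	}
	return missing, unexpected
}

func main() {
	// Assumed inputs: the node is expected under its internal IP,
	// but is reported under its public IP.
	expected := []string{"10.1.0.3"}
	reported := []string{"XX.75.176.68"}

	missing, unexpected := diffNodes(expected, reported)
	fmt.Printf("can't find expected node with IPs %q\n", missing)
	fmt.Printf("unexpected nodes with IPs %q\n", unexpected)
}
```

If the same node is listed under its internal IP on one side and its public IP on the other, a comparison like this reports it twice: once as missing and once as unexpected.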

But when I check this with talosctl:

talosctl -n 10.1.0.3 -e xx.13.164.153 health

discovered nodes: ["10.1.0.3" "xx.75.176.68"]
waiting for etcd to be healthy: ...
waiting for etcd to be healthy: OK
waiting for etcd members to be consistent across nodes: ...
waiting for etcd members to be consistent across nodes: OK
waiting for etcd members to be control plane nodes: ...
waiting for etcd members to be control plane nodes: OK
waiting for apid to be ready: ...
waiting for apid to be ready: OK
waiting for all nodes memory sizes: ...
waiting for all nodes memory sizes: OK
waiting for all nodes disk sizes: ...
waiting for all nodes disk sizes: OK
waiting for kubelet to be healthy: ...
waiting for kubelet to be healthy: OK
waiting for all nodes to finish boot sequence: ...
waiting for all nodes to finish boot sequence: OK
waiting for all k8s nodes to report: ...
waiting for all k8s nodes to report: OK
waiting for all k8s nodes to report ready: ...
waiting for all k8s nodes to report ready: OK
waiting for all control plane static pods to be running: ...
waiting for all control plane static pods to be running: OK
waiting for all control plane components to be ready: ...
waiting for all control plane components to be ready: OK
waiting for kube-proxy to report ready: ...
waiting for kube-proxy to report ready: SKIP
waiting for coredns to report ready: ...
waiting for coredns to report ready: OK
waiting for all k8s nodes to report schedulable: ...
waiting for all k8s nodes to report schedulable: OK

Or with the public control plane IP:

talosctl -n xx.13.164.153 -e xx.13.164.153 health

discovered nodes: ["10.1.0.3" "xx.75.176.68"]
waiting for etcd to be healthy: ...
waiting for etcd to be healthy: OK
waiting for etcd members to be consistent across nodes: ...
waiting for etcd members to be consistent across nodes: OK
waiting for etcd members to be control plane nodes: ...
waiting for etcd members to be control plane nodes: OK
waiting for apid to be ready: ...
waiting for apid to be ready: OK
waiting for all nodes memory sizes: ...
waiting for all nodes memory sizes: OK
waiting for all nodes disk sizes: ...
waiting for all nodes disk sizes: OK
waiting for kubelet to be healthy: ...
waiting for kubelet to be healthy: OK
waiting for all nodes to finish boot sequence: ...
waiting for all nodes to finish boot sequence: OK
waiting for all k8s nodes to report: ...
waiting for all k8s nodes to report: OK
waiting for all k8s nodes to report ready: ...
waiting for all k8s nodes to report ready: OK
waiting for all control plane static pods to be running: ...
waiting for all control plane static pods to be running: OK
waiting for all control plane components to be ready: ...
waiting for all control plane components to be ready: OK
waiting for kube-proxy to report ready: ...
waiting for kube-proxy to report ready: SKIP
waiting for coredns to report ready: ...
waiting for coredns to report ready: OK
waiting for all k8s nodes to report schedulable: ...
waiting for all k8s nodes to report schedulable: OK

So what is the problem? Why does talos_cluster_health fail when talosctl health passes against the same cluster?