Open walnuss0815 opened 1 month ago
I had a similar problem and solved it this way: I do not use `cluster_health`. I just have a Terraform `time_sleep` of 90s and then try to apply Cilium and Argo CD. I do not wait until the cluster is ready; if it is not ready after that time, something has gone wrong anyway, and it doesn't matter whether it fails at the CNI/Argo apply or at the health check.
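The fixed-delay approach described above can be sketched roughly like this (a minimal sketch using the hashicorp/time provider; the resource names and the `argocd` module path are illustrative, not from the comment):

```terraform
# Sketch: fixed 90s delay instead of a health check.
# Assumes the hashicorp/time provider; names are placeholders.
resource "time_sleep" "wait_for_cluster" {
  depends_on      = [talos_machine_bootstrap.this]
  create_duration = "90s"
}

module "argocd" {
  source     = "./argocd" # hypothetical module path
  depends_on = [time_sleep.wait_for_cluster]
}
```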
The CSR not being approved is not a Talos problem; Kubernetes decided to ditch the automatic serving-cert approver. I also used the alex1989hu approver, but then discovered that Talos has a cloud-controller-manager that also ships a CSR approver module. Since I also like to have all/most of my apps managed by Argo, I decided to alter the alex1989hu approver into a `batch/v1` Job that runs once for the first 5 min. This is applied via `cluster.extraManifests`. After that I just use the Argo-managed Talos CCM, but you could also run the alex1989hu approver as an Argo-managed deployment after the initial Job has finished.
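Wiring an extra manifest in via `cluster.extraManifests` could look roughly like this (a hedged sketch; the manifest URL is a placeholder, and the patch is shown as a `config_patches` entry on the Talos provider's apply resource):

```terraform
# Sketch: point Talos at a one-shot approver Job manifest via a
# machine-config patch. The URL below is hypothetical.
resource "talos_machine_configuration_apply" "controlplane" {
  client_configuration        = talos_machine_secrets.this.client_configuration
  machine_configuration_input = data.talos_machine_configuration.this.machine_configuration
  node                        = "192.168.x.y"

  config_patches = [
    yamlencode({
      cluster = {
        extraManifests = [
          "https://example.com/kubelet-csr-approver-job.yaml", # placeholder
        ]
      }
    }),
  ]
}
```

Talos fetches and applies everything listed under `cluster.extraManifests` itself, so the Job lands on the cluster without any external tooling having to wait for the API to come up.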
Hope this helps
We are using the provider to deploy a two-node bare-metal k8s cluster.

We need certificate rotation enabled for the metrics server. The Kubelet Serving Certificate Approver is deployed using Argo CD, and Argo CD is deployed using Terraform right after the Talos cluster has been bootstrapped. Deploying the Kubelet Serving Certificate Approver via `cluster.extraManifests` is not an option for us.

Without `talos_cluster_health`, the deployment of Argo CD fails because the k8s API is not ready. So in our case the health check is only required to ensure that the k8s API is ready for requests.

With `talos_cluster_health`, the health check fails. On the first run it fails with `missing static pods on node`. On the second run it fails with `kubelet server certificate rotation is enabled, but CSR is not approved`.

First run
```
│ Warning: failed checks
│
│   with module.talos.data.talos_cluster_health.this,
│   on .terraform/modules/talos/main.tf line 118, in data "talos_cluster_health" "this":
│  118: data "talos_cluster_health" "this" {
│
│ waiting for etcd to be healthy: ...
│ waiting for etcd to be healthy: 1 error occurred:
│ 	* 192.168.x.y: service is not healthy: etcd
│
│ waiting for etcd to be healthy: OK
│ waiting for etcd members to be consistent across nodes: ...
│ waiting for etcd members to be consistent across nodes: OK
│ waiting for etcd members to be control plane nodes: ...
│ waiting for etcd members to be control plane nodes: OK
│ waiting for apid to be ready: ...
│ waiting for apid to be ready: OK
│ waiting for all nodes memory sizes: ...
│ waiting for all nodes memory sizes: OK
│ waiting for all nodes disk sizes: ...
│ waiting for all nodes disk sizes: OK
│ waiting for no diagnostics: ...
│ waiting for no diagnostics: OK
│ waiting for kubelet to be healthy: ...
│ waiting for kubelet to be healthy: 1 error occurred:
│ 	* 192.168.x.y service "kubelet" not in expected state "Running": current state [Preparing] Running pre state
│
│ waiting for kubelet to be healthy: 1 error occurred:
│ 	* 192.168.x.y: service is not healthy: kubelet
│
│ waiting for kubelet to be healthy: OK
│ waiting for all nodes to finish boot sequence: ...
│ waiting for all nodes to finish boot sequence: OK
│ waiting for all k8s nodes to report: ...
│ waiting for all k8s nodes to report: Get "https://192.168.x.y:6443/api/v1/nodes": dial tcp 192.168.x.y:6443: connect: connection refused
│ waiting for all k8s nodes to report: can't find expected node with IPs ["192.168.x.y"]
│ waiting for all k8s nodes to report: OK
│ waiting for all control plane static pods to be running: ...
│ waiting for all control plane static pods to be running: missing static pods on node 192.168.x.y: [kube-system/kube-apiserver kube-system/kube-controller-manager kube-system/kube-scheduler]
```

Second run
```
│ Warning: failed checks
│
│   with module.talos.data.talos_cluster_health.this,
│   on .terraform/modules/talos/main.tf line 118, in data "talos_cluster_health" "this":
│  118: data "talos_cluster_health" "this" {
│
│ waiting for etcd to be healthy: ...
│ waiting for etcd to be healthy: OK
│ waiting for etcd members to be consistent across nodes: ...
│ waiting for etcd members to be consistent across nodes: OK
│ waiting for etcd members to be control plane nodes: ...
│ waiting for etcd members to be control plane nodes: OK
│ waiting for apid to be ready: ...
│ waiting for apid to be ready: OK
│ waiting for all nodes memory sizes: ...
│ waiting for all nodes memory sizes: OK
│ waiting for all nodes disk sizes: ...
│ waiting for all nodes disk sizes: OK
│ waiting for no diagnostics: ...
│ waiting for no diagnostics: active diagnostics: 192.168.x.y: kubelet server certificate rotation is enabled, but CSR is not approved
```

With the Kubelet Serving Certificate Approver deployed manually after the k8s API is ready, the health check succeeds and Terraform starts deploying Argo CD.
main.tf

```terraform
. . .

resource "talos_machine_bootstrap" "this" {
  depends_on = [talos_machine_configuration_apply.controlplane]

  client_configuration = talos_machine_secrets.this.client_configuration
  node                 = [for k, v in var.node_data.controlplanes : v.ip_address][0]
}

data "talos_cluster_health" "this" {
  depends_on = [talos_machine_bootstrap.this]

  client_configuration   = talos_machine_secrets.this.client_configuration
  control_plane_nodes    = [for k, v in var.node_data.controlplanes : v.ip_address]
  endpoints              = [for k, v in var.node_data.controlplanes : v.ip_address]
  skip_kubernetes_checks = true
}

resource "talos_cluster_kubeconfig" "this" {
  depends_on = [data.talos_cluster_health.this]

  client_configuration = talos_machine_secrets.this.client_configuration
  node                 = [for k, v in var.node_data.controlplanes : v.ip_address][0]
}

. . .
```

Our expectation is that the health check succeeds with kubelet server certificate rotation enabled and the Kubelet Serving Certificate Approver not deployed. Something like a minimal k8s readiness check would also be sufficient in our case.
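A minimal "the k8s API answers requests" check of the kind mentioned above could be sketched as follows, assuming the hashicorp/http provider v3.x (its `insecure` flag and `retry` block) is acceptable; the endpoint address is illustrative:

```terraform
# Sketch: poll kube-apiserver's /readyz instead of the full
# talos_cluster_health check. Assumes hashicorp/http >= 3.x; the
# address is a placeholder. /readyz is reachable unauthenticated
# via the default system:public-info-viewer RBAC binding.
data "http" "kube_apiserver_ready" {
  url      = "https://192.168.x.y:6443/readyz"
  insecure = true # the apiserver cert is not trusted by the runner

  retry {
    attempts     = 30
    min_delay_ms = 5000
  }

  depends_on = [talos_machine_bootstrap.this]

  lifecycle {
    postcondition {
      condition     = self.status_code == 200
      error_message = "kube-apiserver /readyz did not return 200"
    }
  }
}
```

Argo CD could then `depends_on` this data source, which gives roughly the "k8s API is ready for requests" gate described above without any of the node-level checks.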