Open Heiko-san opened 1 year ago
Actually, it seems this issue also appears in v1.24.4 and v1.24.9. It just took a while to show up, and it doesn't appear on all clusters. We don't have a clue yet what triggers it, but restarting kubelet seems to remediate it for a while.
Rancher Server Setup
Information about the Cluster
User Information
Describe the bug This only happens on v1.23.6 and seems to be fixed with v1.24.4 (on the same machine OS).
After enabling the "cri-dockerd" switch for downstream clusters, kubelet's /metrics endpoint needs about 45s to return, causing the metrics-server call to time out. As a result, metrics-server never fully starts up or becomes ready, and the related metrics aren't available.
However, the other scrape endpoints (/metrics/cadvisor, /metrics/probes) seem to work fine and respond quickly.
To Reproduce Enable cri-dockerd in an RKE1 downstream cluster with K8s v1.23.6 and check the metrics-server logs (or try calling kubelet's /metrics endpoint with the service account token from metrics-server).
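
For anyone who wants to compare the endpoints directly, here is a rough sketch of how the response times could be measured from a machine that can reach the node. It is not part of the original report: the function names are made up for illustration, port 10250 is assumed to be the kubelet's secure port on the node, TLS verification is turned off on the assumption that the kubelet serving certificate is not trusted by the calling host, and the token is assumed to be the metrics-server service account token obtained separately.

```python
import ssl
import time
import urllib.request

# The three scrape endpoints mentioned in the report.
ENDPOINTS = ("/metrics", "/metrics/cadvisor", "/metrics/probes")


def build_request(node_ip, path, token, port=10250):
    """Build an authenticated GET request for one kubelet endpoint.

    Assumption: the kubelet listens on port 10250 (adjust if your setup differs).
    """
    url = f"https://{node_ip}:{port}{path}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def time_endpoint(request, timeout=120.0):
    """Send one request and return (elapsed_seconds, outcome)."""
    # Assumption: the kubelet certificate is not trusted here, so skip verification.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
            response.read()  # include the body transfer in the measurement
            outcome = str(response.status)
    except OSError as exc:  # URLError, HTTPError and socket timeouts all derive from OSError
        outcome = f"error: {exc}"
    return time.monotonic() - start, outcome


def compare_endpoints(node_ip, token):
    """Print the response time of each endpoint on a single node."""
    for path in ENDPOINTS:
        elapsed, outcome = time_endpoint(build_request(node_ip, path, token))
        print(f"{path:20s} {elapsed:7.2f}s  {outcome}")
```

Calling `compare_endpoints("<node-ip>", "<token>")` against an affected node should make the difference visible if the behavior matches the description: /metrics taking tens of seconds while the other two return quickly. The timeout is deliberately longer than the roughly 45s observed so the slow call can complete instead of being cut off.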