hi, @medyagh
| I0220 15:45:04.513403 82905 command_runner.go:123] > [kubelet-check] Initial timeout of 40s passed
this indicates the kubelet takes a while to become ready.
i don't think we have anything on the kubeadm side that can cause it, so it might be a core issue. i will investigate.
i can confirm the problem in our CI (kind(er)-based, running containerd-in-docker in a prow pod):
this is using a 1.20.4 kubeadm with a 1.20.4 kubelet.
I0220 17:01:12.877602 169 request.go:943] Got a Retry-After 1s response for attempt 8 to https://172.17.0.7:6443/healthz?timeout=10s
I0220 17:01:13.878699 169 round_trippers.go:445] GET https://172.17.0.7:6443/healthz?timeout=10s in 0 milliseconds
I0220 17:01:13.878760 169 request.go:943] Got a Retry-After 1s response for attempt 9 to https://172.17.0.7:6443/healthz?timeout=10s
[kubelet-check] Initial timeout of 40s passed.
this is the whole diff of k/k between 1.20.2 and 1.20.3: https://github.com/kubernetes/kubernetes/compare/v1.20.2...v1.20.3
there are kubeadm commits that change the default image repository for CI images, which doesn't seem related. i do see a few kubelet changes.
i will now try a 1.20.3 kubeadm with a 1.20.2 kubelet and bisect.
this is also a problem in the tip of 1.21 and 1.19:
[kubelet-check] Initial timeout of 40s passed.
so we must have backported something that causes this.
looking at logs and diffs this started happening on the 13th of Jan. my current suspect commit is: https://github.com/kubernetes/kubernetes/commit/88ee6d777fa343bfeec0cf5b7b0e83ef0bd8321d
@SergeyKanzhelev @ehashman please see a possible regression
before https://github.com/kubernetes/kubernetes/commit/88ee6d777fa343bfeec0cf5b7b0e83ef0bd8321d
[apiclient] All control plane components are healthy after 26.046012 seconds
after:
[apiclient] All control plane components are healthy after 69.034837 seconds
the PR that did this is: https://github.com/kubernetes/kubernetes/pull/94087
so i can see this wait making sense, but it should bail out if this is the first node and there is no API server yet.
cc @BenTheElder for kind
i have commented on the original PR.
for a self-hosting kubelet, informer sync will fail until the API server is available. Does the delay mean that it takes 40 seconds to start the self-hosted API server? or does it mean that GetNode() is called 4 times and times out in the process of starting static pods?
| or does it mean that GetNode() is called 4 times and times out in the process of starting static pods?
i suspect that something like that is happening.
the base of the time comparison is a check in kubeadm where it asks both the kubelet and the apiserver to report 200 on their /healthz. normally the kubelet readies up within seconds and then it starts the apiserver pod. before the change, this completes in ~26 seconds. after this change the kubelet /healthz takes more than 40 seconds to return 200, as it waits for the apiserver for the informer sync.
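to make that sequence concrete, here is a minimal Go sketch of this style of wait. it is not the actual kubeadm code; the endpoints, the per-second poll, and the 4m limit are assumptions for illustration (127.0.0.1:10248 is the kubelet's default healthz address, 6443 the usual apiserver port).

```go
// healthz_wait.go: a minimal sketch of a kubeadm-style bootstrap wait.
// NOT the real kubeadm implementation; addresses/timeouts are illustrative.
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// waitFor200 polls url once per second until it returns HTTP 200 or the timeout expires.
func waitFor200(client *http.Client, url string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		resp, err := client.Get(url)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil
			}
		}
		time.Sleep(time.Second)
	}
	return fmt.Errorf("%s did not return 200 within %v", url, timeout)
}

func main() {
	// the apiserver serves /healthz over TLS; verification is skipped here only to keep the sketch short.
	client := &http.Client{
		Timeout:   10 * time.Second,
		Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
	}
	start := time.Now()
	for _, url := range []string{
		"http://127.0.0.1:10248/healthz", // kubelet
		"https://127.0.0.1:6443/healthz", // apiserver (static pod started by the kubelet)
	} {
		if err := waitFor200(client, url, 4*time.Minute); err != nil {
			fmt.Println(err)
			return
		}
	}
	fmt.Printf("all control plane components are healthy after %s\n", time.Since(start))
}
```

with the patched kubelet, the 40s kubelet-check warning in the logs corresponds to the kubelet endpoint not returning 200 in time, since its readiness now also depends on the apiserver it has not started yet.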
So just to be clear, the apiserver still starts up just as quickly, it's just that the kubelet now does not return 200 on healthz until the apiserver is also up?
| So just to be clear, the apiserver still starts up just as quickly, it's just that the kubelet now does not return 200 on healthz until the apiserver is also up?
seems that both the kubelet and api-server healthz are delayed. also, yesterday i did not add sufficient logging in the kubelet to figure out what's going on.
logs from kubeadm:
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
I0220 17:00:34.356861 169 round_trippers.go:445] GET https://172.17.0.7:6443/healthz?timeout=10s in 1 milliseconds
...
[kubelet-check] Initial timeout of 40s passed.
...
I0220 17:01:43.390484 169 round_trippers.go:445] GET https://172.17.0.7:6443/healthz?timeout=10s 200 OK in 13 milliseconds
I0220 17:01:43.390615 169 uploadconfig.go:108] [upload-config] Uploading the kubeadm ClusterConfiguration to a ConfigMap
[apiclient] All control plane components are healthy after 69.035888 seconds
| [kubelet-check] Initial timeout of 40s passed.
means that the kubelet healthz is not 200 after 40 seconds. i have a WIP fix proposal for the kubelet. will try to send it later.
cc @adisky
I haven't noticed any substantial delays in cluster creation on Kubernetes since this patch went in, is it possible this is minikube-specific and should be fixed on minikube bootstrap?
The patch in https://github.com/kubernetes/kubernetes/pull/94087 is a correctness fix to avoid nodes admitting pods before they have fully synced pod state. We've seen many issues with pods getting stuck in NodeAffinity status after cold-starting clusters without this patch, because the kubelet starts admitting pods before it is ready.
| so i can see this wait making sense, but it should bail out if this is the first node and there is no API server yet.
I don't think the kubelet has any way of knowing it's the first node? Are you suggesting a change in k8s or kubeadm?
Edit: oops I thought I hit send on this comment 3h ago but apparently not :sweat_smile:
| I haven't noticed any substantial delays in cluster creation on Kubernetes since this patch went in
what cluster creation tooling / e2e test jobs are you checking this against?
| is it possible this is minikube-specific and should be fixed on minikube bootstrap?
this patch definitely creates a problem if the api-server is managed as a static pod. so it's not specific only to kubeadm (or to anything that uses kubeadm, such as minikube).
| The patch in kubernetes/kubernetes#94087 is a correctness fix to avoid nodes admitting pods before they have fully synced pod state.
i do not disagree with the intent, i just think the implementation can be improved.
fix attempt at: https://github.com/kubernetes/kubernetes/pull/99336
What I'm confused by is why this would affect bootstrap at all. For a standalone kubelet, the checks are always bypassed: https://github.com/kubernetes/kubernetes/pull/94087/files#diff-67dd9e8f3ee257072765326cb4f242852554a2c0753563fa51e292c0a63a7b94R455-R460
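for readers following along, here is a highly simplified sketch of the gate being discussed. the names below (syncGate, canAdmitPods, hasSynced) are hypothetical; this is not the kubelet's actual code, only the behaviour described in this thread: a standalone kubelet bypasses the check, while a registered kubelet waits for its informer caches to sync, which during a kubeadm bootstrap cannot happen until that same kubelet has started the apiserver static pod.

```go
// a toy model of the admission gate discussed above; syncGate and its fields
// are hypothetical names, not the real kubelet types.
package main

import (
	"fmt"
	"time"
)

type syncGate struct {
	standalone bool        // true when the kubelet runs without an API client
	hasSynced  func() bool // stands in for the informer cache sync checks
}

// canAdmitPods models the behaviour described for kubernetes/kubernetes#94087:
// a standalone kubelet bypasses the check, everything else waits for the
// informer caches to sync.
func (g *syncGate) canAdmitPods() bool {
	if g.standalone {
		return true
	}
	return g.hasSynced()
}

func main() {
	// pretend the caches can only sync once the apiserver is reachable, which
	// during a kubeadm bootstrap is only after the kubelet has started the
	// apiserver static pod: the chicken-and-egg delay in this issue.
	apiServerUp := time.Now().Add(2 * time.Second)
	gate := &syncGate{hasSynced: func() bool { return time.Now().After(apiServerUp) }}

	for !gate.canAdmitPods() {
		fmt.Println("waiting for informer caches to sync before admitting pods")
		time.Sleep(500 * time.Millisecond)
	}
	fmt.Println("caches synced; pods can be admitted")
}
```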
| what cluster creation tooling / e2e test jobs are you checking this against?
OpenShift and upstream Kubernetes. I don't have specific jobs to point you to though, idk if there's an easy way to gather stats on cluster start time from testgrid.
"self hosted" is not always standalone. kubeadm sets up a kubelet which runs the apiserver and then registers itself as a node to the API server it started
In particular for development clusters like kind/minikube etc. using kubeadm, it's common to have a single node whose kubelet starts the apiserver as a static pod and then registers against it.
But as Jordan mentioned even kubeadm clusters with N nodes have static pod apiserver on certain nodes + registered kubelets for all nodes. This will slow down bootstrap for pretty much any kubeadm cluster since most work is blocked on healthy API server / first node.
| idk if there's an easy way to gather stats on cluster start time from testgrid.
There is data of sorts, just not what you want probably. Jobs running under kubetest (GCE) record the Up step (cluster bringup) time, but that includes more than just the core cluster bringup. IIRC kind jobs log time for bootstrap and more in the logs, but not to junit currently (pending rewriting how we run those tests).
Local cluster jobs (local-up-cluster.sh, kind) are currently impacted by CI/prow.k8s.io I/O throttling issues impacting pod startup, build times, etc. badly in general and wouldn't make useful performance data until that's resolved.
"self hosted" is not always standalone. kubeadm sets up a kubelet which runs the apiserver and then registers itself as a node to the API server it started
Our RKE2 project does the same thing. Runs etcd plus apiserver and sundry as static pods; the kubelet registers to the local apiserver once it's up.
FYI, got pinged about this issue by some Cluster API users...
update
PR in master (1.22) merged: https://github.com/kubernetes/kubernetes/pull/99336
the change is backported all the way back to 1.18 (inc); backport PRs are pending. https://github.com/kubernetes/kubernetes/pull/99336#issuecomment-824441152
PRs merged.
(i guess the 1.21 PR has not merged yet, but is about to.)
you can expect new releases after the 12th of May.
Hello, on minikube we updated our default kubernetes to v1.20.4 and we noticed that bumping from v1.20.2 to v1.20.3 causes a 1 minute slowdown on minikube, which the logs show happens during kubeadm init: https://github.com/kubernetes/minikube/issues/10545
is there a new config or something that we are missing in v1.20.3 that adds an additional check and causes a 1 minute tax?
screenshot of our performance dashboard