kubernetes / kubeadm


apiserver fails to start because livenessprobe is too aggressive #413

Closed anguslees closed 6 years ago

anguslees commented 7 years ago

[Lubomir] NOTE: possible fix was submitted here: https://github.com/kubernetes/kubernetes/pull/66264

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version): kubeadm version: &version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3+2c2fe6e8278a5", GitCommit:"2c2fe6e8278a5db2d15a013987b53968c743f2a1", GitTreeState:"not a git tree", BuildDate:"1970-01-01T00:00:00Z", GoVersion:"go1.8", Compiler:"gc", Platform:"linux/arm"}

Environment:

What happened?

kubeadm init sits ~forever at the "waiting for control plane" stage. docker ps/logs investigation shows apiserver is being killed (SIGTERM) and restarted continuously.

What you expected to happen?

Everything to work :) In particular, apiserver to come up and the rest of the process to proceed.

How to reproduce it (as minimally and precisely as possible)?

Run kubeadm init on a slow machine.

Anything else we need to know?

For me, during the churn of all those containers starting at once, it takes apiserver about 90s(!) from its first log line to responding to HTTP queries. I haven't looked in detail at what it's doing at that point, but the logs mention what looks like etcd bootstrapping things.

My suggested fix is to set apiserver initialDelaySeconds to 180s. And probably similar elsewhere in general - I think there's very little reason to have aggressive initial delays.

(Unless you're a unittest that expects to frequently encounter failures, my experience with production software suggests the correct solution to timeouts is almost always to have waited longer).
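
For context, the probe in question lives in the static pod manifest that kubeadm writes for the apiserver; a quick way to see the values currently in effect (a sketch, assuming the default manifest path) is:

# Show the liveness probe kubeadm generated for the apiserver;
# initialDelaySeconds, timeoutSeconds and failureThreshold are the fields of interest.
grep -A 8 'livenessProbe' /etc/kubernetes/manifests/kube-apiserver.yaml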

pipejakob commented 7 years ago

It seems that we set both InitialDelaySeconds and TimeoutSeconds to 15 for the control plane pods currently, which matches what kube-up.sh does as well. I understand the initial launch being slow, with all of the images being pulled at once, but after all of the images have been pulled, how long does it take your apiserver to respond to /healthz checks after being launched?

No doubt we should probably tweak both of these values to accommodate lower-power machines.

anguslees commented 7 years ago

Once started it can respond to health checks in << 15s - it's really just all the extra stuff that the apiserver does between exec() and actually being ready to serve afaics.
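
One rough way to put a number on that gap is to poll the health endpoint from the moment the container starts (a sketch, assuming the manifest probes https://127.0.0.1:6443/healthz; adjust the URL if your manifest probes the insecure port 8080 instead):

# Run this right after the apiserver container starts; it reports how long
# the apiserver took to answer its first successful health check.
start=$(date +%s)
until curl -ksf https://127.0.0.1:6443/healthz >/dev/null 2>&1; do
  sleep 1
done
echo "apiserver became healthy after $(( $(date +%s) - start ))s"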

anguslees commented 7 years ago

oh, and the docker pull time doesn't count against InitialDelaySeconds afaics (good). In other examples with larger (generic ubuntu) images over my slow network link, the pull can take many minutes but the initialDelaySeconds timer doesn't seem to start ticking until the pull has completed and the docker run has started. (I haven't looked at the relevant code - just frequent anecdotal experience)

koalalorenzo commented 7 years ago

I am running into the same problem. With slow machines kubeadm sits forever. Using v1.7.4

pipejakob commented 7 years ago

@anguslees and @koalalorenzo, can you confirm that if you manually change the liveness probe settings (by editing the manifest files in /etc/kubernetes/manifests/) that this fixes your problem? I also recently saw a case on Slack where the user had the same symptoms but was likely running into memory constraints, because the problem went away when they moved to a host type with more memory.

I just want to make sure that this approach will actually fix the problem before we invest time into coding it up. Thanks!

praseodym commented 7 years ago

I'm also experiencing this issue when attempting to use kubeadm in QEMU without hardware-assisted virtualisation (which is a bad idea because it is terribly slow). Increasing the InitialDelaySeconds and TimeoutSeconds helps; the cluster will then eventually come up.

anguslees commented 7 years ago

@pipejakob I can confirm that (on my bananapi) running this in another terminal at the right point in the kubeadm run makes everything come up successfully:

sed -i 's/initialDelaySeconds: [0-9]\+/initialDelaySeconds: 180/' /etc/kubernetes/manifests/kube-apiserver.yaml

(I usually also manually docker kill the old/restart-looping apiserver container, I'm not sure if that gets cleaned up automatically with static pods)
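
Spelled out, the manual workaround from the two comments above amounts to something like this (a sketch; the container id is a placeholder to be taken from the docker ps output):

# Relax the probe's initial delay in the generated manifest...
sed -i 's/initialDelaySeconds: [0-9]\+/initialDelaySeconds: 180/' /etc/kubernetes/manifests/kube-apiserver.yaml
# ...then kill the restart-looping apiserver container so the kubelet
# brings it back up under the patched probe.
docker ps | grep kube-apiserver
docker kill <container-id>   # <container-id>: placeholder, copy it from the docker ps output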

pipejakob commented 7 years ago

@anguslees Great! Thanks for confirmation.

alexjmoore commented 7 years ago

I can confirm I've also just had this issue, in my case on a Raspberry Pi 3. Changing to 180s fixed it, however I think I also encountered issue #106 as in my case it simply died with:

Sep 1 10:47:30 raspberrypi kubelet[6053]: W0901 10:47:30.020409 6053 kubelet.go:1596] Deleting mirror pod "kube-apiserver-raspberrypi_kube-system(7c03df63-8efa-11e7-ae86-b827ebdd4b52)" because it is outdated

I had to manually HUP the kubelet process to get it to kick back to life.

Tha-Fox commented 7 years ago

I can also confirm I had this and wanted to say thank you for saving my sanity. I have a Raspberry Pi 2B and had been stuck at the init phase for the last month. After running that one-liner once init reached the "waiting for the control plane" stage, I got it going forward.

anguslees commented 7 years ago

This issue still exists in kubeadm v1.8.0, and it's worse because kubeadm itself now has a 1min timeout on most actions. The 1min timeout seems to have been chosen arbitrarily, and unfortunately a) doesn't give you enough time to jump in and debug/work around the issue (eg: sed hack above), b) isn't enough time for the apiserver to start up (~90s for me), even when initialDelaySeconds has been extended, and c) can't be increased without hacking/rebuilding kubeadm afaics.

Timeouts break otherwise correct logic, particularly in complex eventually-consistent systems - we should never add them "just because" :( My understanding is that kubeadm is meant to be a building block on which larger deployment systems can depend. Consequently I boldly propose removing all timeouts from kubeadm itself (the various phases should continue retrying forever), and rely on the higher level process to add an overall timeout if/when appropriate in that higher level context. In the simple/direct use case this would mean "retry until user gives up and presses ^c". Would such a PR be acceptable?

luxas commented 7 years ago

@anguslees We had the "wait forever" behavior earlier; but that was very sub-optimal from a UX PoV, so now we do have timeouts. We might want to increase some of those timeouts if you want.

The problem is that usage of kubeadm is two-fold. We have both users typing kubeadm interactively, who want to know whether something is happening or not, and higher-level consumers.

anguslees commented 6 years ago

.. So what direction are we going to go here? Currently I use a fork of kubeadm that has numerous timeouts cranked up 10x, and I'd like to believe that I can go back to using standard kubeadm binaries at some point.

0xmichalis commented 6 years ago

@anguslees We had the "wait forever" behavior earlier; but that was very sub-optimal from a UX PoV, so now we do have timeouts. We might want to increase some of those timeouts if you want.

How about making them configurable? Does it make sense to have a single option that owns all of them?

0xmichalis commented 6 years ago

/priority important-soon

luxas commented 6 years ago

Does it make sense to have a single option that owns all of them?

Probably, or some kind of "weight" for all the timeouts to be multiplied with... Otherwise we'll get into config hell with 20 different types of timeout flags :)

rsterrenburg commented 6 years ago

Running into the same issue using kubeadm upgrade on a Raspberry Pi 2 cluster. The upgrade fails due to aggressive timeouts. Changing the liveness probe settings in the manifests doesn't help. Any ideas?

anguslees commented 6 years ago

I still propose a pattern where any kubeadm timeout is inherited from the calling context (or part of a more sophisticated error recovery strategy), rather than sprinkling arbitrary timeouts throughout the lower levels of the kubeadm codebase.

In its simplest form, this would behave almost exactly like removing all the timeouts from kubeadm and replacing them with one overall "run for xx mins, then abort if not finished" global timer (since kubeadm can't do much in the way of error recovery other than just waiting longer).

For the original manifest livenessProbe timeouts, it's literally a one-liner patch. Unfortunately, fixing the livenessProbe alone is no longer sufficient since the "deviation from normal == error" fallacy has spread further throughout the kubeadm codebase. Changing cultural awareness is hard, so in the meantime I have a forked version of kubeadm here, if anyone wants to just install onto a raspberry pi. (Build with make cross WHAT=cmd/kubeadm KUBE_BUILD_PLATFORMS=linux/arm)
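
For anyone wanting to reproduce that build, the rough steps are (a sketch; kubeadm-gusfork is the branch referenced later in this thread, and the output location can vary between releases):

# Cross-compile the patched kubeadm for 32-bit ARM.
git clone https://github.com/anguslees/kubernetes
cd kubernetes
git checkout kubeadm-gusfork
make cross WHAT=cmd/kubeadm KUBE_BUILD_PLATFORMS=linux/arm
# The resulting binary ends up under _output/ for the linux/arm platform.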

dvdmuckle commented 6 years ago

@anguslees Do you have a compiled 1.9.4 version of your patched kubeadm? I'm having trouble compiling your patched version.

I'm surprised kubeadm doesn't have this behavior behind a flag. Perhaps a PR is in order?

timothysc commented 6 years ago

/assign @liztio

timothysc commented 6 years ago

So we've fixed two issues in 1.11 which could affect this.

  1. Prepull the etcd image prior to startup.
  2. Fix the race-condition check on startup. ...

If this is only happening on Raspberry Pi-class gear, then we need some way of qualifying a lowest common denominator.

What is the lowest target platform that we are planning to support?

praseodym commented 6 years ago

I think Raspberry Pi 2 is the (s)lowest platform you'd want to run Kubernetes on. My tests with non-hardware assisted QEMU are too exotic to be taken into account.

Considering that the Pi suffers from slow I/O a lot, pre-pulling all images will already help a lot, but we will need some real-world tests to determine the actual timeouts.
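
For what it's worth, on kubeadm v1.11 and newer the pre-pull can be done explicitly before init, so the downloads at least don't compete with the control plane coming up (a sketch; note the next comment points out the probe timer only starts after the pull anyway):

# Pre-pull the control plane images before running kubeadm init.
kubeadm config images list
kubeadm config images pull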

timothysc commented 6 years ago

imo Raspberry Pi 2 is too old to support - http://socialcompare.com/en/comparison/raspberrypi-models-comparison - it came out in 2015.

>= 3 seems more reasonable imo.

anguslees commented 6 years ago

Prepulling images won't help. The livenessProbe timer doesn't start until after the image is pulled (as I pointed out above).

The fix is just to extend the initialDelaySeconds timeout(s). The current timeout values in kubeadm are being misused to "enforce" a fast UX experience, and not error detection.

Edit: And to be clear, it's only the startup that takes ages - my control cluster operates on 3x RPi2 just fine, once I increase the apiserver initialDelaySeconds timeout (and the other install-only timeouts used within kubeadm itself). I don't understand why we're still talking about this :/

carlosedp commented 6 years ago

I have an ARM64 cluster on 1.9.3 and successfully updated it to 1.9.7, but hit the timeout problem upgrading from 1.9.7 to 1.10.2.

I even tried editing and recompiling kubeadm with increased timeouts (like these last commits https://github.com/anguslees/kubernetes/commits/kubeadm-gusfork) with the same results.

$ sudo kubeadm upgrade apply  v1.10.2 --force
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[upgrade/version] You have chosen to change the cluster version to "v1.10.2"
[upgrade/versions] Cluster version: v1.9.7
[upgrade/versions] kubeadm version: v1.10.2-dirty
[upgrade/version] Found 1 potential version compatibility errors but skipping since the --force flag is set:

   - Specified version to upgrade to "v1.10.2" is higher than the kubeadm version "v1.10.2-dirty". Upgrade kubeadm first using the tool you used to install kubeadm
[upgrade/prepull] Will prepull images for components [kube-apiserver kube-controller-manager kube-scheduler]
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.10.2"...
Static pod: kube-apiserver-kubemaster1 hash: ed7578d5bf9314188dca798386bcfb0e
Static pod: kube-controller-manager-kubemaster1 hash: e0c3f578f1c547dcf9996e1d3390c10c
Static pod: kube-scheduler-kubemaster1 hash: 52e767858f52ac4aba448b1a113884ee
[upgrade/etcd] Upgrading to TLS for etcd
Static pod: etcd-kubemaster1 hash: 413224efa82e36533ce93e30bd18e3a8
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests346927148/etcd.yaml"
[certificates] Using the existing etcd/ca certificate and key.
[certificates] Using the existing etcd/server certificate and key.
[certificates] Using the existing etcd/peer certificate and key.
[certificates] Using the existing etcd/healthcheck-client certificate and key.
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/etcd.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests190581659/etcd.yaml"
[upgrade/staticpods] Not waiting for pod-hash change for component "etcd"
[upgrade/etcd] Waiting for etcd to become available
[util/etcd] Waiting 30s for initial delay
[util/etcd] Attempting to get etcd status 1/10
[util/etcd] Attempt failed with error: dial tcp 127.0.0.1:2379: getsockopt: connection refused
[util/etcd] Waiting 15s until next retry
[util/etcd] Attempting to get etcd status 2/10
[util/etcd] Attempt failed with error: dial tcp 127.0.0.1:2379: getsockopt: connection refused
[util/etcd] Waiting 15s until next retry
[util/etcd] Attempting to get etcd status 3/10
[util/etcd] Attempt failed with error: dial tcp 127.0.0.1:2379: getsockopt: connection refused
[util/etcd] Waiting 15s until next retry
[util/etcd] Attempting to get etcd status 4/10
[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests346927148"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests346927148/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests346927148/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests346927148/kube-scheduler.yaml"
[upgrade/staticpods] The etcd manifest will be restored if component "kube-apiserver" fails to upgrade
[certificates] Using the existing etcd/ca certificate and key.
[certificates] Using the existing apiserver-etcd-client certificate and key.
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests190581659/kube-apiserver.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/apply] FATAL: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: [timed out waiting for the condition]

ghoben commented 6 years ago

When kubeadm is used and the apiserver does come up, we could try to measure timings on the first startup and then adapt the timeout configuration at a later stage based on those measurements from the first initialization. It is also hard to work out from the logs that the apiserver is being killed by the healthz check; we should at least get better logging in place so people are aware of the problem. It took me quite some time to find out that the liveness probe was the problem. I have to mention I'm a beginner, so having this mentioned somewhere in kubeadm's failure output would at least be helpful.

brendandburns commented 6 years ago

RaspberryPi 3 still demonstrates this issue, even with pre-pulled images. The API server takes 2-3 minutes to get to a place where it can serve a page...

It would be great to have this be configurable, because right now, I'm monkey patching the YAML file in the background while kubeadm runs.

brendandburns commented 6 years ago

@carlosedp what I do during an upgrade is take down the kubelet while the apiserver is booting. It's kind of hacky, but it prevents the health-check from firing until the apiserver is up.

But honestly kubeadm upgrade and ARM just don't work that well together in my experience...
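
Roughly, the hack described above looks like this (a sketch; it assumes a systemd-managed kubelet and a health endpoint at https://127.0.0.1:6443/healthz, and the kubelet must already have created the new apiserver container before it is stopped):

# Stop the kubelet so its liveness probe cannot kill the apiserver mid-startup.
systemctl stop kubelet
# Wait until the new apiserver answers health checks (can take minutes on a Pi)...
until curl -ksf https://127.0.0.1:6443/healthz >/dev/null 2>&1; do
  sleep 5
done
# ...then let the kubelet take over again.
systemctl start kubelet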

carlosedp commented 6 years ago

@brendandburns It worked perfectly until 1.9. I deployed my 1.8 cluster with it with no hiccups, then upgraded to 1.9 two times. No idea what might have changed in 1.10 to cause these problems.

I saw that the timeout changed in 1.11 to 5 minutes (https://github.com/kubernetes/kubernetes/pull/64988/files#diff-2056df5f0c3a4828b3f9a2510a7533c7L45). Have you tried with 1.11?

Gonna try this hack after I return from vacations. Thanks for the tip!

neolit123 commented 6 years ago

@brendandburns @carlosedp yes, please try 1.11 to confirm that the timeout increase is a fix for you.

/cc @kubernetes/sig-cluster-lifecycle-bugs

DJGummikuh commented 6 years ago

Hey! I'm also encountering this issue. Interestingly though, I manage to build a cluster master from scratch on my Raspberry 3, but consistently fail to on my 3+. Anyways, the version I'm currently using (as per the step-by-step documentation at https://blog.hypriot.com/post/setup-kubernetes-raspberry-pi-cluster/ ) is kubeadm version: &version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:14:41Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/arm"}

As with the others, the apiserver container does get up eventually, but not before kubeadm bails out, leaving me in limbo as I'm too inexperienced to manually pick up from there.

Quick Update: running watch -n 1.0 "sed -i 's/initialDelaySeconds: [0-9]\+/initialDelaySeconds: 180/' /etc/kubernetes/manifests/kube-apiserver.yaml" in a separate terminal allowed my cluster to get up.

neolit123 commented 6 years ago

@DJGummikuh thanks for testing 1.11.

As with the others, the apiserver container does get up eventually, but not before kubeadm bails out, leaving me in limbo as I'm too inexperienced to manually pick up from there.

how much time does it take for the apiserver to start in your case? it seems that we may have to make this configurable.

DJGummikuh commented 6 years ago

It's hard to say, I'd guess approx 1 minute, however I don't know how to properly measure that.

Additionally, now that my master is operational, I fail adding a node to it with what seems to be another timeout issue.

[preflight] running pre-flight checks
[WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will not be used, because the following required kernel modules are not loaded: [ip_vs_rr ip_vs_wrr ip_vs_sh ip_vs] or no builtin kernel ipvs support: map[ip_vs_rr:{} ip_vs_wrr:{} ip_vs_sh:{} nf_conntrack_ipv4:{} ip_vs:{}]
you can solve this problem with following methods:

  1. Run 'modprobe -- ' to load missing kernel modules;
  2. Provide the missing builtin kernel ipvs support

I0708 19:02:20.256325 8667 kernel_validator.go:81] Validating kernel version
I0708 19:02:20.256846 8667 kernel_validator.go:96] Validating kernel config
[WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 18.03.1-ce. Max validated version: 17.03
[discovery] Trying to connect to API Server "192.168.2.2:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.2.2:6443"
[discovery] Requesting info from "https://192.168.2.2:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.2.2:6443"
[discovery] Successfully established connection with API Server "192.168.2.2:6443"
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.11" ConfigMap in the kube-system namespace
[kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[preflight] Activating the kubelet service
[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.

Unfortunately, an error has occurred: timed out waiting for the condition

This error is likely caused by:

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:

During this time, not a single docker container shows up on my node.

neolit123 commented 6 years ago

[WARNING RequiredIPVSKernelModulesAvailable]:

offtopic, we are talking about this here: https://github.com/kubernetes/kubeadm/issues/975

Additionally, now that my master is operational, I fail adding a node to it with what seems to be another timeout issue. [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.

the kubelet could not start. better look at the kubelet logs.

- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'

the liveness probe should be made configurable, yet we should probably talk about the best way of doing that in the kubeadm config.

i think these are the values used, so if you are building kubeadm yourself try playing with: https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/util/staticpod/utils.go#L95-L96 but mind that this would increase the timeouts on all control plane components.

DJGummikuh commented 6 years ago

Edit: I'm apparently too stupid to properly format comments in Github :-(

Here are the log outputs of both systemctl status kubelet and journalctl -xeu kubelet. "No cloud provider specified" is the only thing that springs to the eye.

root@djg-clusterpi3:/home/djgummikuh# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Sun 2018-07-08 19:55:15 CEST; 2min 12s ago
     Docs: http://kubernetes.io/docs/
 Main PID: 9233 (kubelet)
   Memory: 14.4M
      CPU: 17.064s
   CGroup: /system.slice/kubelet.service
           └─9233 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --network-pl

Jul 08 19:55:15 djg-clusterpi3 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jul 08 19:55:15 djg-clusterpi3 kubelet[9233]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more inform
Jul 08 19:55:15 djg-clusterpi3 kubelet[9233]: I0708 19:55:15.665304    9233 feature_gate.go:230] feature gates: &{map[]}
Jul 08 19:55:15 djg-clusterpi3 kubelet[9233]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more inform
Jul 08 19:55:15 djg-clusterpi3 kubelet[9233]: I0708 19:55:15.675950    9233 feature_gate.go:230] feature gates: &{map[]}
Jul 08 19:55:15 djg-clusterpi3 kubelet[9233]: I0708 19:55:15.676273    9233 feature_gate.go:230] feature gates: &{map[]}
Jul 08 19:55:31 djg-clusterpi3 kubelet[9233]: I0708 19:55:31.963686    9233 server.go:408] Version: v1.11.0
Jul 08 19:55:31 djg-clusterpi3 kubelet[9233]: I0708 19:55:31.964029    9233 feature_gate.go:230] feature gates: &{map[]}
Jul 08 19:55:31 djg-clusterpi3 kubelet[9233]: I0708 19:55:31.964378    9233 feature_gate.go:230] feature gates: &{map[]}
Jul 08 19:55:31 djg-clusterpi3 kubelet[9233]: I0708 19:55:31.965040    9233 plugins.go:97] No cloud provider specified.

Jul 08 19:55:15 djg-clusterpi3 systemd[1]: Started kubelet: The Kubernetes Node Agent.
-- Subject: Unit kubelet.service has finished start-up
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- Unit kubelet.service has finished starting up.
-- 
-- The start-up result is done.
Jul 08 19:55:15 djg-clusterpi3 kubelet[9233]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more inform
Jul 08 19:55:15 djg-clusterpi3 kubelet[9233]: I0708 19:55:15.665304    9233 feature_gate.go:230] feature gates: &{map[]}
Jul 08 19:55:15 djg-clusterpi3 kubelet[9233]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more inform
Jul 08 19:55:15 djg-clusterpi3 kubelet[9233]: I0708 19:55:15.675950    9233 feature_gate.go:230] feature gates: &{map[]}
Jul 08 19:55:15 djg-clusterpi3 kubelet[9233]: I0708 19:55:15.676273    9233 feature_gate.go:230] feature gates: &{map[]}
Jul 08 19:55:31 djg-clusterpi3 kubelet[9233]: I0708 19:55:31.963686    9233 server.go:408] Version: v1.11.0
Jul 08 19:55:31 djg-clusterpi3 kubelet[9233]: I0708 19:55:31.964029    9233 feature_gate.go:230] feature gates: &{map[]}
Jul 08 19:55:31 djg-clusterpi3 kubelet[9233]: I0708 19:55:31.964378    9233 feature_gate.go:230] feature gates: &{map[]}
Jul 08 19:55:31 djg-clusterpi3 kubelet[9233]: I0708 19:55:31.965040    9233 plugins.go:97] No cloud provider specified.

neolit123 commented 6 years ago

please, mind that these logs show no errors.

DJGummikuh commented 6 years ago

Yes, but that's part of the issue, I believe. I can't find a single conclusive line of [ERROR] anywhere to begin with. That's what's frustrating me so much :-)

Anyways, thanks for formatting that mess of mine :-D

So what would be a good next step?

neolit123 commented 6 years ago

So what would be a good next step?

as mentioned previously:

the liveness probe should be made configurable, yet we should probably talk about the best way of doing that in the kubeadm config.

i think these are the values used, so if you are building kubeadm yourself try playing with: https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/util/staticpod/utils.go#L95-L96 but mind that this would increase the timeouts on all control plane components.

DJGummikuh commented 6 years ago

oof, I don't build kubeadm myself, I'm pulling it via apt from apt.kubernetes.io. I don't have anything that remotely resembles a build pipeline for Go on any of my machines (never tinkered with it yet). I hoped that there would be a similar file to modify when joining a cluster like there was for creating it, however it seems these values are hard-coded in utils.go :-|

neolit123 commented 6 years ago

you can try this solution, but it's tricky: https://github.com/kubernetes/kubeadm/issues/413#issuecomment-402916173

the problem is that this needs a configuration change and i don't think it can be included in a 1.11.X release (but we can try). it also has to be discussed first.

DJGummikuh commented 6 years ago

That's what I already did in the comment where I told you that the master is up (that's what my watch -n 1.0 command was doing). What's happening now is that kubeadm join does not work, and as far as I can see, kubeadm join does not place yaml files for me to patch anywhere :-/

neolit123 commented 6 years ago

you could be experiencing another issue. hard to say.

anguslees commented 6 years ago

these are the values used, so if you are building kubeadm yourself try playing with: https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/util/staticpod/utils.go#L95-L96

So I note the InitialDelaySeconds timeout here is still 15s a year later, and I don't understand why it hasn't been increased to something that actually represents reality. Is this bug report serving any purpose at all? I didn't initially propose a PR to just change this number myself because I felt someone already in the kubeadm inner circle could get that done and merged in a few minutes, but I am happy to do so if "lack of a PR" is the only reason we haven't moved forward.

I feel like everyone is trying to come up with reasons to declare the original bug report invalid (perhaps we shouldn't support rPi2, perhaps we should make the initialdelay configurable, perhaps we should pre-pull images or some other unrelated change, perhaps a fast opaque timeout failure is a better UX than a slower success), instead of just increasing the initialDelay timeout to reflect the actual initialDelay that our binaries clearly exhibit and then moving on to something else that actually deserves discussion.

</grumpy and losing patience with the process>

neolit123 commented 6 years ago

So I note the InitialDelaySeconds timeout here is still 15s a year later, and I don't understand why it hasn't been increased to something that actually represents reality. Is this bug report serving any purpose at all? I didn't initially propose a PR to just change this number myself because I felt someone already in the kubeadm inner circle could get that done and merged in a few minutes, but I am happy to do so if "lack of a PR" is the only reason we haven't moved forward.

i cannot answer your questions as the whole issue is new to me, but we can try to talk about it this week. please note the team behind kubeadm is a small team with big goals and often there isn't anyone to work on a certain task, like this one.

I feel like everyone is trying to come up with reasons to declare the original bug report invalid (perhaps we shouldn't support rPi2, perhaps we should make the initialdelay configurable, perhaps we should pre-pull images or some other unrelated change, perhaps a fast opaque timeout failure is a better UX than a slower success), instead of just increasing the initialDelay timeout to reflect the actual initialDelay that our binaries clearly exhibit and then moving on to something else that actually deserves discussion.

yes, options have been discussed already in this thread. it's a matter of picking the right option and doing the implementation.

DJGummikuh commented 6 years ago

I actually believe it would make sense to default to "no timeout" with the option of setting a timeout for the whole process (as was suggested somewhere earlier in this issue).

The reason is that most of the use cases I can think of don't actually care whether a specific step is executed in X seconds or not, since all that is cared about in a distributed system is eventual consistency (spooling up another node just in case is cheaper than fiddling with the settings).

As an interim solution it would however suffice to actually read the timeout settings for kubeadm join from a configuration file just like the kubeadm init stuff does so that our hack with the inflight timeout replacement works. It's a hack, don't think any different - but a terrible hack is still better than no workaround at all.

I'm personally against trying to "guess" sensible timeouts since guesses can always be wrong, would not really serve a purpose in this case (since coping strategy for elapsing timeouts is simply bailing out) and would make reproduction of errors a pain in the ass since two identical systems could start behaving different for a myriad of reasons.

neolit123 commented 6 years ago

@anguslees @DJGummikuh we had a talk about that on the recent SIG meeting.

this is a thorny problem, but here are some random notes below.

notes:

DJGummikuh commented 6 years ago

have a look at this: kubernetes/kubernetes#64357 - the user isn't reporting a liveness probe issue at all. any idea why that is?

No, it doesn't even seem really reproducible on my hardware now. Since the token for joining nodes had expired, I thought "what the heck", kubeadm reset my cluster master and tried to re-init it (with that watch workaround running) - now even with the workaround I'm no longer able to bring up a master on my Raspberry Pi 3+. I even increased the configured timeout from 180 to 300 seconds without any difference. So I like the idea of this being a race condition. Still, I welcome your suggestion to submit a pull request for higher timeouts. It won't hurt much and gives the Pi (which IS slow hardware, I think we can all agree on that :-) ) a bit more breathing space.

neolit123 commented 6 years ago

related issue on ARM64 in the apiserver: https://github.com/kubernetes/kubernetes/issues/64649

DJGummikuh commented 6 years ago

Did some more research with my (still failing :-( ) Pi Cluster over the weekend.

I reinstalled one of my nodes since I suspected that maybe updating from a Pi 2 to a Pi 3+ without reinstalling the OS might cause some issues. This is not the case; the problem is the same with the brand-new OS. Additionally, I took the effort to compile Kubernetes with the increased timeouts and tested with that, but this did not really yield any result either. I was able to get the master up (with my watch workaround to patch the configuration files) but joining the cluster with the second node just never succeeds.

It would really be neat to have some additional useful log output to nail this issue down. However I do not understand enough of how the kubernetes components interact to know where to add lines.

Anyone up for the challenge? ^^

neolit123 commented 6 years ago

this is really a problem outside kubeadm...

the api machinery folks did not look at my request for comments on the issue, so unless a ticket is logged for this already someone should log one in the kubernetes/kubernetes repo and tag /sig api-machinery.

It would really be neat to have some additional useful log output to nail this issue down. However I do not understand enough of how the kubernetes components interact to know where to add lines.

as a start one can enable --v 4 for the kubelet in the systemd drop-in file, which will tell the GLOG logger to enable high verbosity. one can also do the same for control plane components from the kubeadm config with this: https://kubernetes.io/docs/setup/independent/control-plane-flags/
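
For example, on a kubeadm 1.11 node that could look roughly like this (a sketch; it assumes the /etc/default/kubelet extra-args file sourced by the kubeadm systemd drop-in on Debian-based systems):

# Raise kubelet log verbosity to see what the node is actually waiting on.
echo 'KUBELET_EXTRA_ARGS=--v=4' >> /etc/default/kubelet
systemctl restart kubelet
# Then follow the logs:
journalctl -u kubelet -f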