gm12367 closed this issue 3 years ago.
I upgraded yesterday with curl -sfL https://get.k3s.io | sh - and I think I see the same issue. The logs are pretty intense: https://s.natalian.org/2019-10-28/k3s.txt
The workaround is to downgrade with curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v0.9.1 sh - (thanks to https://twitter.com/ibuildthecloud/status/1188640874642563072)
I have the same problem with a Raspberry Pi 3 Model B+ (k3s v0.10.0), but v0.9.1 works.
Same as above: this was my first attempt at a setup; v0.10.0 failed and downgrading to v0.9.1 worked.
Same for Raspberry Pi 3 / 3B with v0.10.1, but 0.9.1 works. Somebody please adjust the issue's title: "K3S Install on Raspberry Pi fails since v0.10.0"
Related to #869? Spotted the same error message there.
And #556 as already linked here.
Haven't really been able to find a reproducible case.
Does cat /proc/sys/kernel/random/entropy_avail show sufficient entropy?
I wouldn't be surprised if this is a Go-on-ARM issue; if possible, it might be worth trying a 64-bit OS.
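A minimal Go sketch of the check suggested above, reading the same /proc file. The kernel reports this value in bits. The lowEntropyThreshold constant is a hypothetical cutoff chosen for illustration, not something k3s uses, and later comments report failures with both low readings (35 to 401) and high ones (2564, 3233), so a low value is at most one contributing factor.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// lowEntropyThreshold is a hypothetical cutoff for illustration only;
// k3s does not check entropy itself.
const lowEntropyThreshold = 1000

// parseEntropyAvail converts the contents of
// /proc/sys/kernel/random/entropy_avail (a decimal number of bits) to an int.
func parseEntropyAvail(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

func main() {
	data, err := os.ReadFile("/proc/sys/kernel/random/entropy_avail")
	if err != nil {
		fmt.Println("cannot read entropy_avail:", err)
		return
	}
	bits, err := parseEntropyAvail(string(data))
	if err != nil {
		fmt.Println("unexpected contents:", err)
		return
	}
	fmt.Printf("entropy_avail: %d bits, below threshold: %t\n", bits, bits < lowEntropyThreshold)
}
```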
cat /proc/sys/kernel/random/entropy_avail returns 3233. As for 64-bit: the RPi 3's CPU is 64-bit capable, but AFAIK the OS here, Raspbian, is 32-bit.
lsb_release -a
No LSB modules are available.
Distributor ID: Raspbian
Description: Raspbian GNU/Linux 9.11 (stretch)
Release: 9.11
Codename: stretch
Same issue with k3s v0.10.1 (7d650d32) on Intel/AMD64 (I manually copied the k3s v0.10.1 release binary into a VirtualBox VM running Ubuntu 18.04.3). The exact error message is:
FATA[2019-10-30T16:52:21.768049354+01:00] starting tls server: Get https://127.0.0.1:6444/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions: net/http: TLS handshake timeout
k3s v0.9.1 starts on the same VM without this error.
@gocursor your problem could be a general networking problem, maybe not directly related to this arm issue.
@erikwilson I re-imaged my Raspberry Pi 4 with 64-bit Ubuntu 19.10 and tried again; the issue is the same as before: "TLS handshake timeout".
Below is the information you will probably need:
K3s version:
root@ubuntu:~# k3s -version
k3s version v0.10.1 (7d650d32)
Kernel and architecture:
root@ubuntu:~# uname -a
Linux ubuntu 5.3.0-1008-raspi2 #9-Ubuntu SMP Fri Oct 18 13:26:35 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux
OS version:
root@ubuntu:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 19.10
Release: 19.10
Codename: eoan
entropy_avail:
root@ubuntu:~# cat /proc/sys/kernel/random/entropy_avail
401
Error Logs:
root@ubuntu:~# journalctl -u k3s.service
-- Logs begin at Thu 2019-04-11 16:28:37 UTC, end at Thu 2019-10-31 13:18:39 UTC. --
Oct 31 13:04:45 ubuntu systemd[1]: Starting Lightweight Kubernetes...
Oct 31 13:04:45 ubuntu k3s[1884]: time="2019-10-31T13:04:45Z" level=info msg="Preparing data dir /var/lib/rancher/k3s/data/11f1b1f5f9884701e429998dc51d3b6df601985460dc405a0ad74bd87c99d1ea"
Oct 31 13:04:51 ubuntu k3s[1884]: time="2019-10-31T13:04:51.893934370Z" level=info msg="Starting k3s v0.10.1 (7d650d32)"
Oct 31 13:04:51 ubuntu k3s[1884]: time="2019-10-31T13:04:51.996804834Z" level=info msg="Kine listening on unix://kine.sock"
Oct 31 13:04:51 ubuntu k3s[1884]: time="2019-10-31T13:04:51.998663740Z" level=info msg="Fetching bootstrap data from etcd"
Oct 31 13:04:54 ubuntu k3s[1884]: time="2019-10-31T13:04:54.241480500Z" level=info msg="Running kube-apiserver --advertise-port=6443 --allow-privileged=true --anonymous-auth=false --api-audiences=unknown --authorization-mode=Node,RBAC --basic-auth-file=/var/lib/ranc
Oct 31 13:04:54 ubuntu k3s[1884]: Flag --basic-auth-file has been deprecated, Basic authentication mode is deprecated and will be removed in a future release. It is not recommended for production environments.
Oct 31 13:04:54 ubuntu k3s[1884]: I1031 13:04:54.244334 1884 server.go:650] external host was not specified, using 192.168.199.79
Oct 31 13:04:54 ubuntu k3s[1884]: I1031 13:04:54.245661 1884 server.go:162] Version: v1.16.2-k3s.1
Oct 31 13:05:00 ubuntu k3s[1884]: I1031 13:05:00.630222 1884 plugins.go:158] Loaded 11 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolera
Oct 31 13:05:00 ubuntu k3s[1884]: I1031 13:05:00.630305 1884 plugins.go:161] Loaded 7 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,RuntimeClass,R
Oct 31 13:05:00 ubuntu k3s[1884]: I1031 13:05:00.632921 1884 plugins.go:158] Loaded 11 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolera
Oct 31 13:05:00 ubuntu k3s[1884]: I1031 13:05:00.632981 1884 plugins.go:161] Loaded 7 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,RuntimeClass,R
Oct 31 13:05:00 ubuntu k3s[1884]: I1031 13:05:00.702617 1884 master.go:259] Using reconciler: lease
Oct 31 13:05:00 ubuntu k3s[1884]: I1031 13:05:00.828329 1884 rest.go:115] the default service ipfamily for this cluster is: IPv4
Oct 31 13:05:02 ubuntu k3s[1884]: W1031 13:05:02.720128 1884 genericapiserver.go:404] Skipping API batch/v2alpha1 because it has no resources.
Oct 31 13:05:02 ubuntu k3s[1884]: W1031 13:05:02.842611 1884 genericapiserver.go:404] Skipping API node.k8s.io/v1alpha1 because it has no resources.
Oct 31 13:05:02 ubuntu k3s[1884]: W1031 13:05:02.966471 1884 genericapiserver.go:404] Skipping API rbac.authorization.k8s.io/v1alpha1 because it has no resources.
Oct 31 13:05:02 ubuntu k3s[1884]: W1031 13:05:02.989855 1884 genericapiserver.go:404] Skipping API scheduling.k8s.io/v1alpha1 because it has no resources.
Oct 31 13:05:03 ubuntu k3s[1884]: W1031 13:05:03.066864 1884 genericapiserver.go:404] Skipping API storage.k8s.io/v1alpha1 because it has no resources.
Oct 31 13:05:03 ubuntu k3s[1884]: W1031 13:05:03.195220 1884 genericapiserver.go:404] Skipping API apps/v1beta2 because it has no resources.
Oct 31 13:05:03 ubuntu k3s[1884]: W1031 13:05:03.195351 1884 genericapiserver.go:404] Skipping API apps/v1beta1 because it has no resources.
Oct 31 13:05:03 ubuntu k3s[1884]: I1031 13:05:03.259508 1884 plugins.go:158] Loaded 11 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolera
Oct 31 13:05:03 ubuntu k3s[1884]: I1031 13:05:03.259642 1884 plugins.go:161] Loaded 7 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,RuntimeClass,R
Oct 31 13:05:03 ubuntu k3s[1884]: time="2019-10-31T13:05:03.289835152Z" level=info msg="Running kube-scheduler --bind-address=127.0.0.1 --kubeconfig=/var/lib/rancher/k3s/server/cred/scheduler.kubeconfig --leader-elect=false --port=10251 --secure-port=0"
Oct 31 13:05:03 ubuntu k3s[1884]: time="2019-10-31T13:05:03.292533883Z" level=info msg="Running kube-controller-manager --allocate-node-cidrs=true --bind-address=127.0.0.1 --cluster-cidr=10.42.0.0/16 --cluster-signing-cert-file=/var/lib/rancher/k3s/server/tls/server
Oct 31 13:05:03 ubuntu k3s[1884]: I1031 13:05:03.379460 1884 controllermanager.go:161] Version: v1.16.2-k3s.1
Oct 31 13:05:03 ubuntu k3s[1884]: I1031 13:05:03.390324 1884 deprecated_insecure_serving.go:53] Serving insecurely on [::]:10252
Oct 31 13:05:03 ubuntu k3s[1884]: I1031 13:05:03.401644 1884 server.go:143] Version: v1.16.2-k3s.1
Oct 31 13:05:03 ubuntu k3s[1884]: I1031 13:05:03.403409 1884 defaults.go:91] TaintNodesByCondition is enabled, PodToleratesNodeTaints predicate is mandatory
Oct 31 13:05:03 ubuntu k3s[1884]: W1031 13:05:03.412646 1884 authorization.go:47] Authorization is disabled
Oct 31 13:05:03 ubuntu k3s[1884]: W1031 13:05:03.412819 1884 authentication.go:79] Authentication is disabled
Oct 31 13:05:03 ubuntu k3s[1884]: I1031 13:05:03.412886 1884 deprecated_insecure_serving.go:51] Serving healthz insecurely on [::]:10251
Oct 31 13:05:13 ubuntu k3s[1884]: time="2019-10-31T13:05:13.355859773Z" level=fatal msg="starting tls server: Get https://127.0.0.1:6444/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions: net/http: TLS handshake timeout"
Oct 31 13:05:13 ubuntu systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
Oct 31 13:05:13 ubuntu systemd[1]: k3s.service: Failed with result 'exit-code'.
Oct 31 13:05:13 ubuntu systemd[1]: Failed to start Lightweight Kubernetes.
Oct 31 13:05:18 ubuntu systemd[1]: k3s.service: Service RestartSec=5s expired, scheduling restart.
Oct 31 13:05:18 ubuntu systemd[1]: k3s.service: Scheduled restart job, restart counter is at 1.
Oct 31 13:05:18 ubuntu systemd[1]: Stopped Lightweight Kubernetes.
Oct 31 13:05:18 ubuntu systemd[1]: Starting Lightweight Kubernetes...
Oct 31 13:05:22 ubuntu k3s[1921]: time="2019-10-31T13:05:22.126716301Z" level=info msg="Starting k3s v0.10.1 (7d650d32)"
Oct 31 13:05:22 ubuntu k3s[1921]: time="2019-10-31T13:05:22.136877861Z" level=info msg="Kine listening on unix://kine.sock"
Oct 31 13:05:22 ubuntu k3s[1921]: time="2019-10-31T13:05:22.138038312Z" level=info msg="Fetching bootstrap data from etcd"
Oct 31 13:05:22 ubuntu k3s[1921]: time="2019-10-31T13:05:22.271186322Z" level=info msg="Running kube-apiserver --advertise-port=6443 --allow-privileged=true --anonymous-auth=false --api-audiences=unknown --authorization-mode=Node,RBAC --basic-auth-file=/var/lib/ranc
Oct 31 13:05:22 ubuntu k3s[1921]: Flag --basic-auth-file has been deprecated, Basic authentication mode is deprecated and will be removed in a future release. It is not recommended for production environments.
Oct 31 13:05:22 ubuntu k3s[1921]: I1031 13:05:22.273924 1921 server.go:650] external host was not specified, using 192.168.199.79
Same on a recent HypriotOS (https://github.com/hypriot).
I also tried installing the older v0.9.1; the first attempt failed with a cgroup error:
Oct 31 14:09:12 ubuntu k3s[2377]: time="2019-10-31T14:09:12.021942176Z" level=error msg="Failed to find memory cgroup, you may need to add \"cgroup_memory=1 cgroup_enable=memory\" to your linux cmdline (/boot/cmdline.txt on a Raspberry Pi)"
Oct 31 14:09:12 ubuntu k3s[2377]: time="2019-10-31T14:09:12.022021433Z" level=fatal msg="failed to find memory cgroup, you may need to add \"cgroup_memory=1 cgroup_enable=memory\" to your linux cmdline (/boot/cmdline.txt on a Raspberry Pi)"
After adding the two cgroup options to /boot/firmware/config.txt and trying again, it succeeded:
root@ubuntu:~# kubectl get node
NAME     STATUS   ROLES    AGE     VERSION
ubuntu   Ready    master   3m55s   v1.15.4-k3s.1
root@ubuntu:~# kubectl get pod -A
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   local-path-provisioner-5b8648d6f6-7fgm5   1/1     Running     0          3m52s
kube-system   coredns-66f496764-cjg7q                   1/1     Running     0          3m52s
kube-system   helm-install-traefik-szt4n                0/1     Completed   0          3m52s
kube-system   svclb-traefik-9b7cv                       3/3     Running     0          51s
kube-system   traefik-d869575c8-4gf95                   1/1     Running     0          51s
After that, I upgraded k3s to the latest version, and this time it succeeded:
root@ubuntu:~# k3s -version
k3s version v0.9.1 (755bd1c6)
root@ubuntu:~# curl -sfL https://get.k3s.io | sh -
[INFO] Finding latest release
[INFO] Using v0.10.1 as release
[INFO] Downloading hash https://github.com/rancher/k3s/releases/download/v0.10.1/sha256sum-arm64.txt
[INFO] Downloading binary https://github.com/rancher/k3s/releases/download/v0.10.1/k3s-arm64
[INFO] Verifying binary download
[INFO] Installing k3s to /usr/local/bin/k3s
[INFO] Skipping /usr/local/bin/kubectl symlink to k3s, already exists
[INFO] Skipping /usr/local/bin/crictl symlink to k3s, already exists
[INFO] Skipping /usr/local/bin/ctr symlink to k3s, already exists
[INFO] Creating killall script /usr/local/bin/k3s-killall.sh
[INFO] Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO] env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO] systemd: Creating service file /etc/systemd/system/k3s.service
[INFO] systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO] systemd: Starting k3s
root@ubuntu:~# kubectl get node
NAME     STATUS   ROLES    AGE   VERSION
ubuntu   Ready    master   7m    v1.15.4-k3s.1
root@ubuntu:~# kubectl get pod -A
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   helm-install-traefik-szt4n                0/1     Completed   0          6m53s
kube-system   local-path-provisioner-5b8648d6f6-7fgm5   0/1     Error       0          6m53s
kube-system   coredns-66f496764-cjg7q                   1/1     Running     0          6m53s
kube-system   svclb-traefik-9b7cv                       3/3     Running     0          3m52s
kube-system   traefik-d869575c8-4gf95                   0/1     Running     0          3m52s
root@ubuntu:~# kubectl get node
NAME     STATUS   ROLES    AGE     VERSION
ubuntu   Ready    master   7m14s   v1.16.2-k3s.1
root@ubuntu:~# kubectl get pod -A
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   helm-install-traefik-szt4n                0/1     Completed   0          7m40s
kube-system   local-path-provisioner-5b8648d6f6-7fgm5   1/1     Running     1          7m40s
kube-system   coredns-66f496764-cjg7q                   1/1     Running     1          7m40s
kube-system   traefik-d869575c8-4gf95                   1/1     Running     1          4m39s
kube-system   svclb-traefik-vq8nb                       3/3     Running     0          32s
root@ubuntu:~# uname -a
Linux ubuntu 5.3.0-1008-raspi2 #9-Ubuntu SMP Fri Oct 18 13:26:35 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux
root@ubuntu:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 19.10
Release: 19.10
Codename: eoan
root@ubuntu:~# k3s -version
k3s version v0.10.1 (7d650d32)
If I have time, I'll try installing the latest k3s directly on a fresh Ubuntu 19.10 with the two cgroup options. At least I can now run the latest k3s on my Raspberry Pi 4, but I still don't know whether the issue is related to Go on ARM or to something else.
Thanks for testing & the data points @gm12367! Interesting, I would expect k3s v0.10.1 to error out with the same memory cgroup message as v0.9.1.
401 bits of entropy is pretty low. I would expect that to cause a crypto error rather than a handshake timeout, but if possible please try to reproduce with the haveged package installed.
Same on a Rock64 with Armbian:
...
I1031 20:14:08.702543 6977 controllermanager.go:161] Version: v1.16.2-k3s.1
I1031 20:14:08.707560 6977 deprecated_insecure_serving.go:53] Serving insecurely on [::]:10252
I1031 20:14:08.708128 6977 server.go:143] Version: v1.16.2-k3s.1
I1031 20:14:08.708814 6977 defaults.go:91] TaintNodesByCondition is enabled, PodToleratesNodeTaints predicate is mandatory
W1031 20:14:08.715137 6977 authorization.go:47] Authorization is disabled
W1031 20:14:08.715536 6977 authentication.go:79] Authentication is disabled
I1031 20:14:08.715755 6977 deprecated_insecure_serving.go:51] Serving healthz insecurely on [::]:10251
FATA[2019-10-31T20:14:18.691839727Z] starting tls server: Get https://127.0.0.1:6444/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions: net/http: TLS handshake timeout
root@gaia:~# k3s --version
k3s version v0.10.1 (7d650d32)
root@gaia:~# cat /proc/sys/kernel/random/entropy_avail
2564
root@gaia:~# uname -a
Linux gaia 4.4.192-rockchip64 #1 SMP Tue Oct 8 18:39:24 CEST 2019 aarch64 GNU/Linux
haveged is running by default on Armbian.
After downgrading to k3s version v0.9.1 it worked.
Same problem here on openSUSE 15.1 ARM64 (RPi3)
When I send a GET request to the API server on the secure port, I get the following output:
renegade [~]$ curl -v https://127.0.0.1:6444
* Expire in 0 ms for 6 (transfer 0x5591476360)
* Trying 127.0.0.1...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x5591476360)
* Connected to 127.0.0.1 (127.0.0.1) port 6444 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 127.0.0.1:6444
* Closing connection 0
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 127.0.0.1:6444
@erikwilson Today I saw that k3s was updated to v0.10.2, and it now installs successfully on Raspbian Buster; I don't know whether the new version includes the fix. I also tried v0.10.1 and it succeeded as well, so I don't know what changed. I checked /proc/sys/kernel/random/entropy_avail on Raspbian and it is always above 2000, but on Ubuntu it's pretty low, sometimes even below 100. Yet after adding the cgroup options, k3s installs successfully there too. So maybe it's not a crypto issue?
Still no success with 0.10.2 on a RPi 3B+. Same TLS handshake timeout error as above. What "cgroup option" are you referring to?
@m0wlheld "cgroup_memory=1 cgroup_enable=memory", as I mentioned in my previous reply. You can add them to config.txt and try again.
@gm12367 Okay, I have that in my /boot/cmdline.txt (see below), still no success with any version > 0.9.1
dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1 root=PARTUUID=a0df87db-02 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory
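Since the parameters can be present and k3s can still fail, it helps to confirm what the running kernel actually booted with. A small Go sketch (missingCgroupParams is a hypothetical name) that checks /proc/cmdline for the two parameters discussed here; run against the cmdline.txt line above, it reports nothing missing.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// requiredParams are the kernel parameters recommended in this thread for
// the memory cgroup that k3s needs.
var requiredParams = []string{"cgroup_memory=1", "cgroup_enable=memory"}

// missingCgroupParams returns the entries of requiredParams absent from a
// kernel command line. Tokens are compared whole (after dropping double
// quotes) because cgroup_enable can legitimately appear more than once, e.g.
// cgroup_enable=cpuset and cgroup_enable=memory. Quoted values containing
// spaces are not handled.
func missingCgroupParams(cmdline string) []string {
	present := make(map[string]bool)
	for _, tok := range strings.Fields(cmdline) {
		present[strings.ReplaceAll(tok, `"`, "")] = true
	}
	var missing []string
	for _, p := range requiredParams {
		if !present[p] {
			missing = append(missing, p)
		}
	}
	return missing
}

func main() {
	// /proc/cmdline reflects the running kernel, so it only changes after a reboot.
	data, err := os.ReadFile("/proc/cmdline")
	if err != nil {
		fmt.Println("cannot read /proc/cmdline:", err)
		return
	}
	if missing := missingCgroupParams(string(data)); len(missing) > 0 {
		fmt.Println("missing:", strings.Join(missing, " "))
		return
	}
	fmt.Println("memory cgroup parameters present")
}
```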
Same here. I couldn't get any v0.10.x working on an RPi 3B+ with up-to-date Raspbian (even with cgroup_memory=1 cgroup_enable=memory).
Running v0.10.2 on an RPi 3B+, also with cgroup_memory=1 cgroup_enable=memory, I have the same issue with k3s exiting after the "TLS handshake timeout" message.
Downgrading to k3s v0.9.1 worked for me too.
Running on RPi 3B+ with OS:
Distributor ID: Raspbian
Description: Raspbian GNU/Linux 10 (buster)
Release: 10
Codename: buster
The error I got on version 0.10.2 and 0.10.0 was starting tls server: Get https://127.0.0.1:6444/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions: net/http: TLS handshake timeout
There's a race condition between starting the apiserver and waiting for the CRDs to be created. In pkg/server/context.go:41, the call to create the CRDs fails because of a timeout while waiting for them in pkg/server/context.go:69. The CRDs take time because the apiserver is not yet available. Adding a simple sleep (not a suggested solution) after pkg/daemons/control/server.go:89 seems to resolve the issue.
I guess a better solution would be if there's some way to make sure the function for starting the api server doesn't return until the api server is up and running properly here https://github.com/rancher/k3s/blob/master/pkg/server/server.go#L51. But I don't know if that's possible.
CRD creation is asynchronous. You have to wait until the API endpoints are ready.
The problem is that on some ARM devices the apiserver takes time to start. If no apiserver is available, the request to create the CRDs times out (TLS handshake timeout).
The problem can easily be reproduced in a multiarch environment like Docker Desktop on OSX with qemu support:
This pulls the ARM version of v0.10.2 and will fail:
$ docker run --network=host --rm -it rancher/k3s@sha256:12508dac5111fe70956855ad6ab0121452bf9caabdcf16e46d0b587ae5fa0fef server --disable-agent
... but this works fine:
$ docker run --network=host --rm -it rancher/k3s:v0.10.2 server --disable-agent
Why does this issue only happen on ARM devices?
I don't think it does. I see the same thing happening on an amd64 VM.
I've resolved this issue with Ubuntu 19.10 on Raspberry Pi 4 by doing the following:
First, I noticed very low available entropy (below 300, sometimes as low as 35), which was causing the TLS handshake timeout error. With such a low available entropy, TLS was simply taking too long, exactly as the error states.
To resolve this, I installed the rng-tools package (sudo apt install rng-tools). After it was installed, I enabled the hardware RNG by editing /etc/default/rng-tools and uncommenting the line HRNGDEVICE=/dev/hwrng.
Next, I ensured that /boot/firmware/nobtcmd.txt (Ubuntu's version of /boot/firmware/cmdline.txt) contained cgroup_memory=1 cgroup_enable=memory, and rebooted. After the reboot, I made sure the cmdline options were present (cat /proc/cmdline) and that there was well over 3000 available entropy (cat /proc/sys/kernel/random/entropy_avail).
The k3s service runs as expected with v0.10.2.
I'd expect Raspbian to exhibit the same issue and to be fixed the same way, with the appropriate filename changes (i.e., /boot/cmdline.txt instead of nobtcmd.txt, etc.). Basically, enable the hardware RNG and it should work there too. Someone on Raspbian should be able to verify.
@DanielWinks From what I understand, Raspbian comes with rng-tools and is set up to auto-detect any hardware entropy source such as /dev/hwrng, per the rng-tools defaults.
I run Raspbian on an RPi 2, 3, 3+ and 4, and I always add cgroup_cpuset=1 cgroup_memory=1 cgroup_enable="memory" swapaccount=1 to my cmdline to make sure Docker and other container runtimes can run properly.
I still run into this problem, though. On the RPi 4 I sometimes don't, and I'm guessing that's because it's a faster unit: the apiserver manages to start before requests to it begin, which is why it sometimes works on that machine.
I'll try manually enabling /dev/hwrng to see if things work then; maybe the auto-detection isn't working properly.
Edit: rng-tools auto-detection seems to work fine on the RPi 3, 3+ and 4; they all have over 3000 entropy just seconds after boot when I SSH into them.
k3s v0.10.2 currently works on my RPi 4 but fails with the TLS timeout on the RPi 3+ and RPi 3.
Unfortunately, I feel like ARM is completely broken. Here is a small change which seems to consistently SIGSEGV: https://drone-pr.rancher.io/rancher/k3s/1820. I have seen similar small code changes cause completely unrelated panics in Go 1.13, which is why we downgraded to Go 1.12. I think there are a few possible problems:
(1) Go is broken (probably for all of ARM); (2) the network stack is broken (probably for the RPi 3); (3) the kernel is broken (probably only specific versions, maybe unrelated).
But on the same OS, the old k3s works well.
Yah, and the master (dc0e596) at this point in time probably works "ok", but any change is capable of breaking it.
Has ARMv8/arm64 been seeing the same problems? If not, it might be a 32-bit issue specifically rather than an ARM issue.
For example, I recently encountered SIGSEGV panics in istio code where atomics weren't 64-bit aligned in 32-bit builds: https://github.com/istio/pkg/pull/75
I've had the same issue with arm64 (Rock64 debian 10)
As far as I can tell the error is not directly related to using atomics. The SIGSEGVs seem to happen mostly on arm 32-bit from the Drone logs, I have not seen it consistently happen (if at all) on arm64 kernels. If related to something like https://github.com/golang/go/issues/35207 could be a 64-bit issue also.
Someone pointed out earlier that they've seen this in an amd64 VM, so I don't think this is an arm/arm64-only issue. I guess it has just shown up there more often.
The TLS handshake timeout and SIGSEGVs may be different issues.
For 64-bit SIGSEGVs it would be good to have logs.
@zimme , that was me, using the lowest-tier Hetzner cloud VM (1 vcpu, 2 GB of RAM -- not sure if this issue is triggered more often with limited resources but I've never seen this on my laptop).
I haven't done any extensive tests, though; this happened while testing my automation setup. I've reinstalled maybe 8 times: 7 times I got that error, and once it worked. The time it worked, I had opened port 6444 on my firewall (which is usually closed), but when I tried again in the same scenario it failed again, so that probably has nothing to do with it.
I am not 100% sure the issue I'm seeing has the same reason, but my feeling is it's more likely than not...
Edit: I had the timeout issue
I have a small heterogeneous AMD64/ARM64 cluster where I'm experimenting with HA, and I was able to capture logs showing the flow of a successful start vs. a failed start. One of my AMD64 nodes reliably fails with K3S 0.10.2 (ironically for this thread, ARM64 Rpi3B+ node is not giving me trouble).
I'm using a variation of the docker-compose.yml, so I can deploy the servers and nodes over macvlan.
When it succeeds, I see "Running kube-scheduler" and "Running kube-controller-manager", followed less than 2 seconds later by:
secure_serving.go:123] Serving securely
When it fails, I see "Running kube-scheduler" and "Running kube-controller-manager", followed 10.3 seconds later by:
starting tls server: Get https://127.0.0.1:6444/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions: net/http: TLS handshake timeout
I have 2756 bits of entropy, so that's not the problem. The node in question is rather busy, though.
If I could encourage whichever component is fussing to be more tolerant, it would be great.
v0.11.0-alpha1 to work around the TLS handshake timeout issue.
Tested v0.11.0-alpha1 on my RPi 3+ running Raspbian and it works without a problem :+1:
Edit: it also works on an RPi 4 running k3os 0.6.0-rc1 with k3s manually updated to v0.11.0-alpha1.
@erikwilson v0.11.0-alpha1 works in my setup with an RPi 3+. No more TLS handshake timeout messages.
v0.11.0-alpha2 works on RPI3
$ k3s --version
k3s version v0.11.0-alpha2 (405f85aa)
failed on RPI3.
INFO[2019-11-10T12:46:55.473870979Z] Done waiting for CRD helmcharts.helm.cattle.io to become available
FATA[2019-11-10T12:46:55.476566942Z] starting tls server: timed out waiting for the condition
@xiaods I installed rng-tools and set swapaccount=1 as @zimme suggested. Maybe that makes the difference, because it is working:
# k3s --version
k3s version v0.11.0-alpha2 (405f85aa)
# kubectl get pod -A
NAMESPACE      NAME                                      READY   STATUS      RESTARTS   AGE
kube-system    metrics-server-6d684c7b5-sjh44            1/1     Running     0          132m
kube-system    local-path-provisioner-58fb86bdfd-f4cjr   1/1     Running     0          132m
kube-system    coredns-d798c9dd-8wj8x                    1/1     Running     0          132m
kube-system    helm-install-traefik-pwp9g                0/1     Completed   0          132m
kube-system    svclb-traefik-h7tcv                       3/3     Running     0          131m
kube-system    traefik-65bccdc4bd-vt9hd                  1/1     Running     0          131m
cert-manager   cert-manager-687f47b874-x4jk5             1/1     Running     0          124m
cert-manager   cert-manager-cainjector-f44b4b959-h27xh   1/1     Running     0          124m
cert-manager   cert-manager-webhook-7f8bdb755f-qqcw4     1/1     Running     1          124m
tick           influxdb-deployment-c7cb599b4-txgh5       1/1     Running     0          90m
tick           chronograf-deployment-7c48d8b5dc-c72jf    1/1     Running     0          84m
tick           telegraf-deployment-889755bb-sgkfs        1/1     Running     0          82m
tick           kapacitor-deployement-6cff699c4d-bv8jh    1/1     Running     6          86m
For what it is worth, it is recommended for kubernetes nodes to have swap disabled, but probably especially important for the RPi3 with poor i/o, as once the system starts swapping it can slow to a crawl.
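A small Go sketch for checking whether swap is active on a node, by reading SwapTotal from /proc/meminfo; a value of 0 kB means no swap is configured. The swapTotalKB helper is hypothetical.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// swapTotalKB returns the SwapTotal value, in kB, from /proc/meminfo-style text.
func swapTotalKB(meminfo string) (int64, error) {
	sc := bufio.NewScanner(strings.NewReader(meminfo))
	for sc.Scan() {
		// Each line has the form: name, value, optional unit (kB).
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "SwapTotal:" {
			return strconv.ParseInt(fields[1], 10, 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, errors.New("SwapTotal not found")
}

func main() {
	data, err := os.ReadFile("/proc/meminfo")
	if err != nil {
		fmt.Println("cannot read /proc/meminfo:", err)
		return
	}
	kb, err := swapTotalKB(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	if kb > 0 {
		fmt.Printf("swap is enabled (%d kB)\n", kb)
	} else {
		fmt.Println("swap is disabled")
	}
}
```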
Version:
OS version:
Linux raspberrypi 4.19.75-v7l+ #1270 SMP Tue Sep 24 18:51:41 BST 2019 armv7l
Bootloader version:
Describe the bug: After running the install command "curl -sfL https://get.k3s.io | sh -", the installation can't complete and a TLS handshake timeout error is shown.
To reproduce: Run 'curl -sfL https://get.k3s.io | sh -' on a Raspberry Pi 4B with 4 GB of memory.
Actual behavior: TLS handshake timeout
Additional context: I've put some error logs below; I hope they help: