k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

K3s Install on Raspberry Pi 4b failed (TLS Handshake Timeout pi3, pi4, etc) #970

Closed gm12367 closed 3 years ago

gm12367 commented 5 years ago

Version:

root@raspberrypi:/home/pi# k3s -v
k3s version v0.10.0 (f9888ca3)

OS version: Linux raspberrypi 4.19.75-v7l+ #1270 SMP Tue Sep 24 18:51:41 BST 2019 armv7l

Bootloader version:

root@raspberrypi:~# vcgencmd bootloader_version
Sep 10 2019 10:41:50
version f626c772b15ba1b7e0532a8d50a761b3ccbdf3bb (release)
timestamp 1568112110

Describe the bug: After running the install command "curl -sfL https://get.k3s.io | sh -", the installation can't complete and a TLS handshake timeout error is reported.

To reproduce: run 'curl -sfL https://get.k3s.io | sh -' on a Raspberry Pi 4B with 4 GB of memory.

Actual behavior: TLS handshake timeout

Additional context: I've put some error logs below; I hope they help:

root@raspberrypi:/home/pi# journalctl -u k3s.service
-- Logs begin at Thu 2019-09-26 01:24:23 BST, end at Sun 2019-10-27 01:22:17 GMT. --
Oct 27 01:19:58 raspberrypi systemd[1]: Starting Lightweight Kubernetes...
Oct 27 01:19:58 raspberrypi k3s[3688]: time="2019-10-27T01:19:58Z" level=info msg="Preparing data dir /var/lib/rancher/k3s/data/3f43b16ca97dbb7ba58868cdb2137a72ad7215762a2852ed944237bf45d44f07"
Oct 27 01:20:13 raspberrypi k3s[3688]: time="2019-10-27T01:20:13.437098936Z" level=info msg="Starting k3s v0.10.0 (f9888ca3)"
Oct 27 01:20:13 raspberrypi k3s[3688]: time="2019-10-27T01:20:13.945042885Z" level=info msg="Kine listening on unix://kine.sock"
Oct 27 01:20:13 raspberrypi k3s[3688]: time="2019-10-27T01:20:13.947965657Z" level=info msg="Fetching bootstrap data from etcd"
Oct 27 01:20:15 raspberrypi k3s[3688]: time="2019-10-27T01:20:15.186636567Z" level=info msg="Running kube-apiserver --advertise-port=6443 --allow-privileged=true --anonymous-auth=false --api-audiences=unknown --authorization-mode=Node,RBAC --basic-auth-file=/var/lib
Oct 27 01:20:15 raspberrypi k3s[3688]: Flag --basic-auth-file has been deprecated, Basic authentication mode is deprecated and will be removed in a future release. It is not recommended for production environments.
Oct 27 01:20:15 raspberrypi k3s[3688]: I1027 01:20:15.189751    3688 server.go:650] external host was not specified, using 192.168.199.80
Oct 27 01:20:15 raspberrypi k3s[3688]: I1027 01:20:15.191063    3688 server.go:162] Version: v1.16.2-k3s.1
Oct 27 01:20:19 raspberrypi k3s[3688]: I1027 01:20:19.782703    3688 plugins.go:158] Loaded 11 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultT
Oct 27 01:20:19 raspberrypi k3s[3688]: I1027 01:20:19.782801    3688 plugins.go:161] Loaded 7 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,RuntimeCl
Oct 27 01:20:19 raspberrypi k3s[3688]: I1027 01:20:19.785373    3688 plugins.go:158] Loaded 11 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultT
Oct 27 01:20:19 raspberrypi k3s[3688]: I1027 01:20:19.785425    3688 plugins.go:161] Loaded 7 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,RuntimeCl
Oct 27 01:20:19 raspberrypi k3s[3688]: I1027 01:20:19.856982    3688 master.go:259] Using reconciler: lease
Oct 27 01:20:19 raspberrypi k3s[3688]: I1027 01:20:19.966350    3688 rest.go:115] the default service ipfamily for this cluster is: IPv4
Oct 27 01:20:20 raspberrypi k3s[3688]: W1027 01:20:20.788011    3688 genericapiserver.go:404] Skipping API batch/v2alpha1 because it has no resources.
Oct 27 01:20:20 raspberrypi k3s[3688]: W1027 01:20:20.853703    3688 genericapiserver.go:404] Skipping API node.k8s.io/v1alpha1 because it has no resources.
Oct 27 01:20:20 raspberrypi k3s[3688]: W1027 01:20:20.919549    3688 genericapiserver.go:404] Skipping API rbac.authorization.k8s.io/v1alpha1 because it has no resources.
Oct 27 01:20:20 raspberrypi k3s[3688]: W1027 01:20:20.931880    3688 genericapiserver.go:404] Skipping API scheduling.k8s.io/v1alpha1 because it has no resources.
Oct 27 01:20:20 raspberrypi k3s[3688]: W1027 01:20:20.973747    3688 genericapiserver.go:404] Skipping API storage.k8s.io/v1alpha1 because it has no resources.
Oct 27 01:20:21 raspberrypi k3s[3688]: W1027 01:20:21.043638    3688 genericapiserver.go:404] Skipping API apps/v1beta2 because it has no resources.
Oct 27 01:20:21 raspberrypi k3s[3688]: W1027 01:20:21.043695    3688 genericapiserver.go:404] Skipping API apps/v1beta1 because it has no resources.
Oct 27 01:20:21 raspberrypi k3s[3688]: I1027 01:20:21.078307    3688 plugins.go:158] Loaded 11 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultT
Oct 27 01:20:21 raspberrypi k3s[3688]: I1027 01:20:21.078434    3688 plugins.go:161] Loaded 7 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,RuntimeCl
Oct 27 01:20:21 raspberrypi k3s[3688]: time="2019-10-27T01:20:21.096613858Z" level=info msg="Running kube-scheduler --bind-address=127.0.0.1 --kubeconfig=/var/lib/rancher/k3s/server/cred/scheduler.kubeconfig --leader-elect=false --port=10251 --secure-port=0"
Oct 27 01:20:21 raspberrypi k3s[3688]: time="2019-10-27T01:20:21.098945424Z" level=info msg="Running kube-controller-manager --allocate-node-cidrs=true --bind-address=127.0.0.1 --cluster-cidr=10.42.0.0/16 --cluster-signing-cert-file=/var/lib/rancher/k3s/server/tls/s
Oct 27 01:20:21 raspberrypi k3s[3688]: I1027 01:20:21.119387    3688 controllermanager.go:161] Version: v1.16.2-k3s.1
Oct 27 01:20:21 raspberrypi k3s[3688]: I1027 01:20:21.121660    3688 deprecated_insecure_serving.go:53] Serving insecurely on [::]:10252
Oct 27 01:20:21 raspberrypi k3s[3688]: I1027 01:20:21.127479    3688 server.go:143] Version: v1.16.2-k3s.1
Oct 27 01:20:21 raspberrypi k3s[3688]: I1027 01:20:21.127709    3688 defaults.go:91] TaintNodesByCondition is enabled, PodToleratesNodeTaints predicate is mandatory
Oct 27 01:20:21 raspberrypi k3s[3688]: W1027 01:20:21.139439    3688 authorization.go:47] Authorization is disabled
Oct 27 01:20:21 raspberrypi k3s[3688]: W1027 01:20:21.139494    3688 authentication.go:79] Authentication is disabled
Oct 27 01:20:21 raspberrypi k3s[3688]: I1027 01:20:21.139527    3688 deprecated_insecure_serving.go:51] Serving healthz insecurely on [::]:10251
Oct 27 01:20:31 raspberrypi k3s[3688]: time="2019-10-27T01:20:31.111017958Z" level=fatal msg="starting tls server: Get https://127.0.0.1:6444/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions: net/http: TLS handshake timeout"
Oct 27 01:20:31 raspberrypi systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
Oct 27 01:20:31 raspberrypi systemd[1]: k3s.service: Failed with result 'exit-code'.
Oct 27 01:20:31 raspberrypi systemd[1]: Failed to start Lightweight Kubernetes.
Oct 27 01:20:36 raspberrypi systemd[1]: k3s.service: Service RestartSec=5s expired, scheduling restart.
Oct 27 01:20:36 raspberrypi systemd[1]: k3s.service: Scheduled restart job, restart counter is at 1.
Oct 27 01:20:36 raspberrypi systemd[1]: Stopped Lightweight Kubernetes.
Oct 27 01:20:36 raspberrypi systemd[1]: Starting Lightweight Kubernetes...
kaihendry commented 5 years ago

I upgraded yesterday with curl -sfL https://get.k3s.io | sh - and I think I'm seeing the same issue. The logs are pretty intense: https://s.natalian.org/2019-10-28/k3s.txt

kaihendry commented 5 years ago

The workaround is to downgrade with curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v0.9.1 sh - (thanks to https://twitter.com/ibuildthecloud/status/1188640874642563072).

mordredz commented 5 years ago

I have the same problem on a Raspberry Pi 3B+ with k3s v0.10.0, but v0.9.1 works.

dan-mcm commented 5 years ago

Same as above: first time attempting to set up, v0.10.0 bugged out, and downgrading to 0.9.1 worked 👍

m0wlheld commented 5 years ago

Same for Raspberry Pi 3 / 3B with v0.10.1, but 0.9.1 works. Somebody please adjust the issue's title: "K3S Install on Raspberry Pi fails since v0.10.0"

m0wlheld commented 5 years ago

Related to #869? Spotted the same error message there.

erikwilson commented 5 years ago

And #556 as already linked here.

I haven't really been able to find a reproducible case. Does cat /proc/sys/kernel/random/entropy_avail show sufficient entropy? I wouldn't be surprised if it's some Go-on-ARM issue; if possible, it might be worth trying a 64-bit OS.

m0wlheld commented 5 years ago

cat /proc/sys/kernel/random/entropy_avail returns 3233, but the RPi3 architecture is not 64-bit AFAIK. The OS is Raspbian:

lsb_release -a
No LSB modules are available.
Distributor ID: Raspbian
Description:    Raspbian GNU/Linux 9.11 (stretch)
Release:        9.11
Codename:       stretch
erikwilson commented 5 years ago

https://www.raspberrypi.org/forums/viewtopic.php?t=231618

gocursor commented 5 years ago

Same issue with k3s v0.10.1 (7d650d32) on Intel/AMD64 (I manually copied k3s v0.10.1 from the releases page into a VirtualBox VM running Ubuntu 18.04.3). The exact error message is:

FATA[2019-10-30T16:52:21.768049354+01:00] starting tls server: Get https://127.0.0.1:6444/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions: net/http: TLS handshake timeout

k3s v0.9.1 starts on the same VM without this error.

cjellick commented 5 years ago

@gocursor your problem could be a general networking problem, maybe not directly related to this arm issue.

gm12367 commented 5 years ago

@erikwilson I re-imaged my Raspberry Pi 4 with 64-bit Ubuntu 19.10 and tried again; the issue is the same as before: "TLS handshake timeout".

Below is the information you'll probably need:

K3s version:

root@ubuntu:~# k3s -version
k3s version v0.10.1 (7d650d32)

OS version:

root@ubuntu:~# uname -a
Linux ubuntu 5.3.0-1008-raspi2 #9-Ubuntu SMP Fri Oct 18 13:26:35 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux

Arch version:

root@ubuntu:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 19.10
Release:        19.10
Codename:       eoan

entropy_avail:

root@ubuntu:~# cat /proc/sys/kernel/random/entropy_avail
401

Error Logs:

root@ubuntu:~# journalctl -u k3s.service
-- Logs begin at Thu 2019-04-11 16:28:37 UTC, end at Thu 2019-10-31 13:18:39 UTC. --
Oct 31 13:04:45 ubuntu systemd[1]: Starting Lightweight Kubernetes...
Oct 31 13:04:45 ubuntu k3s[1884]: time="2019-10-31T13:04:45Z" level=info msg="Preparing data dir /var/lib/rancher/k3s/data/11f1b1f5f9884701e429998dc51d3b6df601985460dc405a0ad74bd87c99d1ea"
Oct 31 13:04:51 ubuntu k3s[1884]: time="2019-10-31T13:04:51.893934370Z" level=info msg="Starting k3s v0.10.1 (7d650d32)"
Oct 31 13:04:51 ubuntu k3s[1884]: time="2019-10-31T13:04:51.996804834Z" level=info msg="Kine listening on unix://kine.sock"
Oct 31 13:04:51 ubuntu k3s[1884]: time="2019-10-31T13:04:51.998663740Z" level=info msg="Fetching bootstrap data from etcd"
Oct 31 13:04:54 ubuntu k3s[1884]: time="2019-10-31T13:04:54.241480500Z" level=info msg="Running kube-apiserver --advertise-port=6443 --allow-privileged=true --anonymous-auth=false --api-audiences=unknown --authorization-mode=Node,RBAC --basic-auth-file=/var/lib/ranc
Oct 31 13:04:54 ubuntu k3s[1884]: Flag --basic-auth-file has been deprecated, Basic authentication mode is deprecated and will be removed in a future release. It is not recommended for production environments.
Oct 31 13:04:54 ubuntu k3s[1884]: I1031 13:04:54.244334    1884 server.go:650] external host was not specified, using 192.168.199.79
Oct 31 13:04:54 ubuntu k3s[1884]: I1031 13:04:54.245661    1884 server.go:162] Version: v1.16.2-k3s.1
Oct 31 13:05:00 ubuntu k3s[1884]: I1031 13:05:00.630222    1884 plugins.go:158] Loaded 11 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolera
Oct 31 13:05:00 ubuntu k3s[1884]: I1031 13:05:00.630305    1884 plugins.go:161] Loaded 7 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,RuntimeClass,R
Oct 31 13:05:00 ubuntu k3s[1884]: I1031 13:05:00.632921    1884 plugins.go:158] Loaded 11 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolera
Oct 31 13:05:00 ubuntu k3s[1884]: I1031 13:05:00.632981    1884 plugins.go:161] Loaded 7 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,RuntimeClass,R
Oct 31 13:05:00 ubuntu k3s[1884]: I1031 13:05:00.702617    1884 master.go:259] Using reconciler: lease
Oct 31 13:05:00 ubuntu k3s[1884]: I1031 13:05:00.828329    1884 rest.go:115] the default service ipfamily for this cluster is: IPv4
Oct 31 13:05:02 ubuntu k3s[1884]: W1031 13:05:02.720128    1884 genericapiserver.go:404] Skipping API batch/v2alpha1 because it has no resources.
Oct 31 13:05:02 ubuntu k3s[1884]: W1031 13:05:02.842611    1884 genericapiserver.go:404] Skipping API node.k8s.io/v1alpha1 because it has no resources.
Oct 31 13:05:02 ubuntu k3s[1884]: W1031 13:05:02.966471    1884 genericapiserver.go:404] Skipping API rbac.authorization.k8s.io/v1alpha1 because it has no resources.
Oct 31 13:05:02 ubuntu k3s[1884]: W1031 13:05:02.989855    1884 genericapiserver.go:404] Skipping API scheduling.k8s.io/v1alpha1 because it has no resources.
Oct 31 13:05:03 ubuntu k3s[1884]: W1031 13:05:03.066864    1884 genericapiserver.go:404] Skipping API storage.k8s.io/v1alpha1 because it has no resources.
Oct 31 13:05:03 ubuntu k3s[1884]: W1031 13:05:03.195220    1884 genericapiserver.go:404] Skipping API apps/v1beta2 because it has no resources.
Oct 31 13:05:03 ubuntu k3s[1884]: W1031 13:05:03.195351    1884 genericapiserver.go:404] Skipping API apps/v1beta1 because it has no resources.
Oct 31 13:05:03 ubuntu k3s[1884]: I1031 13:05:03.259508    1884 plugins.go:158] Loaded 11 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolera
Oct 31 13:05:03 ubuntu k3s[1884]: I1031 13:05:03.259642    1884 plugins.go:161] Loaded 7 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,RuntimeClass,R
Oct 31 13:05:03 ubuntu k3s[1884]: time="2019-10-31T13:05:03.289835152Z" level=info msg="Running kube-scheduler --bind-address=127.0.0.1 --kubeconfig=/var/lib/rancher/k3s/server/cred/scheduler.kubeconfig --leader-elect=false --port=10251 --secure-port=0"
Oct 31 13:05:03 ubuntu k3s[1884]: time="2019-10-31T13:05:03.292533883Z" level=info msg="Running kube-controller-manager --allocate-node-cidrs=true --bind-address=127.0.0.1 --cluster-cidr=10.42.0.0/16 --cluster-signing-cert-file=/var/lib/rancher/k3s/server/tls/server
Oct 31 13:05:03 ubuntu k3s[1884]: I1031 13:05:03.379460    1884 controllermanager.go:161] Version: v1.16.2-k3s.1
Oct 31 13:05:03 ubuntu k3s[1884]: I1031 13:05:03.390324    1884 deprecated_insecure_serving.go:53] Serving insecurely on [::]:10252
Oct 31 13:05:03 ubuntu k3s[1884]: I1031 13:05:03.401644    1884 server.go:143] Version: v1.16.2-k3s.1
Oct 31 13:05:03 ubuntu k3s[1884]: I1031 13:05:03.403409    1884 defaults.go:91] TaintNodesByCondition is enabled, PodToleratesNodeTaints predicate is mandatory
Oct 31 13:05:03 ubuntu k3s[1884]: W1031 13:05:03.412646    1884 authorization.go:47] Authorization is disabled
Oct 31 13:05:03 ubuntu k3s[1884]: W1031 13:05:03.412819    1884 authentication.go:79] Authentication is disabled
Oct 31 13:05:03 ubuntu k3s[1884]: I1031 13:05:03.412886    1884 deprecated_insecure_serving.go:51] Serving healthz insecurely on [::]:10251
Oct 31 13:05:13 ubuntu k3s[1884]: time="2019-10-31T13:05:13.355859773Z" level=fatal msg="starting tls server: Get https://127.0.0.1:6444/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions: net/http: TLS handshake timeout"
Oct 31 13:05:13 ubuntu systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
Oct 31 13:05:13 ubuntu systemd[1]: k3s.service: Failed with result 'exit-code'.
Oct 31 13:05:13 ubuntu systemd[1]: Failed to start Lightweight Kubernetes.
Oct 31 13:05:18 ubuntu systemd[1]: k3s.service: Service RestartSec=5s expired, scheduling restart.
Oct 31 13:05:18 ubuntu systemd[1]: k3s.service: Scheduled restart job, restart counter is at 1.
Oct 31 13:05:18 ubuntu systemd[1]: Stopped Lightweight Kubernetes.
Oct 31 13:05:18 ubuntu systemd[1]: Starting Lightweight Kubernetes...
Oct 31 13:05:22 ubuntu k3s[1921]: time="2019-10-31T13:05:22.126716301Z" level=info msg="Starting k3s v0.10.1 (7d650d32)"
Oct 31 13:05:22 ubuntu k3s[1921]: time="2019-10-31T13:05:22.136877861Z" level=info msg="Kine listening on unix://kine.sock"
Oct 31 13:05:22 ubuntu k3s[1921]: time="2019-10-31T13:05:22.138038312Z" level=info msg="Fetching bootstrap data from etcd"
Oct 31 13:05:22 ubuntu k3s[1921]: time="2019-10-31T13:05:22.271186322Z" level=info msg="Running kube-apiserver --advertise-port=6443 --allow-privileged=true --anonymous-auth=false --api-audiences=unknown --authorization-mode=Node,RBAC --basic-auth-file=/var/lib/ranc
Oct 31 13:05:22 ubuntu k3s[1921]: Flag --basic-auth-file has been deprecated, Basic authentication mode is deprecated and will be removed in a future release. It is not recommended for production environments.
Oct 31 13:05:22 ubuntu k3s[1921]: I1031 13:05:22.273924    1921 server.go:650] external host was not specified, using 192.168.199.79
sttts commented 5 years ago

Same on a recent HypriotOS (https://github.com/hypriot).

gm12367 commented 5 years ago

I also tried installing the older v0.9.1; the first attempt failed with a cgroup error:

Oct 31 14:09:12 ubuntu k3s[2377]: time="2019-10-31T14:09:12.021942176Z" level=error msg="Failed to find memory cgroup, you may need to add \"cgroup_memory=1 cgroup_enable=memory\" to your linux cmdline (/boot/cmdline.txt on a Raspberry Pi)"
Oct 31 14:09:12 ubuntu k3s[2377]: time="2019-10-31T14:09:12.022021433Z" level=fatal msg="failed to find memory cgroup, you may need to add \"cgroup_memory=1 cgroup_enable=memory\" to your linux cmdline (/boot/cmdline.txt on a Raspberry Pi)"

After adding the two cgroup options to /boot/firmware/config.txt and trying again, it succeeded.

root@ubuntu:~# kubectl get node
NAME     STATUS   ROLES    AGE     VERSION
ubuntu   Ready    master   3m55s   v1.15.4-k3s.1
root@ubuntu:~# kubectl get pod -A
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   local-path-provisioner-5b8648d6f6-7fgm5   1/1     Running     0          3m52s
kube-system   coredns-66f496764-cjg7q                   1/1     Running     0          3m52s
kube-system   helm-install-traefik-szt4n                0/1     Completed   0          3m52s
kube-system   svclb-traefik-9b7cv                       3/3     Running     0          51s
kube-system   traefik-d869575c8-4gf95                   1/1     Running     0          51s

After that, I tried upgrading k3s to the latest version, and it succeeded this time:

root@ubuntu:~# k3s -version
k3s version v0.9.1 (755bd1c6)
root@ubuntu:~# curl -sfL https://get.k3s.io | sh -
[INFO]  Finding latest release
[INFO]  Using v0.10.1 as release
[INFO]  Downloading hash https://github.com/rancher/k3s/releases/download/v0.10.1/sha256sum-arm64.txt
[INFO]  Downloading binary https://github.com/rancher/k3s/releases/download/v0.10.1/k3s-arm64
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Skipping /usr/local/bin/kubectl symlink to k3s, already exists
[INFO]  Skipping /usr/local/bin/crictl symlink to k3s, already exists
[INFO]  Skipping /usr/local/bin/ctr symlink to k3s, already exists
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[INFO]  systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO]  systemd: Starting k3s
root@ubuntu:~# kubectl get node
NAME     STATUS   ROLES    AGE   VERSION
ubuntu   Ready    master   7m    v1.15.4-k3s.1
root@ubuntu:~# kubectl get pod -A
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   helm-install-traefik-szt4n                0/1     Completed   0          6m53s
kube-system   local-path-provisioner-5b8648d6f6-7fgm5   0/1     Error       0          6m53s
kube-system   coredns-66f496764-cjg7q                   1/1     Running     0          6m53s
kube-system   svclb-traefik-9b7cv                       3/3     Running     0          3m52s
kube-system   traefik-d869575c8-4gf95                   0/1     Running     0          3m52s
root@ubuntu:~# kubectl get node
NAME     STATUS   ROLES    AGE     VERSION
ubuntu   Ready    master   7m14s   v1.16.2-k3s.1

root@ubuntu:~# kubectl get pod -A
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   helm-install-traefik-szt4n                0/1     Completed   0          7m40s
kube-system   local-path-provisioner-5b8648d6f6-7fgm5   1/1     Running     1          7m40s
kube-system   coredns-66f496764-cjg7q                   1/1     Running     1          7m40s
kube-system   traefik-d869575c8-4gf95                   1/1     Running     1          4m39s
kube-system   svclb-traefik-vq8nb                       3/3     Running     0          32s
root@ubuntu:~# uname -a
Linux ubuntu 5.3.0-1008-raspi2 #9-Ubuntu SMP Fri Oct 18 13:26:35 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux
root@ubuntu:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 19.10
Release:        19.10
Codename:       eoan
root@ubuntu:~# k3s -version
k3s version v0.10.1 (7d650d32)

If I have time, I'll try installing the latest version of k3s directly on a fresh Ubuntu 19.10 OS with the two cgroup options. At least I can now run the latest k3s on my Raspberry Pi 4. But as of now, I still don't know whether the issue is related to the Go-on-ARM issue or something else.

erikwilson commented 5 years ago

Thanks for testing & the data points @gm12367! Interesting, I would expect k3s v0.10.1 to error out with the same memory cgroup message as v0.9.1.

401 bits of entropy is pretty low. I would expect that to produce a crypto error rather than a handshake timeout, but if possible please try to reproduce with the haveged package installed.

f2hex commented 5 years ago

Same on a Rock64 with Armbian:

...
I1031 20:14:08.702543    6977 controllermanager.go:161] Version: v1.16.2-k3s.1
I1031 20:14:08.707560    6977 deprecated_insecure_serving.go:53] Serving insecurely on [::]:10252
I1031 20:14:08.708128    6977 server.go:143] Version: v1.16.2-k3s.1
I1031 20:14:08.708814    6977 defaults.go:91] TaintNodesByCondition is enabled, PodToleratesNodeTaints predicate is mandatory
W1031 20:14:08.715137    6977 authorization.go:47] Authorization is disabled
W1031 20:14:08.715536    6977 authentication.go:79] Authentication is disabled
I1031 20:14:08.715755    6977 deprecated_insecure_serving.go:51] Serving healthz insecurely on [::]:10251
FATA[2019-10-31T20:14:18.691839727Z] starting tls server: Get https://127.0.0.1:6444/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions: net/http: TLS handshake timeout
root@gaia:~# k3s --version
k3s version v0.10.1 (7d650d32)
root@gaia:~# cat /proc/sys/kernel/random/entropy_avail
2564

root@gaia:~# uname -a
Linux gaia 4.4.192-rockchip64 #1 SMP Tue Oct 8 18:39:24 CEST 2019 aarch64 GNU/Linux

haveged is running by default on Armbian.

After downgrading to k3s version v0.9.1 it worked.

drbugfinder commented 5 years ago

Same problem here on openSUSE 15.1 ARM64 (RPi3)

zimme commented 5 years ago

When I send a GET request to the API server on the secure port, I get the following output:

renegade [~]$ curl -v https://127.0.0.1:6444
* Expire in 0 ms for 6 (transfer 0x5591476360)
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x5591476360)
* Connected to 127.0.0.1 (127.0.0.1) port 6444 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 127.0.0.1:6444
* Closing connection 0
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 127.0.0.1:6444
gm12367 commented 5 years ago

@erikwilson Today I noticed k3s has been updated to v0.10.2, and it now installs successfully on Raspbian Buster; I don't know whether the new version includes the fix. I also tried v0.10.1 and it succeeded as well, so I don't know what changed. I checked /proc/sys/kernel/random/entropy_avail: on Raspbian it's always above 2000, but on Ubuntu it's pretty low, sometimes even below 100. Yet after adding the cgroup options, k3s installs successfully. So maybe it isn't a crypto issue?

m0wlheld commented 5 years ago

> @erikwilson Today I found k3s version update to v0.10.2, and now it can install on Raspbian Buster successfully, don't know if the new version include the fix. I also tried with v0.10.1 and succeed as well. So I don't know if there is something changed.

Still no success with 0.10.2 on an RPi 3B+. Same TLS handshake timeout error as above. What "cgroup option" are you referring to?

gm12367 commented 5 years ago

@m0wlheld "cgroup_memory=1 cgroup_enable=memory", as I mentioned in my previous reply. You can add it to config.txt and try again.

m0wlheld commented 5 years ago

@gm12367 Okay, I have that in my /boot/cmdline.txt (see below), but still no success with any version > 0.9.1:

dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1 root=PARTUUID=a0df87db-02 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory

mcanevet commented 5 years ago

Same here. I couldn't get any v0.10.x working on an RPi 3B+ with an up-to-date Raspbian (even with cgroup_memory=1 cgroup_enable=memory).

squishykid commented 5 years ago

Running v0.10.2 on an RPi 3B+, also with cgroup_memory=1 cgroup_enable=memory. I have the same issue with k3s exiting after the "TLS handshake timeout" message.

gvanderberg commented 5 years ago

Downgrading to k3s version 0.9.1 worked for me too.

Running on RPi 3B+ with OS:

Distributor ID: Raspbian
Description:    Raspbian GNU/Linux 10 (buster)
Release:        10
Codename:       buster

The error I got on versions 0.10.2 and 0.10.0 was: starting tls server: Get https://127.0.0.1:6444/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions: net/http: TLS handshake timeout

larmog commented 5 years ago

There's a race condition between starting the apiserver and waiting for CRDs to be created. In pkg/server/context.go:41 the call to create CRDs fails because of a timeout while waiting for CRDs in pkg/server/context.go:69. The CRDs take time because the apiserver is not yet available. Adding a simple sleep (not a suggested solution) after pkg/daemons/control/server.go:89 seems to resolve the issue.

zimme commented 5 years ago

> If adding a simple sleep (not a suggested solution) after pkg/daemons/control/server.go:89 seems to resolve the issue.

I guess a better solution would be to make sure the function that starts the API server (https://github.com/rancher/k3s/blob/master/pkg/server/server.go#L51) doesn't return until the API server is up and running properly. But I don't know if that's possible.

sttts commented 5 years ago

CRD creation is asynchronous. You have to wait until the API endpoints are ready.

larmog commented 5 years ago

The problem is that on some ARM devices the apiserver takes time to start. If no apiserver is available, the request to create CRDs will time out (TLS handshake timeout). The problem can easily be reproduced in a multi-arch environment like Docker Desktop on OSX with QEMU support:

This pulls the arm version of v0.10.2 and will fail:

$ docker run --network=host --rm -it rancher/k3s@sha256:12508dac5111fe70956855ad6ab0121452bf9caabdcf16e46d0b587ae5fa0fef server --disable-agent

... but this works fine:

$ docker run --network=host --rm -it rancher/k3s:v0.10.2 server --disable-agent
thomasdba commented 5 years ago

Why does this issue only happen on ARM devices?

makkus commented 5 years ago

> why this issue only happened in arm devices ?

I don't think it does. I see the same thing happening on an amd64 VM.

DanielWinks commented 5 years ago

I've resolved this issue with Ubuntu 19.10 on Raspberry Pi 4 by doing the following:

First, I noticed very low available entropy (below 300, sometimes as low as 35), which was causing the TLS handshake timeout error. With such low available entropy, TLS was simply taking too long, exactly as the error states.

To resolve this I installed the rng-tools package: sudo apt install rng-tools. After this is installed, I enabled the hardware RNG by editing /etc/default/rng-tools and uncommenting the line: HRNGDEVICE=/dev/hwrng

Next, I ensured that /boot/firmware/nobtcmd.txt (Ubuntu's version of /boot/firmware/cmdline.txt) contained cgroup_memory=1 cgroup_enable=memory, and rebooted. After the reboot, I made sure the cmdline options were present (cat /proc/cmdline) and that available entropy was well over 3000 (cat /proc/sys/kernel/random/entropy_avail).

The k3s service runs as expected with version 0.10.2.

I'd expect Raspbian to exhibit the same issue and to be resolved the same way, with appropriate changes to filenames (i.e., /boot/firmware/cmdline.txt instead of nobtcmd.txt, etc.). Basically, enable the hardware RNG and it should also work. Someone on Raspbian should be able to verify.

zimme commented 5 years ago

@DanielWinks From what I understand, Raspbian comes with rng-tools and is set up to auto-detect any hardware entropy source, such as /dev/hwrng, per the rng-tools defaults.

I run Raspbian on RPi 2, 3, 3+, and 4, and I always add cgroup_cpuset=1 cgroup_memory=1 cgroup_enable="memory" swapaccount=1 to my cmdline to make sure Docker and other container runtimes can run properly.

I still run into this problem, though on the RPi 4 it sometimes doesn't happen. I'm guessing that's because it's a faster unit, so the API server manages to start properly before requests to it begin, which is why it sometimes works on that machine.

I'll try and manually enable /dev/hwrng to see if things work then, maybe the autodetect isn't working properly.

Edit: auto-detection in rng-tools seems to work fine on the RPi 3, 3+, and 4; they all report over 3000 bits of entropy just seconds after boot when I SSH into them.

k3s v0.10.2 currently works on my RPi 4 but fails with the TLS handshake timeout on the RPi 3+ and RPi 3.

erikwilson commented 5 years ago

Unfortunately, I feel like ARM is completely broken. Here is a small change that seems to consistently cause a SIGSEGV: https://drone-pr.rancher.io/rancher/k3s/1820. I have also seen similar small code changes cause completely unrelated panics with Go 1.13, which is why we downgraded to Go 1.12. I think there are a few possible problems:

- Go is broken (probably for all of ARM)
- the network stack is broken (probably for the RPi 3)
- the kernel is broken (probably only specific versions, maybe unrelated)

thomasdba commented 5 years ago

But on the same OS, older k3s versions work well.

erikwilson commented 5 years ago

Yeah, and master (dc0e596) at this point in time probably works "ok", but any change could break it.

nickbp commented 5 years ago

Has ARMv8/arm64 been seeing the same problems? If not, it might be a 32-bit issue specifically rather than an ARM issue.

For example, I'd recently encountered SIGSEGV panics in Istio code where atomics weren't 64-bit aligned in 32-bit builds: https://github.com/istio/pkg/pull/75

ahmedmagdiosman commented 5 years ago

I've had the same issue on arm64 (Rock64, Debian 10).

erikwilson commented 5 years ago

As far as I can tell, the error is not directly related to using atomics. Judging from the Drone logs, the SIGSEGVs happen mostly on 32-bit ARM; I have not seen them happen consistently (if at all) on arm64 kernels. If it is related to something like https://github.com/golang/go/issues/35207, it could be a 64-bit issue as well.

zimme commented 5 years ago

Someone pointed out earlier that they've seen this in an amd64 VM, so I don't think this is an ARM/arm64-only issue. I guess it has just shown up more often there.

erikwilson commented 5 years ago

The TLS handshake timeout and SIGSEGVs may be different issues. For 64-bit SIGSEGVs it would be good to have logs.

makkus commented 5 years ago

@zimme, that was me, using the lowest-tier Hetzner Cloud VM (1 vCPU, 2 GB of RAM). I'm not sure whether this issue is triggered more often with limited resources, but I've never seen it on my laptop.

I haven't done any extensive testing, though; this happened while testing my automation setup. I've reinstalled it maybe 8 times: 7 times I got that error and once it worked. That one time, I had opened port 6444 on my firewall (it's usually closed), but when I tried again in the same scenario it failed again, so that probably has nothing to do with it.

I am not 100% sure the issue I'm seeing has the same cause, but my feeling is that it's more likely than not.

Edit: I had the timeout issue

jbalonso commented 5 years ago

I have a small heterogeneous AMD64/ARM64 cluster where I'm experimenting with HA, and I was able to capture logs showing the flow of a successful start vs. a failed start. One of my AMD64 nodes reliably fails with k3s v0.10.2 (ironically for this thread, my ARM64 RPi 3B+ node is not giving me trouble).

I'm using a variation of the docker-compose.yml, so I can deploy the servers and nodes over macvlan.

When it succeeds, I see:

Running kube-scheduler
Running kube-controller-manager

followed less than 2 seconds later by:

secure_serving.go:123] Serving securely

When it fails, I see:

Running kube-scheduler
Running kube-controller-manager

followed 10.3 seconds later by:

starting tls server: Get https://127.0.0.1:6444/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions: net/http: TLS handshake timeout
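The difference between the two runs is the delay between "Running kube-controller-manager" and the next milestone. A minimal sketch for measuring it, assuming journal output in journalctl's short-unix format (epoch seconds in the first field) and matching only the message fragments quoted in this comment; k3s_start_gap is a hypothetical name:

```shell
# Hypothetical helper: read `journalctl -o short-unix` output on stdin
# (epoch seconds in the first field) and report the delay between the first
# "Running kube-controller-manager" line and whichever comes next:
# "Serving securely" (successful start) or "TLS handshake timeout".
k3s_start_gap() {
    awk '
        /Running kube-controller-manager/ && start == "" { start = $1; next }
        start != "" && /Serving securely/ {
            printf "serving %.1f\n", $1 - start; done = 1; exit
        }
        start != "" && /TLS handshake timeout/ {
            printf "tls-timeout %.1f\n", $1 - start; done = 1; exit
        }
        END { if (!done) print "unknown" }
    '
}
```

For example: journalctl -b -u k3s -o short-unix | k3s_start_gap. It reports only the first start attempt in its input and prints "unknown" if neither outcome appears.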

I have 2756 bits of available entropy, so that's not the problem. The node in question is rather busy, though.

It would be great if whichever component is timing out could be made more tolerant.

erikwilson commented 5 years ago

#1007 is available in v0.11.0-alpha1 as a workaround for the TLS handshake timeout issue.

zimme commented 5 years ago

Tested v0.11.0-alpha1 on my RPi 3+ running Raspbian and it works without a problem :+1:

Edit: also working on an RPi 4 running k3OS 0.6.0-rc1 with k3s manually updated to v0.11.0-alpha1.

wabuMike commented 5 years ago

@erikwilson v0.11.0-alpha1 works in my setup with RPi3+. No more TLS handshake timeout messages.

pierremahot commented 5 years ago

v0.11.0-alpha2 works on an RPi 3.

xiaods commented 5 years ago

$ k3s --version
k3s version v0.11.0-alpha2 (405f85aa)

It failed on an RPi 3:

INFO[2019-11-10T12:46:55.473870979Z] Done waiting for CRD helmcharts.helm.cattle.io to become available 
FATA[2019-11-10T12:46:55.476566942Z] starting tls server: timed out waiting for the condition 
pierremahot commented 5 years ago

@xiaods I installed rng-tools and set swapaccount=1 as @zimme suggested. Maybe that makes the difference, because it is working for me:

# k3s --version
k3s version v0.11.0-alpha2 (405f85aa)
# kubectl get pod -A
NAMESPACE      NAME                                      READY   STATUS      RESTARTS   AGE
kube-system    metrics-server-6d684c7b5-sjh44            1/1     Running     0          132m
kube-system    local-path-provisioner-58fb86bdfd-f4cjr   1/1     Running     0          132m
kube-system    coredns-d798c9dd-8wj8x                    1/1     Running     0          132m
kube-system    helm-install-traefik-pwp9g                0/1     Completed   0          132m
kube-system    svclb-traefik-h7tcv                       3/3     Running     0          131m
kube-system    traefik-65bccdc4bd-vt9hd                  1/1     Running     0          131m
cert-manager   cert-manager-687f47b874-x4jk5             1/1     Running     0          124m
cert-manager   cert-manager-cainjector-f44b4b959-h27xh   1/1     Running     0          124m
cert-manager   cert-manager-webhook-7f8bdb755f-qqcw4     1/1     Running     1          124m
tick           influxdb-deployment-c7cb599b4-txgh5       1/1     Running     0          90m
tick           chronograf-deployment-7c48d8b5dc-c72jf    1/1     Running     0          84m
tick           telegraf-deployment-889755bb-sgkfs        1/1     Running     0          82m
tick           kapacitor-deployement-6cff699c4d-bv8jh    1/1     Running     6          86m
erikwilson commented 5 years ago

For what it's worth, it is recommended to run Kubernetes nodes with swap disabled, and this is probably especially important on the RPi 3 with its poor I/O: once the system starts swapping, it can slow to a crawl.
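A quick way to check for active swap before installing, as a sketch: /proc/swaps lists one line per active swap area after a header line, so anything beyond the header means swap is on. swap_active is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if any swap area is active. /proc/swaps has
# a header line followed by one line per active swap area.
swap_active() {
    swaps_file="${1:-/proc/swaps}"
    [ "$(tail -n +2 "$swaps_file" | grep -c .)" -gt 0 ]
}
```

For example, swap_active && sudo swapoff -a turns swap off until the next boot. On Raspbian, swap is normally provided by dphys-swapfile, so disabling that service (sudo systemctl disable dphys-swapfile) is one way to keep it off across reboots; treat this as an assumption to verify on your own image.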