k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0
Problems with 1.19 and raspberries #2476

Closed anttilinno closed 4 years ago

anttilinno commented 4 years ago

Environmental Info:

K3s Version:

k3s version v1.19.3+k3s2 (f8a4547b)

Node(s) CPU architecture, OS, and Version:

Linux rpi-4 5.4.72-v7l+ #1356 SMP Thu Oct 22 13:57:51 BST 2020 armv7l GNU/Linux

Raspberry Pi OS, latest

Cluster Configuration:

1 master and 3 workers

Describe the bug:

root@rpi-4:~# PAGER="cat" systemctl status k3s -l
● k3s.service - Lightweight Kubernetes
   Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: enabled)
   Active: activating (start) since Wed 2020-11-04 17:08:44 EET; 4s ago
     Docs: https://k3s.io
  Process: 903 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
  Process: 904 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
 Main PID: 905 (k3s-server)
    Tasks: 34
   Memory: 58.7M
   CGroup: /system.slice/k3s.service
           └─905 /usr/local/bin/k3s server

Nov 04 17:08:46 rpi-4 k3s[905]: time="2020-11-04T17:08:46.091486922+02:00" level=info msg="Database tables and indexes are up to date"
Nov 04 17:08:46 rpi-4 k3s[905]: time="2020-11-04T17:08:46.103268473+02:00" level=info msg="Kine listening on unix://kine.sock"
Nov 04 17:08:46 rpi-4 k3s[905]: time="2020-11-04T17:08:46.104555219+02:00" level=info msg="Running kube-apiserver --advertise-port=6443 --allow-privileged=true --anonymous-auth=false --api-audiences=unknown --authorization-mode=Node,RBAC --bind-address=127.0.0.1 --cert-dir=/var/lib/rancher/k3s/server/tls/temporary-certs --client-ca-file=/var/lib/rancher/k3s/server/tls/client-ca.crt --enable-admission-plugins=NodeRestriction --etcd-servers=unix://kine.sock --insecure-port=0 --kubelet-certificate-authority=/var/lib/rancher/k3s/server/tls/server-ca.crt --kubelet-client-certificate=/var/lib/rancher/k3s/server/tls/client-kube-apiserver.crt --kubelet-client-key=/var/lib/rancher/k3s/server/tls/client-kube-apiserver.key --profiling=false --proxy-client-cert-file=/var/lib/rancher/k3s/server/tls/client-auth-proxy.crt --proxy-client-key-file=/var/lib/rancher/k3s/server/tls/client-auth-proxy.key --requestheader-allowed-names=system:auth-proxy --requestheader-client-ca-file=/var/lib/rancher/k3s/server/tls/request-header-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6444 --service-account-issuer=k3s --service-account-key-file=/var/lib/rancher/k3s/server/tls/service.key --service-account-signing-key-file=/var/lib/rancher/k3s/server/tls/service.key --service-cluster-ip-range=10.43.0.0/16 --storage-backend=etcd3 --tls-cert-file=/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.crt --tls-private-key-file=/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.key"
Nov 04 17:08:46 rpi-4 k3s[905]: I1104 17:08:46.114222     905 server.go:652] external host was not specified, using 192.168.0.230
Nov 04 17:08:46 rpi-4 k3s[905]: I1104 17:08:46.116554     905 server.go:177] Version: v1.19.3+k3s2
Nov 04 17:08:46 rpi-4 k3s[905]: I1104 17:08:46.168180     905 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
Nov 04 17:08:46 rpi-4 k3s[905]: I1104 17:08:46.168399     905 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
Nov 04 17:08:46 rpi-4 k3s[905]: I1104 17:08:46.181581     905 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
Nov 04 17:08:46 rpi-4 k3s[905]: I1104 17:08:46.181749     905 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
Nov 04 17:08:46 rpi-4 k3s[905]: I1104 17:08:46.417289     905 master.go:271] Using reconciler: lease

Steps To Reproduce:

Expected behavior:

k3s is up and running without problems

Actual behavior:

k3s fails to start up. If I reinstall k3s, it works again until the next reboot.

Additional context / logs:

Tried it in VirtualBox with Debian on x86, and it worked without problems. Maybe I am missing something obvious in the logs, but I cannot see why my Raspberry Pi is so unhappy. I tried installing, uninstalling, and rebooting a couple of times, so it is not a one-off.

brandond commented 4 years ago

@anttilinno What model Pi is this? How much memory does it have, and what are you using for storage (external USB, SD card size and class, etc)? Can you provide the full k3s logs: journalctl -u k3s --no-pager

anttilinno commented 4 years ago

Raspberry Pi 4, 4 GB RAM. Class 10 SD card.

journal.log

The master is behaving rather oddly at the moment. To my surprise, it started up and reports that all the workers are ready:

root@rpi-4:~# kubectl get nodes
NAME     STATUS   ROLES    AGE    VERSION
rpi-3    Ready    <none>   46h    v1.19.3+k3s2
rpi-2a   Ready    <none>   46h    v1.19.3+k3s2
rpi-2b   Ready    <none>   46h    v1.19.3+k3s2
rpi-4    Ready    master   2d1h   v1.19.3+k3s2

But the workers are all turned off; only the master is on.

root@rpi-4:~# ping rpi-3
PING rpi-3 (192.168.0.220) 56(84) bytes of data.
From rpi-4 (192.168.0.230) icmp_seq=1 Destination Host Unreachable
From rpi-4 (192.168.0.230) icmp_seq=2 Destination Host Unreachable
From rpi-4 (192.168.0.230) icmp_seq=3 Destination Host Unreachable
From rpi-4 (192.168.0.230) icmp_seq=4 Destination Host Unreachable

anttilinno commented 4 years ago

The Kubernetes API server is also behaving erratically:

root@rpi-4:~# kubectl describe node rpi-3
The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?
root@rpi-4:~# kubectl describe node rpi-3
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes rpi-3)
root@rpi-4:~# kubectl describe node rpi-3
Name:               rpi-3
Roles:              <none>
Labels:             beta.kubernetes.io/arch=arm
                    beta.kubernetes.io/instance-type=k3s
                    beta.kubernetes.io/os=linux
                    k3s.io/hostname=rpi-3
                    k3s.io/internal-ip=192.168.0.220
                    kubernetes.io/arch=arm
                    kubernetes.io/hostname=rpi-3
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=k3s
Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"ae:ed:66:ad:b6:23"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.0.220
                    k3s.io/node-args: ["agent"]
                    k3s.io/node-config-hash: I4WFRXIH3XK3ZIX5AKLCIQVU67JJUJHOWYNILIVWADDOURME34TA====
                    k3s.io/node-env:
                      {"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/e6d79ec9a3120dea052b4f505bd82829ad5ff8f899e919f12cb2b01d8a19fa3e","K3S_TOKEN":"********","K3S_U...
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 03 Nov 2020 12:23:27 +0200
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  rpi-3
  AcquireTime:     <unset>
  RenewTime:       Tue, 03 Nov 2020 16:21:22 +0200
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Tue, 03 Nov 2020 12:23:30 +0200   Tue, 03 Nov 2020 12:23:30 +0200   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Tue, 03 Nov 2020 16:18:43 +0200   Tue, 03 Nov 2020 12:23:27 +0200   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Tue, 03 Nov 2020 16:18:43 +0200   Tue, 03 Nov 2020 12:23:27 +0200   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Tue, 03 Nov 2020 16:18:43 +0200   Tue, 03 Nov 2020 12:23:27 +0200   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Tue, 03 Nov 2020 16:18:43 +0200   Tue, 03 Nov 2020 12:23:28 +0200   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.0.220
  Hostname:    rpi-3
Capacity:
  cpu:                4
  ephemeral-storage:  15024120Ki
  memory:             947032Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  14615463925
  memory:             947032Ki
  pods:               110
System Info:
  Machine ID:                 ed6e29d1f5334526b2de5b56b065c412
  System UUID:                ed6e29d1f5334526b2de5b56b065c412
  Boot ID:                    738c7b67-a76a-40cc-b5fb-889f5cc8eb88
  Kernel Version:             5.4.72-v7+
  OS Image:                   Raspbian GNU/Linux 10 (buster)
  Operating System:           linux
  Architecture:               arm
  Container Runtime Version:  containerd://1.4.0-k3s1
  Kubelet Version:            v1.19.3+k3s2
  Kube-Proxy Version:         v1.19.3+k3s2
PodCIDR:                      10.42.2.0/24
PodCIDRs:                     10.42.2.0/24
ProviderID:                   k3s://rpi-3
Non-terminated Pods:          (2 in total)
  Namespace                   Name                   CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                   ------------  ----------  ---------------  -------------  ---
  kube-system                 svclb-traefik-k6qlm    0 (0%)        0 (0%)      0 (0%)           0 (0%)         47h
  metallb-system              speaker-5ntzx          100m (2%)     100m (2%)   100Mi (10%)      100Mi (10%)    46h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                100m (2%)    100m (2%)
  memory             100Mi (10%)  100Mi (10%)
  ephemeral-storage  0 (0%)       0 (0%)
Events:              <none>

brandond commented 4 years ago

The RBAC seems to be really messed up - there are a bunch of permission errors, until finally one of the core controllers throws a fatal error:

Nov 05 11:11:12 rpi-4 k3s[551]: time="2020-11-05T11:11:12.761754326+02:00" level=fatal msg="networkpolicies.networking.k8s.io is forbidden: User \"system:k3s-controller\" cannot list resource \"networkpolicies\" in API group \"networking.k8s.io\" at the cluster scope"
Nov 05 11:11:12 rpi-4 systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE

When you reinstall it and it works - are you completely uninstalling everything including deleting the DB? Or are you just re-running the install script?

Can you get information from iotop -a and dstat 5 on this system when it is working?
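Permission errors like the one quoted above are easy to miss in a long journal. The following is a minimal sketch of a filter for them, not part of k3s; the function name `find_rbac_errors` is hypothetical, and it assumes the log lines use the `level=... msg="..."` format seen in this thread.

```python
import re

# Matches the level field as it appears in the k3s journal lines quoted in this thread.
LEVEL_RE = re.compile(r"level=(\w+)")


def find_rbac_errors(lines):
    """Return (level, line) pairs for lines that are fatal or mention 'is forbidden'.

    Hypothetical helper for illustration only.
    """
    hits = []
    for line in lines:
        match = LEVEL_RE.search(line)
        # Lines from klog (e.g. "I1104 ...") carry no level= field.
        level = match.group(1) if match else "unknown"
        if level == "fatal" or "is forbidden" in line:
            hits.append((level, line.rstrip("\n")))
    return hits
```

One way to use it would be to save the output of `journalctl -u k3s --no-pager` to a file, then pass the open file object to `find_rbac_errors` and print the returned pairs.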

anttilinno commented 4 years ago

For the reinstall, I used the provided uninstall script and then ran the install script again. Now that you mention it, with a clean install or reinstall I was experiencing the same API issues from time to time, but as it mostly worked, I did not pay any attention to them.

Raspbian is a clean minimal install; only SSH is enabled. No services are running other than the default ones. I can try an RPi 2 or 3 as the master, if that would help?

Total DISK READ:         0.00 B/s | Total DISK WRITE:         0.00 B/s
Current DISK READ:       0.00 B/s | Current DISK WRITE:       0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                                                 
    1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init
    2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
    3 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_gp]
    4 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_par_gp]
    7 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/u8:0-events_unbound]
    8 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [mm_percpu_wq]
    9 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
   10 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_sched]
   11 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/0]
   12 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [cpuhp/0]
   13 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [cpuhp/1]
   14 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/1]
   15 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/1]
   17 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/1:0H-kblockd]
   18 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [cpuhp/2]
   19 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/2]
   20 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/2]
   22 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/2:0H-kblockd]
   23 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [cpuhp/3]
   24 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/3]
   25 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/3]
   26 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/3:0-memcg_kmem_cache]
   28 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kdevtmpfs]
   29 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [netns]
   31 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/1:1-events]
   32 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kauditd]
   33 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [khungtaskd]
   34 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [oom_reaper]
   35 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [writeback]
   36 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kcompactd0]
   54 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kblockd]
   55 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [blkcg_punt_bio]
   56 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/2:1-mm_percpu_wq]
   57 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/3:1-events]
   58 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdogd]
   59 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rpciod]
   60 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/u9:0-hci0]
   61 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [xprtiod]
   62 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kswapd0]
   63 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [nfsiod]
   64 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthrotld]
   65 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [irq/55-aerdrv]
   66 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [iscsi_eh]
   67 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/u8:1-events_unbound]
   69 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [DWC Notificatio]
root@rpi-4:~# dstat 5
You did not select any stats, using -cdngy by default.
--total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai stl| read  writ| recv  send|  in   out | int   csw 
 28   2  70   0   0|1216k  268k|   0     0 |   0     0 | 757  1168 
 26   0  74   0   0|   0     0 |  92B  461B|   0     0 | 637   998 
 26   1  74   0   0|   0     0 |  79B  348B|   0     0 | 622   971 
 26   1  74   0   0|   0     0 |  79B  348B|   0     0 | 642  1012 
 26   0  74   0   0|   0     0 | 141B  348B|   0     0 | 568   947 
 26   0  73   0   0|   0    21k|  79B  354B|   0     0 | 653  1022 
 26   0  74   0   0|   0     0 |  91B  358B|   0     0 | 617   969 
 26   0  74   0   0|   0     0 |  79B  348B|   0     0 | 641  1031 
 26   1  74   0   0|   0     0 |  79B  348B|   0     0 | 613   963 
 26   0  74   0   0|   0     0 |  79B  348B|   0     0 | 636  1006 
 26   0  74   0   0|   0     0 |  79B  348B|   0     0 | 648  1026 
 26   1  74   0   0|   0     0 |  91B  356B|   0     0 | 662  1042 
 26   0  74   0   0|   0     0 |  79B  348B|   0     0 | 692  1090 
 26   0  74   0   0|   0     0 | 109B  354B|   0     0 | 660  1033 ^C

anttilinno commented 4 years ago

OK, problem solved, more or less. It turned out to be a power problem. I was using a charging station that puts out 2.4 A per port, but the Raspberry Pi failed to make use of that power. I was trying to attach a USB-powered HDD, but it failed to power up; that was the first clue. Then I tried the official power brick and everything worked. I will go to the Raspberry Pi forums with my power problems. In theory an RPi cluster sounds cool, but in reality, use a home server with Proxmox 😄
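
For anyone who suspects a similar power problem, Raspberry Pi OS ships `vcgencmd get_throttled`, which reports under-voltage and throttling flags as a hex bitmask. The sketch below decodes that value. It was not used in this thread and nothing here confirms what the flags showed on this board; the function names are hypothetical, and the bit meanings are taken from the Raspberry Pi documentation.

```python
import subprocess

# Bit positions as documented for `vcgencmd get_throttled`.
FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output):
    """Decode a string such as 'throttled=0x50005' into flag descriptions."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [desc for bit, desc in sorted(FLAGS.items()) if value & (1 << bit)]


def read_throttled():
    """Run vcgencmd on the Pi itself and decode its output (hypothetical helper)."""
    result = subprocess.run(
        ["vcgencmd", "get_throttled"],
        capture_output=True, text=True, check=True,
    )
    return decode_throttled(result.stdout)
```

An empty list from `read_throttled()` means no flags are set; any "under-voltage" entry points at the power supply or cable.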