k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

Containerd mount service unit entered the 'dead' state on raspberry pi 4 (64-bit) #3574

Closed alborotogarcia closed 3 years ago

alborotogarcia commented 3 years ago

Environmental Info: K3s Version: v1.21.2+k3s1

Node(s) CPU architecture, OS, and Version:

Master node (Debian 10):

Worker nodes (Ubuntu Bionic 18.04):

Verifying binaries in /var/lib/rancher/k3s/data/67f8817e7c5b5a7e397a9c75dafbfe2db3696b18c79ba3c51f746e1f960c8f14/bin:

System:

Limits:

info: reading kernel config from /proc/config.gz ...

Generally Necessary:

Optional Features:

STATUS: pass

cgroup v2

pi@kpi4:~ $ ls /sys/fs/cgroup/memory
cgroup.clone_children  kubepods  memory.kmem.max_usage_in_bytes  memory.kmem.tcp.usage_in_bytes  memory.memsw.limit_in_bytes  memory.pressure_level  memory.use_hierarchy  user.slice
cgroup.event_control  memory.failcnt  memory.kmem.slabinfo  memory.kmem.usage_in_bytes  memory.memsw.max_usage_in_bytes  memory.soft_limit_in_bytes  notify_on_release
cgroup.procs  memory.force_empty  memory.kmem.tcp.failcnt  memory.limit_in_bytes  memory.memsw.usage_in_bytes  memory.stat  release_agent
cgroup.sane_behavior  memory.kmem.failcnt  memory.kmem.tcp.limit_in_bytes  memory.max_usage_in_bytes  memory.move_charge_at_immigrate  memory.swappiness  system.slice
init.scope  memory.kmem.limit_in_bytes  memory.kmem.tcp.max_usage_in_bytes  memory.memsw.failcnt  memory.oom_control  memory.usage_in_bytes  tasks


**Cluster Configuration:** 1 server, 2 agents
**Describe the bug:**
Containerd/runc overlay mount units on the master node repeatedly log errors as they enter the 'dead' state, and after a few hours the node becomes unresponsive.
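For reference, the transient mount units in question can be watched as they appear with standard systemd tooling (a minimal sketch, assuming the default k3s data and runtime directories):

# list the transient rootfs mount units containerd creates under /run/k3s
systemctl list-units --type=mount | grep run-k3s-containerd
# follow systemd's own messages about those units entering the 'dead' state
journalctl -f -t systemd | grep -E 'containerd.*mount'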
**Steps To Reproduce:**
- Installed K3s: 

/etc/systemd/system/k3s.service

[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
After=network-online.target

[Service]
Type=notify
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s server --disable servicelb --tls-san 192.168.1.226,example.com,192.168.1.227,cluster.local --kubelet-arg=max-pods=1022 --kube-controller-manager-arg=node-cidr-mask-size=22 --flannel-backend=host-gw
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target

/etc/systemd/system/k3s-node.service

[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
After=network-online.target

[Service]
Type=notify
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s agent --server https://kpi4:6443 --token K10d77ab80ea82087dfcdbe21d8a6b781c9a5c176db43e89d31596dc78763168a91::server:6892c7aa82636ad69e592a03f0d1d4c5
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target
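For anyone trying to reproduce this setup, the same flags can also be passed through the official install script (a sketch, assuming the standard get.k3s.io installer; the agent token below is a placeholder taken from the node's server):

# server (on kpi4)
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --disable servicelb --tls-san 192.168.1.226,example.com,192.168.1.227,cluster.local --kubelet-arg=max-pods=1022 --kube-controller-manager-arg=node-cidr-mask-size=22 --flannel-backend=host-gw" sh -
# agent (on each worker)
curl -sfL https://get.k3s.io | K3S_URL=https://kpi4:6443 K3S_TOKEN=<node-token> sh -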


**Expected behavior:**
No error logs in the journal.
**Actual behavior:**
IMHO this may be related to containerd and overlayfs rather than k3s itself. The weird behaviour only appears on the Raspberry Pi, regardless of whether I elect it as a server or agent node, and regardless of which 64-bit OS it runs (Ubuntu 21.04, Ubuntu Server, or Raspberry Pi OS).

k3s events

LAST SEEN   TYPE      REASON                    OBJECT                       MESSAGE
29m         Normal    Starting                  node/kpi4                    Starting kubelet.
29m         Warning   InvalidDiskCapacity       node/kpi4                    invalid capacity 0 on image filesystem
29m         Normal    NodeHasSufficientMemory   node/kpi4                    Node kpi4 status is now: NodeHasSufficientMemory
29m         Normal    NodeHasNoDiskPressure     node/kpi4                    Node kpi4 status is now: NodeHasNoDiskPressure
29m         Normal    NodeHasSufficientPID      node/kpi4                    Node kpi4 status is now: NodeHasSufficientPID
29m         Normal    NodeAllocatableEnforced   node/kpi4                    Updated Node Allocatable limit across pods
29m         Normal    Starting                  node/kpi4                    Starting kube-proxy.
28m         Normal    NodeReady                 node/kpi4                    Node kpi4 status is now: NodeReady
28m         Normal    Synced                    node/kpi4                    Node synced successfully
28m         Normal    RegisteredNode            node/kpi4                    Node kpi4 event: Registered Node kpi4 in Controller
28m         Normal    Starting                  node/knode1                  Starting kubelet.
28m         Warning   InvalidDiskCapacity       node/knode1                  invalid capacity 0 on image filesystem
28m         Normal    Starting                  node/knode2                  Starting kubelet.
28m         Warning   InvalidDiskCapacity       node/knode2                  invalid capacity 0 on image filesystem
28m         Normal    NodeHasSufficientMemory   node/knode1                  Node knode1 status is now: NodeHasSufficientMemory
28m         Normal    NodeHasNoDiskPressure     node/knode1                  Node knode1 status is now: NodeHasNoDiskPressure
28m         Normal    NodeHasSufficientPID      node/knode1                  Node knode1 status is now: NodeHasSufficientPID
28m         Normal    NodeHasSufficientMemory   node/knode2                  Node knode2 status is now: NodeHasSufficientMemory
28m         Normal    NodeHasNoDiskPressure     node/knode2                  Node knode2 status is now: NodeHasNoDiskPressure
28m         Normal    Synced                    node/knode1                  Node synced successfully
28m         Normal    NodeHasSufficientPID      node/knode2                  Node knode2 status is now: NodeHasSufficientPID
28m         Normal    NodeAllocatableEnforced   node/knode1                  Updated Node Allocatable limit across pods
28m         Normal    Starting                  node/knode1                  Starting kube-proxy.
28m         Normal    Synced                    node/knode2                  Node synced successfully
28m         Normal    NodeAllocatableEnforced   node/knode2                  Updated Node Allocatable limit across pods
28m         Normal    Starting                  node/knode2                  Starting kube-proxy.
28m         Normal    RegisteredNode            node/knode1                  Node knode1 event: Registered Node knode1 in Controller
28m         Normal    RegisteredNode            node/knode2                  Node knode2 event: Registered Node knode2 in Controller
28m         Normal    NodeReady                 node/knode1                  Node knode1 status is now: NodeReady
28m         Normal    NodeReady                 node/knode2                  Node knode2 status is now: NodeReady
19m         Warning   FailedToCreateEndpoint    endpoints/longhorn-backend   Failed to create endpoint for service longhorn-system/longhorn-backend: endpoints "longhorn-backend" already exists

● run-k3s-containerd-io.containerd.runtime.v2.task-k8s.io-6ebbd28459368680767dd0a519affa8f9dc21aa1b73cc88c6f3d80140f950771-rootfs.mount - /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/6ebbd28459368680767dd0a519affa8f9dc21aa1b73cc88c6f3d80140f950771/rootfs
   Loaded: loaded (/proc/self/mountinfo)
   Active: active (mounted) since Mon 2021-07-05 20:40:34 CEST; 4min 44s ago
    Where: /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/6ebbd28459368680767dd0a519affa8f9dc21aa1b73cc88c6f3d80140f950771/rootfs
     What: overlay

Where=/run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/6ebbd28459368680767dd0a519affa8f9dc21aa1b73cc88c6f3d80140f950771/rootfs What=overlay Options=rw,relatime,lowerdir=/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/64/fs:/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/63/fs:/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/62/fs:/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/61/fs:/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/60/fs:/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/59/fs:/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/58/fs:/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/52/fs:/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/37/fs,upperdir=/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/65/fs,workdir=/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/65/work,index=off Type=overlay TimeoutUSec=1min 30s ControlPID=0 DirectoryMode=0755 SloppyOptions=no LazyUnmount=no ForceUnmount=no Result=success UID=[not set] GID=[not set] Slice=system.slice MemoryCurrent=[not set] CPUUsageNSec=[not set] TasksCurrent=[not set] IPIngressBytes=18446744073709551615 IPIngressPackets=18446744073709551615 IPEgressBytes=18446744073709551615 IPEgressPackets=18446744073709551615 Delegate=no CPUAccounting=no CPUWeight=[not set] StartupCPUWeight=[not set] CPUShares=[not set] StartupCPUShares=[not set] CPUQuotaPerSecUSec=infinity IOAccounting=no IOWeight=[not set] StartupIOWeight=[not set] BlockIOAccounting=no BlockIOWeight=[not set] StartupBlockIOWeight=[not set] MemoryAccounting=yes MemoryMin=0 MemoryLow=0 MemoryHigh=infinity MemoryMax=infinity MemorySwapMax=infinity MemoryLimit=infinity DevicePolicy=auto TasksAccounting=yes TasksMax=4915 IPAccounting=no UMask=0022 LimitCPU=infinity LimitCPUSoft=infinity LimitFSIZE=infinity LimitFSIZESoft=infinity LimitDATA=infinity LimitDATASoft=infinity LimitSTACK=infinity LimitSTACKSoft=8388608 LimitCORE=infinity LimitCORESoft=0 LimitRSS=infinity LimitRSSSoft=infinity LimitNOFILE=524288 LimitNOFILESoft=1024 LimitAS=infinity LimitASSoft=infinity LimitNPROC=30204 LimitNPROCSoft=30204 LimitMEMLOCK=65536 LimitMEMLOCKSoft=65536 LimitLOCKS=infinity LimitLOCKSSoft=infinity LimitSIGPENDING=30204 LimitSIGPENDINGSoft=30204 LimitMSGQUEUE=819200 LimitMSGQUEUESoft=819200 LimitNICE=0 LimitNICESoft=0 LimitRTPRIO=0 LimitRTPRIOSoft=0 LimitRTTIME=infinity LimitRTTIMESoft=infinity OOMScoreAdjust=0 Nice=0 IOSchedulingClass=0 IOSchedulingPriority=0 CPUSchedulingPolicy=0 CPUSchedulingPriority=0 TimerSlackNSec=50000 CPUSchedulingResetOnFork=no NonBlocking=no StandardInput=null StandardInputData= StandardOutput=journal StandardError=inherit TTYReset=no TTYVHangup=no TTYVTDisallocate=no SyslogPriority=30 SyslogLevelPrefix=yes SyslogLevel=6 SyslogFacility=3 LogLevelMax=-1 LogRateLimitIntervalUSec=0 LogRateLimitBurst=0 SecureBits=0 CapabilityBoundingSet=cap_chown cap_dac_override cap_dac_read_search cap_fowner cap_fsetid cap_kill cap_setgid cap_setuid cap_setpcap cap_linux_immutable cap_net_bind_service cap_net_broadcast cap_net_admin cap_net_raw cap_ipc_lock cap_ipc_owner cap_sys_module cap_sys_rawio cap_sys_chroot cap_sys_ptrace cap_sys_pacct cap_sys_admin 
cap_sys_boot cap_sys_nice cap_sys_resource cap_sys_time cap_sys_tty_config cap_mknod cap_lease cap_audit_write cap_audit_control cap_setfcap cap_mac_override cap_mac_admin cap_syslog cap_wake_alarm cap_block_suspend cap_audit_read cap_perfmon cap_bpf AmbientCapabilities= DynamicUser=no RemoveIPC=no MountFlags= PrivateTmp=no PrivateDevices=no ProtectKernelTunables=no ProtectKernelModules=no ProtectControlGroups=no PrivateNetwork=no PrivateUsers=no PrivateMounts=no ProtectHome=no ProtectSystem=no SameProcessGroup=yes UtmpMode=init IgnoreSIGPIPE=yes NoNewPrivileges=no SystemCallErrorNumber=0 LockPersonality=no RuntimeDirectoryPreserve=no RuntimeDirectoryMode=0755 StateDirectoryMode=0755 CacheDirectoryMode=0755 LogsDirectoryMode=0755 ConfigurationDirectoryMode=0755 MemoryDenyWriteExecute=no RestrictRealtime=no RestrictNamespaces=no MountAPIVFS=no KeyringMode=shared KillMode=control-group KillSignal=15 FinalKillSignal=9 SendSIGKILL=yes SendSIGHUP=no WatchdogSignal=6 Id=run-k3s-containerd-io.containerd.runtime.v2.task-k8s.io-6ebbd28459368680767dd0a519affa8f9dc21aa1b73cc88c6f3d80140f950771-rootfs.mount Names=run-k3s-containerd-io.containerd.runtime.v2.task-k8s.io-6ebbd28459368680767dd0a519affa8f9dc21aa1b73cc88c6f3d80140f950771-rootfs.mount Requires=system.slice -.mount Conflicts=umount.target Before=umount.target local-fs.target After=system.slice -.mount systemd-journald.socket local-fs-pre.target RequiresMountsFor=/run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/6ebbd28459368680767dd0a519affa8f9dc21aa1b73cc88c6f3d80140f950771 Description=/run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/6ebbd28459368680767dd0a519affa8f9dc21aa1b73cc88c6f3d80140f950771/rootfs LoadState=loaded ActiveState=active SubState=mounted SourcePath=/proc/self/mountinfo StateChangeTimestamp=Mon 2021-07-05 20:40:34 CEST StateChangeTimestampMonotonic=4149692709 InactiveExitTimestamp=Mon 2021-07-05 20:40:34 CEST InactiveExitTimestampMonotonic=4149692709 ActiveEnterTimestamp=Mon 2021-07-05 20:40:34 CEST ActiveEnterTimestampMonotonic=4149692709 ActiveExitTimestampMonotonic=0 InactiveEnterTimestampMonotonic=0 CanStart=yes CanStop=yes CanReload=yes CanIsolate=no StopWhenUnneeded=no RefuseManualStart=no RefuseManualStop=no AllowIsolate=no DefaultDependencies=yes OnFailureJobMode=replace IgnoreOnIsolate=yes NeedDaemonReload=no JobTimeoutUSec=infinity JobRunningTimeoutUSec=infinity JobTimeoutAction=none ConditionResult=no AssertResult=no ConditionTimestampMonotonic=0 AssertTimestampMonotonic=0 Transient=no Perpetual=no StartLimitIntervalUSec=10s StartLimitBurst=5 StartLimitAction=none FailureAction=none FailureActionExitStatus=-1 SuccessAction=none SuccessActionExitStatus=-1 InvocationID=33c6a692294f4930a8a262de75da7ae0 CollectMode=inactive


**Additional context / logs:**

Containerd Snapshotters

root@kpi4:/home/pi# k3s ctr plugins list | grep snapshotter
io.containerd.snapshotter.v1 overlayfs linux/arm64/v8 ok
io.containerd.snapshotter.v1 native linux/arm64/v8 ok
io.containerd.snapshotter.v1 fuse-overlayfs linux/arm64/v8 ok
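Since fuse-overlayfs and native are also available, the snapshotter containerd uses can be switched to rule out the overlayfs one. A sketch only (the --snapshotter flag and its config.yaml equivalent are assumed from the k3s agent options, not from this thread):

# option A: add the flag to ExecStart in /etc/systemd/system/k3s.service, e.g.
#   ExecStart=/usr/local/bin/k3s server --snapshotter=fuse-overlayfs <existing flags>
# option B: set it in the k3s config file
echo 'snapshotter: fuse-overlayfs' | sudo tee -a /etc/rancher/k3s/config.yaml
sudo systemctl daemon-reload && sudo systemctl restart k3s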

First occurrence in the journal:

Jul 05 20:28:55 kpi4 kernel: IPVS: [wrr] scheduler registered.
Jul 05 20:28:55 kpi4 kernel: IPVS: [sh] scheduler registered.
Jul 05 20:28:55 kpi4 k3s[12002]: E0705 20:28:55.529790 12002 node.go:161] Failed to retrieve node info: nodes "kpi4" is forbidden: User "system:kube-proxy" cannot get resource "nodes" in API group "" at the cluster scope
Jul 05 20:28:55 kpi4 k3s[12002]: I0705 20:28:55.596000 12002 dynamic_cafile_content.go:167] Starting client-ca-bundle::/var/lib/rancher/k3s/agent/client-ca.crt
Jul 05 20:28:55 kpi4 k3s[12002]: W0705 20:28:55.708937 12002 sysinfo.go:203] Nodes topology is not available, providing CPU topology
Jul 05 20:28:55 kpi4 k3s[12002]: I0705 20:28:55.722155 12002 server.go:660] "--cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to /"
Jul 05 20:28:55 kpi4 k3s[12002]: I0705 20:28:55.724476 12002 container_manager_linux.go:291] "Container manager verified user specified cgroup-root exists" cgroupRoot=[]
Jul 05 20:28:55 kpi4 k3s[12002]: I0705 20:28:55.725754 12002 container_manager_linux.go:296] "Creating Container Manager object based on Node Config" nodeConfig={RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:remote CgroupsPerQOS:true CgroupRo
Jul 05 20:28:55 kpi4 k3s[12002]: I0705 20:28:55.727209 12002 topology_manager.go:120] "Creating topology manager with policy per scope" topologyPolicyName="none" topologyScopeName="container"
Jul 05 20:28:55 kpi4 k3s[12002]: I0705 20:28:55.728431 12002 container_manager_linux.go:327] "Initializing Topology Manager" policy="none" scope="container"
Jul 05 20:28:55 kpi4 k3s[12002]: I0705 20:28:55.729568 12002 container_manager_linux.go:332] "Creating device plugin manager" devicePluginEnabled=true
Jul 05 20:28:55 kpi4 k3s[12002]: I0705 20:28:55.732465 12002 kubelet.go:404] "Attempting to sync node with API server"
Jul 05 20:28:55 kpi4 k3s[12002]: I0705 20:28:55.734277 12002 kubelet.go:272] "Adding static pod path" path="/var/lib/rancher/k3s/agent/pod-manifests"
Jul 05 20:28:55 kpi4 k3s[12002]: I0705 20:28:55.736942 12002 kubelet.go:283] "Adding apiserver pod source"
Jul 05 20:28:55 kpi4 k3s[12002]: I0705 20:28:55.738428 12002 apiserver.go:42] "Waiting for node sync before watching apiserver pods"
Jul 05 20:28:55 kpi4 k3s[12002]: I0705 20:28:55.742594 12002 kuberuntime_manager.go:222] "Container runtime initialized" containerRuntime="containerd" version="v1.4.4-k3s2" apiVersion="v1alpha2"
...skipping...
Jul 05 20:40:34 kpi4 systemd[1]: run-containerd-runc-k8s.io-6ebbd28459368680767dd0a519affa8f9dc21aa1b73cc88c6f3d80140f950771-runc.cDmIjD.mount: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit run-containerd-runc-k8s.io-6ebbd28459368680767dd0a519affa8f9dc21aa1b73cc88c6f3d80140f950771-runc.cDmIjD.mount has successfully entered the 'dead' state.
Jul 05 20:40:34 kpi4 systemd[652]: run-containerd-runc-k8s.io-6ebbd28459368680767dd0a519affa8f9dc21aa1b73cc88c6f3d80140f950771-runc.cDmIjD.mount: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit UNIT has successfully entered the 'dead' state.
Jul 05 20:40:35 kpi4 systemd[652]: tmp-ctd\x2dvolume667618187.mount: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit UNIT has successfully entered the 'dead' state.
Jul 05 20:40:35 kpi4 systemd[1]: tmp-ctd\x2dvolume667618187.mount: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit tmp-ctd\x2dvolume667618187.mount has successfully entered the 'dead' state.
Jul 05 20:40:35 kpi4 dhcpcd[489]: vethe5a04940: probing for an IPv4LL address
Jul 05 20:40:40 kpi4 dhcpcd[489]: vethe5a04940: using IPv4LL address 169.254.9.128
Jul 05 20:40:40 kpi4 dhcpcd[489]: vethe5a04940: adding route to 169.254.0.0/16
Jul 05 20:40:40 kpi4 avahi-daemon[482]: Joining mDNS multicast group on interface vethe5a04940.IPv4 with address 169.254.9.128.
Jul 05 20:40:40 kpi4 avahi-daemon[482]: New relevant interface vethe5a04940.IPv4 for mDNS.

brandond commented 3 years ago

I'm not sure if this is related, but avahi appears to be attempting to manage all your container interfaces by joining them to multicast groups. I've seen this cause problems in the past (example: https://github.com/k3s-io/k3s/issues/2599#issuecomment-828659931) - can you try disabling or uninstalling avahi from your host?
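A minimal sketch of the two usual approaches (assuming a systemd-based host; the allow-interfaces option keeps mDNS on the physical NIC only):

# option 1: stop and disable avahi entirely
sudo systemctl disable --now avahi-daemon.service avahi-daemon.socket
# option 2: keep avahi but stop it from touching CNI/veth interfaces;
# in /etc/avahi/avahi-daemon.conf, under [server], set:
#   allow-interfaces=eth0
sudo systemctl restart avahi-daemon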

alborotogarcia commented 3 years ago

Still getting the same error after disabling both the avahi-daemon service and socket:

pi@kpi4:~ $  systemctl status avahi-daemon.service
● avahi-daemon.service - Avahi mDNS/DNS-SD Stack
   Loaded: loaded (/lib/systemd/system/avahi-daemon.service; disabled; vendor preset: enabled)
   Active: inactive (dead) since Tue 2021-07-06 22:56:55 CEST; 48min ago
 Main PID: 475 (code=exited, status=0/SUCCESS)
   Status: "avahi-daemon 0.7 starting up."

Jul 05 21:25:21 kpi4 avahi-daemon[475]: New relevant interface eth0.IPv4 for mDNS.
Jul 05 21:25:21 kpi4 avahi-daemon[475]: Registering new address record for 192.168.1.223 on eth0.IPv4.
Jul 06 22:56:55 kpi4 avahi-daemon[475]: Got SIGTERM, quitting.
Jul 06 22:56:55 kpi4 avahi-daemon[475]: Leaving mDNS multicast group on interface docker0.IPv4 with address 172.17.0.1.
Jul 06 22:56:55 kpi4 systemd[1]: Stopping Avahi mDNS/DNS-SD Stack...
Jul 06 22:56:55 kpi4 avahi-daemon[475]: Leaving mDNS multicast group on interface eth0.IPv6 with address fe80::eae7:f8c5:2fc:d0df.
Jul 06 22:56:55 kpi4 avahi-daemon[475]: Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.1.223.
Jul 06 22:56:55 kpi4 avahi-daemon[475]: avahi-daemon 0.7 exiting.
Jul 06 22:56:55 kpi4 systemd[1]: avahi-daemon.service: Succeeded.
Jul 06 22:56:55 kpi4 systemd[1]: Stopped Avahi mDNS/DNS-SD Stack.
pi@kpi4:~ $   systemctl status avahi-daemon.socket
● avahi-daemon.socket - Avahi mDNS/DNS-SD Stack Activation Socket
   Loaded: loaded (/lib/systemd/system/avahi-daemon.socket; disabled; vendor preset: enabled)
   Active: inactive (dead) since Tue 2021-07-06 22:57:23 CEST; 47min ago
   Listen: /run/avahi-daemon/socket (Stream)

Jul 05 21:25:09 kpi4 systemd[1]: Listening on Avahi mDNS/DNS-SD Stack Activation Socket.
Jul 06 22:57:23 kpi4 systemd[1]: avahi-daemon.socket: Succeeded.
Jul 06 22:57:23 kpi4 systemd[1]: Closed Avahi mDNS/DNS-SD Stack Activation Socket.

pi@kpi4:~ $  journalctl -xb0 -f -u k3s
(*v1.MountPropagationMode)(nil), SubPathExpr:""}, v1.VolumeMount{Name:"host", ReadOnly:false, MountPath:"/rootfs", SubPath:"", MountPropagation:(*v1.MountPropagationMode)(0x400d34fbf0), SubPathExpr:""}, v1.VolumeMount{Name:"lib-modules", ReadOnly:true, MountPath:"/lib/modules", SubPath:"", MountPropagation:(*v1.MountPropagationMode)(nil), SubPathExpr:""}}, VolumeDevices:[]v1.VolumeDevice(nil), LivenessProbe:(*v1.Probe)(nil), ReadinessProbe:(*v1.Probe)(nil), StartupProbe:(*v1.Probe)(nil), Lifecycle:(*v1.Lifecycle)(0x400d34fc00), TerminationMessagePath:"/dev/termination-log", TerminationMessagePolicy:"File", ImagePullPolicy:"IfNotPresent", SecurityContext:(*v1.SecurityContext)(0x4014f49ce0), Stdin:false, StdinOnce:false, TTY:false}}, EphemeralContainers:[]v1.EphemeralContainer(nil), RestartPolicy:"Always", TerminationGracePeriodSeconds:(*int64)(0x4011fb2bd8), ActiveDeadlineSeconds:(*int64)(nil), DNSPolicy:"ClusterFirst", NodeSelector:map[string]string(nil), ServiceAccountName:"longhorn-service-account", DeprecatedServiceAccount:"longhorn-service-account", AutomountServiceAccountToken:(*bool)(nil), NodeName:"", HostNetwork:false, HostPID:true, HostIPC:false, ShareProcessNamespace:(*bool)(nil), SecurityContext:(*v1.PodSecurityContext)(0x4008c58700), ImagePullSecrets:[]v1.LocalObjectReference{v1.LocalObjectReference{Name:"regcred"}}, Hostname:"", Subdomain:"", Affinity:(*v1.Affinity)(nil), SchedulerName:"default-scheduler", Tolerations:[]v1.Toleration(nil), HostAliases:[]v1.HostAlias(nil), PriorityClassName:"", Priority:(*int32)(nil), DNSConfig:(*v1.PodDNSConfig)(nil), ReadinessGates:[]v1.PodReadinessGate(nil), RuntimeClassName:(*string)(nil), EnableServiceLinks:(*bool)(nil), PreemptionPolicy:(*v1.PreemptionPolicy)(nil), Overhead:v1.ResourceList(nil), TopologySpreadConstraints:[]v1.TopologySpreadConstraint(nil), SetHostnameAsFQDN:(*bool)(nil)}}, UpdateStrategy:v1.DaemonSetUpdateStrategy{Type:"RollingUpdate", RollingUpdate:(*v1.RollingUpdateDaemonSet)(0x400d34fc40)}, MinReadySeconds:0, RevisionHistoryLimit:(*int32)(0x4011fb2bf8)}, Status:v1.DaemonSetStatus{CurrentNumberScheduled:0, NumberMisscheduled:0, DesiredNumberScheduled:0, NumberReady:0, ObservedGeneration:0, UpdatedNumberScheduled:0, NumberAvailable:0, NumberUnavailable:0, CollisionCount:(*int32)(nil), Conditions:[]v1.DaemonSetCondition(nil)}}: Operation cannot be fulfilled on daemonsets.apps "longhorn-csi-plugin": the object has been modified; please apply your changes to the latest version and try again
Jul 06 23:05:15 kpi4 k3s[19003]: I0706 23:05:15.550283   19003 trace.go:205] Trace[670189661]: "GuaranteedUpdate etcd3" type:*apps.ReplicaSet (06-Jul-2021 23:05:14.597) (total time: 953ms):
Jul 06 23:05:15 kpi4 k3s[19003]: Trace[670189661]: ---"Transaction committed" 934ms (23:05:00.550)
Jul 06 23:05:15 kpi4 k3s[19003]: Trace[670189661]: [953.075615ms] [953.075615ms] END
Jul 06 23:05:15 kpi4 k3s[19003]: I0706 23:05:15.550727   19003 trace.go:205] Trace[1448889774]: "Update" url:/apis/apps/v1/namespaces/longhorn-system/replicasets/csi-resizer-6859cb947f/status,user-agent:k3s/v1.21.2+k3s1 (linux/arm64) kubernetes/5a67e8d/system:serviceaccount:kube-system:replicaset-controller,client:127.0.0.1,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (06-Jul-2021 23:05:14.590) (total time: 959ms):
Jul 06 23:05:15 kpi4 k3s[19003]: Trace[1448889774]: ---"Object stored in database" 956ms (23:05:00.550)
Jul 06 23:05:15 kpi4 k3s[19003]: Trace[1448889774]: [959.885742ms] [959.885742ms] END
Jul 06 23:05:15 kpi4 k3s[19003]: I0706 23:05:15.883988   19003 kubelet_getters.go:300] "Path does not exist" path="/var/lib/kubelet/pods/4bc6e79a-e236-40f1-b3da-8e0e3a2652a4/volumes"
Jul 06 23:05:16 kpi4 k3s[19003]: E0706 23:05:16.476420   19003 cadvisor_stats_provider.go:415] "Partial failure issuing cadvisor.ContainerInfoV2" err="partial failures: [\"/kubepods/besteffort/pod58ecc470-54e1-4c2e-9541-9d73991a63b6\": RecentStats: unable to find data in memory cache], [\"/kubepods/besteffort/pod5d164fe7-52c0-4a82-95b7-b98cd0e98b45\": RecentStats: unable to find data in memory cache], [\"/kubepods/besteffort/pod1e49ea54-5e1d-45f4-bb66-a0edc9fd5d33\": RecentStats: unable to find data in memory cache]"
Jul 06 23:05:18 kpi4 k3s[19003]: I0706 23:05:18.151717   19003 trace.go:205] Trace[1400083480]: "Create" url:/api/v1/namespaces/longhorn-system/events,user-agent:k3s/v1.21.2+k3s1 (linux/arm64) kubernetes/5a67e8d,client:127.0.0.1,accept:application/vnd.kubernetes.protobuf,application/json,protocol:HTTP/1.1 (06-Jul-2021 23:05:17.296) (total time: 854ms):
Jul 06 23:05:18 kpi4 k3s[19003]: Trace[1400083480]: ---"Object stored in database" 854ms (23:05:00.151)
Jul 06 23:05:18 kpi4 k3s[19003]: Trace[1400083480]: [854.805635ms] [854.805635ms] END
Jul 06 23:05:22 kpi4 k3s[19003]: E0706 23:05:22.190072   19003 cadvisor_stats_provider.go:415] "Partial failure issuing cadvisor.ContainerInfoV2" err="partial failures: [\"/kubepods/besteffort/pod1e49ea54-5e1d-45f4-bb66-a0edc9fd5d33\": RecentStats: unable to find data in memory cache], [\"/kubepods/besteffort/pod5d164fe7-52c0-4a82-95b7-b98cd0e98b45\": RecentStats: unable to find data in memory cache], [\"/kubepods/besteffort/pod58ecc470-54e1-4c2e-9541-9d73991a63b6\": RecentStats: unable to find data in memory cache]"
Jul 06 23:05:28 kpi4 k3s[19003]: I0706 23:05:28.747872   19003 trace.go:205] Trace[442557945]: "GuaranteedUpdate etcd3" type:*coordination.Lease (06-Jul-2021 23:05:24.855) (total time: 3891ms):
Jul 06 23:05:28 kpi4 k3s[19003]: Trace[442557945]: ---"Transaction committed" 3890ms (23:05:00.747)
Jul 06 23:05:28 kpi4 k3s[19003]: Trace[442557945]: [3.891711395s] [3.891711395s] END
Jul 06 23:05:28 kpi4 k3s[19003]: I0706 23:05:28.748673   19003 trace.go:205] Trace[1405767360]: "Update" url:/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cert-manager-cainjector-leader-election,user-agent:cainjector/v0.0.0 (linux/arm64) kubernetes/$Format/leader-election,client:192.168.1.224,accept:application/json, */*,protocol:HTTP/1.1 (06-Jul-2021 23:05:24.855) (total time: 3893ms):
Jul 06 23:05:28 kpi4 k3s[19003]: Trace[1405767360]: ---"Object stored in database" 3892ms (23:05:00.747)
Jul 06 23:05:28 kpi4 k3s[19003]: Trace[1405767360]: [3.893193845s] [3.893193845s] END
Jul 06 23:05:31 kpi4 k3s[19003]: I0706 23:05:31.728917   19003 trace.go:205] Trace[624710701]: "GuaranteedUpdate etcd3" type:*core.Node (06-Jul-2021 23:05:30.449) (total time: 1278ms):
Jul 06 23:05:31 kpi4 k3s[19003]: Trace[624710701]: ---"Transaction committed" 1274ms (23:05:00.728)
Jul 06 23:05:31 kpi4 k3s[19003]: Trace[624710701]: [1.278964291s] [1.278964291s] END
Jul 06 23:05:31 kpi4 k3s[19003]: I0706 23:05:31.729974   19003 trace.go:205] Trace[2062471273]: "Patch" url:/api/v1/nodes/knode2/status,user-agent:k3s/v1.21.2+k3s1 (linux/arm64) kubernetes/5a67e8d,client:192.168.1.225,accept:application/vnd.kubernetes.protobuf,application/json,protocol:HTTP/1.1 (06-Jul-2021 23:05:30.449) (total time: 1280ms):
Jul 06 23:05:31 kpi4 k3s[19003]: Trace[2062471273]: ---"Object stored in database" 1275ms (23:05:00.729)
Jul 06 23:05:31 kpi4 k3s[19003]: Trace[2062471273]: [1.280333798s] [1.280333798s] END
Jul 06 23:05:33 kpi4 k3s[19003]: I0706 23:05:33.028928   19003 trace.go:205] Trace[1160769760]: "GuaranteedUpdate etcd3" type:*coordination.Lease (06-Jul-2021 23:05:31.846) (total time: 1181ms):
Jul 06 23:05:33 kpi4 k3s[19003]: Trace[1160769760]: ---"Transaction committed" 1180ms (23:05:00.028)
Jul 06 23:05:33 kpi4 k3s[19003]: Trace[1160769760]: [1.181863717s] [1.181863717s] END
Jul 06 23:05:33 kpi4 k3s[19003]: I0706 23:05:33.029246   19003 trace.go:205] Trace[1689466342]: "Update" url:/apis/coordination.k8s.io/v1/namespaces/longhorn-system/leases/external-attacher-leader-driver-longhorn-io,user-agent:csi-attacher/v0.0.0 (linux/arm64) kubernetes/$Format,client:192.168.1.224,accept:application/json, */*,protocol:HTTP/1.1 (06-Jul-2021 23:05:31.846) (total time: 1182ms):
Jul 06 23:05:33 kpi4 k3s[19003]: Trace[1689466342]: ---"Object stored in database" 1182ms (23:05:00.028)
Jul 06 23:05:33 kpi4 k3s[19003]: Trace[1689466342]: [1.182969263s] [1.182969263s] END
Jul 06 23:05:54 kpi4 k3s[19003]: I0706 23:05:54.093440   19003 scope.go:111] "RemoveContainer" containerID="883f8773c1d547c3f05d9609bc5ca27b596800960e78be9bf97828f0b3e70039"
Jul 06 23:06:14 kpi4 k3s[19003]: I0706 23:06:14.062223   19003 trace.go:205] Trace[1030437697]: "GuaranteedUpdate etcd3" type:*core.ConfigMap (06-Jul-2021 23:06:12.316) (total time: 1745ms):
Jul 06 23:06:14 kpi4 k3s[19003]: Trace[1030437697]: ---"Transaction committed" 1744ms (23:06:00.062)
Jul 06 23:06:14 kpi4 k3s[19003]: Trace[1030437697]: [1.745158718s] [1.745158718s] END
Jul 06 23:06:14 kpi4 k3s[19003]: I0706 23:06:14.062559   19003 trace.go:205] Trace[1057281326]: "Update" url:/api/v1/namespaces/kube-system/configmaps/cert-manager-cainjector-leader-election,user-agent:cainjector/v0.0.0 (linux/arm64) kubernetes/$Format/leader-election,client:192.168.1.224,accept:application/json, */*,protocol:HTTP/1.1 (06-Jul-2021 23:06:12.316) (total time: 1745ms):
Jul 06 23:06:14 kpi4 k3s[19003]: Trace[1057281326]: ---"Object stored in database" 1745ms (23:06:00.062)
Jul 06 23:06:14 kpi4 k3s[19003]: Trace[1057281326]: [1.745982452s] [1.745982452s] END
Jul 06 23:06:15 kpi4 k3s[19003]: I0706 23:06:15.308475   19003 trace.go:205] Trace[1033211365]: "GuaranteedUpdate etcd3" type:*coordination.Lease (06-Jul-2021 23:06:14.086) (total time: 1221ms):
Jul 06 23:06:15 kpi4 k3s[19003]: Trace[1033211365]: ---"Transaction committed" 1220ms (23:06:00.308)
Jul 06 23:06:15 kpi4 k3s[19003]: Trace[1033211365]: [1.221603821s] [1.221603821s] END
Jul 06 23:06:15 kpi4 k3s[19003]: I0706 23:06:15.308802   19003 trace.go:205] Trace[1515918817]: "Update" url:/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cert-manager-cainjector-leader-election,user-agent:cainjector/v0.0.0 (linux/arm64) kubernetes/$Format/leader-election,client:192.168.1.224,accept:application/json, */*,protocol:HTTP/1.1 (06-Jul-2021 23:06:14.086) (total time: 1222ms):
Jul 06 23:06:15 kpi4 k3s[19003]: Trace[1515918817]: ---"Object stored in database" 1221ms (23:06:00.308)
Jul 06 23:06:15 kpi4 k3s[19003]: Trace[1515918817]: [1.222424648s] [1.222424648s] END
Jul 06 23:06:24 kpi4 k3s[19003]: I0706 23:06:24.800090   19003 csi_plugin.go:99] kubernetes.io/csi: Trying to validate a new CSI Driver with name: driver.longhorn.io endpoint: /var/lib/kubelet/plugins/driver.longhorn.io/csi.sock versions: 1.0.0
Jul 06 23:06:24 kpi4 k3s[19003]: I0706 23:06:24.800180   19003 csi_plugin.go:112] kubernetes.io/csi: Register new plugin with name: driver.longhorn.io at endpoint: /var/lib/kubelet/plugins/driver.longhorn.io/csi.sock
Jul 06 23:06:24 kpi4 k3s[19003]: E0706 23:06:24.888921   19003 nodeinfomanager.go:566] Invalid attach limit value 0 cannot be added to CSINode object for "driver.longhorn.io"
Jul 06 23:06:39 kpi4 k3s[19003]: I0706 23:06:39.501865   19003 trace.go:205] Trace[1052799017]: "GuaranteedUpdate etcd3" type:*core.ConfigMap (06-Jul-2021 23:06:36.197) (total time: 3304ms):
Jul 06 23:06:39 kpi4 k3s[19003]: Trace[1052799017]: ---"Transaction committed" 3303ms (23:06:00.501)
Jul 06 23:06:39 kpi4 k3s[19003]: Trace[1052799017]: [3.304448084s] [3.304448084s] END
Jul 06 23:06:39 kpi4 k3s[19003]: I0706 23:06:39.502403   19003 trace.go:205] Trace[393239039]: "Update" url:/api/v1/namespaces/kube-system/configmaps/cert-manager-cainjector-leader-election,user-agent:cainjector/v0.0.0 (linux/arm64) kubernetes/$Format/leader-election,client:192.168.1.224,accept:application/json, */*,protocol:HTTP/1.1 (06-Jul-2021 23:06:36.196) (total time: 3305ms):
Jul 06 23:06:39 kpi4 k3s[19003]: Trace[393239039]: ---"Object stored in database" 3304ms (23:06:00.501)
Jul 06 23:06:39 kpi4 k3s[19003]: Trace[393239039]: [3.305488445s] [3.305488445s] END
Jul 06 23:06:40 kpi4 k3s[19003]: I0706 23:06:40.828365   19003 trace.go:205] Trace[1240402593]: "GuaranteedUpdate etcd3" type:*coordination.Lease (06-Jul-2021 23:06:39.534) (total time: 1293ms):
Jul 06 23:06:40 kpi4 k3s[19003]: Trace[1240402593]: ---"Transaction committed" 1292ms (23:06:00.828)
Jul 06 23:06:40 kpi4 k3s[19003]: Trace[1240402593]: [1.293659525s] [1.293659525s] END
Jul 06 23:06:40 kpi4 k3s[19003]: I0706 23:06:40.828723   19003 trace.go:205] Trace[1506576956]: "Update" url:/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cert-manager-cainjector-leader-election,user-agent:cainjector/v0.0.0 (linux/arm64) kubernetes/$Format/leader-election,client:192.168.1.224,accept:application/json, */*,protocol:HTTP/1.1 (06-Jul-2021 23:06:39.533) (total time: 1294ms):
Jul 06 23:06:40 kpi4 k3s[19003]: Trace[1506576956]: ---"Object stored in database" 1293ms (23:06:00.828)
Jul 06 23:06:40 kpi4 k3s[19003]: Trace[1506576956]: [1.294764905s] [1.294764905s] END
Jul 06 23:06:43 kpi4 k3s[19003]: I0706 23:06:43.297530   19003 trace.go:205] Trace[118167507]: "GuaranteedUpdate etcd3" type:*core.Endpoints (06-Jul-2021 23:06:42.405) (total time: 892ms):
Jul 06 23:06:43 kpi4 k3s[19003]: Trace[118167507]: ---"Transaction committed" 891ms (23:06:00.297)
Jul 06 23:06:43 kpi4 k3s[19003]: Trace[118167507]: [892.32055ms] [892.32055ms] END
Jul 06 23:06:43 kpi4 k3s[19003]: I0706 23:06:43.297845   19003 trace.go:205] Trace[1338257210]: "Update" url:/api/v1/namespaces/kube-system/endpoints/rancher.io-local-path,user-agent:local-path-provisioner/v0.0.0 (linux/arm64) kubernetes/$Format,client:10.42.0.4,accept:application/json, */*,protocol:HTTP/1.1 (06-Jul-2021 23:06:42.404) (total time: 893ms):
Jul 06 23:06:43 kpi4 k3s[19003]: Trace[1338257210]: ---"Object stored in database" 892ms (23:06:00.297)
Jul 06 23:06:43 kpi4 k3s[19003]: Trace[1338257210]: [893.115784ms] [893.115784ms] END
Jul 06 23:08:06 kpi4 k3s[19003]: I0706 23:08:06.860661   19003 trace.go:205] Trace[894065588]: "GuaranteedUpdate etcd3" type:*core.Endpoints (06-Jul-2021 23:08:04.949) (total time: 1911ms):
Jul 06 23:08:06 kpi4 k3s[19003]: Trace[894065588]: ---"Transaction committed" 1910ms (23:08:00.860)
Jul 06 23:08:06 kpi4 k3s[19003]: Trace[894065588]: [1.911532738s] [1.911532738s] END
Jul 06 23:08:06 kpi4 k3s[19003]: I0706 23:08:06.860969   19003 trace.go:205] Trace[259400425]: "Update" url:/api/v1/namespaces/kube-system/endpoints/rancher.io-local-path,user-agent:local-path-provisioner/v0.0.0 (linux/arm64) kubernetes/$Format,client:10.42.0.4,accept:application/json, */*,protocol:HTTP/1.1 (06-Jul-2021 23:08:04.948) (total time: 1912ms):
Jul 06 23:08:06 kpi4 k3s[19003]: Trace[259400425]: ---"Object stored in database" 1911ms (23:08:00.860)
Jul 06 23:08:06 kpi4 k3s[19003]: Trace[259400425]: [1.912385195s] [1.912385195s] END
Jul 06 23:08:08 kpi4 k3s[19003]: I0706 23:08:08.846552   19003 trace.go:205] Trace[733674686]: "GuaranteedUpdate etcd3" type:*core.ConfigMap (06-Jul-2021 23:08:07.241) (total time: 1605ms):
Jul 06 23:08:08 kpi4 k3s[19003]: Trace[733674686]: ---"Transaction committed" 1604ms (23:08:00.846)
Jul 06 23:08:08 kpi4 k3s[19003]: Trace[733674686]: [1.605464429s] [1.605464429s] END
Jul 06 23:08:08 kpi4 k3s[19003]: I0706 23:08:08.846850   19003 trace.go:205] Trace[67028535]: "Update" url:/api/v1/namespaces/kube-system/configmaps/cert-manager-cainjector-leader-election,user-agent:cainjector/v0.0.0 (linux/arm64) kubernetes/$Format/leader-election,client:192.168.1.224,accept:application/json, */*,protocol:HTTP/1.1 (06-Jul-2021 23:08:07.240) (total time: 1606ms):
Jul 06 23:08:08 kpi4 k3s[19003]: Trace[67028535]: ---"Object stored in database" 1605ms (23:08:00.846)
Jul 06 23:08:08 kpi4 k3s[19003]: Trace[67028535]: [1.606263109s] [1.606263109s] END
Jul 06 23:08:09 kpi4 k3s[19003]: I0706 23:08:09.553583   19003 trace.go:205] Trace[239102217]: "GuaranteedUpdate etcd3" type:*coordination.Lease (06-Jul-2021 23:08:08.869) (total time: 683ms):
Jul 06 23:08:09 kpi4 k3s[19003]: Trace[239102217]: ---"Transaction committed" 682ms (23:08:00.553)
Jul 06 23:08:09 kpi4 k3s[19003]: Trace[239102217]: [683.879442ms] [683.879442ms] END
Jul 06 23:08:09 kpi4 k3s[19003]: I0706 23:08:09.553890   19003 trace.go:205] Trace[827278646]: "Update" url:/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cert-manager-cainjector-leader-election,user-agent:cainjector/v0.0.0 (linux/arm64) kubernetes/$Format/leader-election,client:192.168.1.224,accept:application/json, */*,protocol:HTTP/1.1 (06-Jul-2021 23:08:08.869) (total time: 684ms):
Jul 06 23:08:09 kpi4 k3s[19003]: Trace[827278646]: ---"Object stored in database" 684ms (23:08:00.553)
Jul 06 23:08:09 kpi4 k3s[19003]: Trace[827278646]: [684.683973ms] [684.683973ms] END
Jul 06 23:08:10 kpi4 k3s[19003]: I0706 23:08:10.812927   19003 trace.go:205] Trace[80803981]: "GuaranteedUpdate etcd3" type:*coordination.Lease (06-Jul-2021 23:08:09.592) (total time: 1219ms):
Jul 06 23:08:10 kpi4 k3s[19003]: Trace[80803981]: ---"Transaction committed" 1218ms (23:08:00.812)
Jul 06 23:08:10 kpi4 k3s[19003]: Trace[80803981]: [1.219993009s] [1.219993009s] END
Jul 06 23:08:10 kpi4 k3s[19003]: I0706 23:08:10.813351   19003 trace.go:205] Trace[412982872]: "Update" url:/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/kpi4,user-agent:k3s/v1.21.2+k3s1 (linux/arm64) kubernetes/5a67e8d,client:127.0.0.1,accept:application/vnd.kubernetes.protobuf,application/json,protocol:HTTP/1.1 (06-Jul-2021 23:08:09.592) (total time: 1220ms):
Jul 06 23:08:10 kpi4 k3s[19003]: Trace[412982872]: ---"Object stored in database" 1220ms (23:08:00.813)
Jul 06 23:08:10 kpi4 k3s[19003]: Trace[412982872]: [1.220683466s] [1.220683466s] END
Jul 06 23:08:11 kpi4 k3s[19003]: I0706 23:08:11.595104   19003 trace.go:205] Trace[67238373]: "GuaranteedUpdate etcd3" type:*core.Endpoints (06-Jul-2021 23:08:10.929) (total time: 665ms):
Jul 06 23:08:11 kpi4 k3s[19003]: Trace[67238373]: ---"Transaction committed" 663ms (23:08:00.594)
Jul 06 23:08:11 kpi4 k3s[19003]: Trace[67238373]: [665.105041ms] [665.105041ms] END
Jul 06 23:08:11 kpi4 k3s[19003]: I0706 23:08:11.595400   19003 trace.go:205] Trace[1995907290]: "Update" url:/api/v1/namespaces/kube-system/endpoints/rancher.io-local-path,user-agent:local-path-provisioner/v0.0.0 (linux/arm64) kubernetes/$Format,client:10.42.0.4,accept:application/json, */*,protocol:HTTP/1.1 (06-Jul-2021 23:08:10.929) (total time: 666ms):
Jul 06 23:08:11 kpi4 k3s[19003]: Trace[1995907290]: ---"Object stored in database" 665ms (23:08:00.595)
Jul 06 23:08:11 kpi4 k3s[19003]: Trace[1995907290]: [666.037441ms] [666.037441ms] END
Jul 06 23:08:53 kpi4 k3s[19003]: W0706 23:08:53.797420   19003 sysinfo.go:203] Nodes topology is not available, providing CPU topology
Jul 06 23:13:53 kpi4 k3s[19003]: W0706 23:13:53.797631   19003 sysinfo.go:203] Nodes topology is not available, providing CPU topology
Jul 06 23:18:53 kpi4 k3s[19003]: W0706 23:18:53.795917   19003 sysinfo.go:203] Nodes topology is not available, providing CPU topology
Jul 06 23:23:42 kpi4 k3s[19003]: time="2021-07-06T23:23:42.498422650+02:00" level=error msg="Compact failed: failed to compact to revision 4327: database is locked"
Jul 06 23:23:53 kpi4 k3s[19003]: W0706 23:23:53.797186   19003 sysinfo.go:203] Nodes topology is not available, providing CPU topology
Jul 06 23:28:53 kpi4 k3s[19003]: W0706 23:28:53.799131   19003 sysinfo.go:203] Nodes topology is not available, providing CPU topology

pi@kpi4:~ $  journalctl -xb0 -f
-- Logs begin at Mon 2021-07-05 21:25:05 CEST. --
Jul 06 23:45:13 kpi4 systemd[1]: run-containerd-runc-k8s.io-27a5e7c74fab26bc92d6cf65d06247ea9f03f177d72197406749c7612f5fd3a0-runc.NFFaIl.mount: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- The unit run-containerd-runc-k8s.io-27a5e7c74fab26bc92d6cf65d06247ea9f03f177d72197406749c7612f5fd3a0-runc.NFFaIl.mount has successfully entered the 'dead' state.
Jul 06 23:45:18 kpi4 systemd[666]: run-containerd-runc-k8s.io-27a5e7c74fab26bc92d6cf65d06247ea9f03f177d72197406749c7612f5fd3a0-runc.CPgcgF.mount: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- The unit UNIT has successfully entered the 'dead' state.
Jul 06 23:45:18 kpi4 systemd[1]: run-containerd-runc-k8s.io-27a5e7c74fab26bc92d6cf65d06247ea9f03f177d72197406749c7612f5fd3a0-runc.CPgcgF.mount: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: https://www.debian.org/support

I checked that I can create empty volumes and write to them, but that unit keeps failing. Is there anything I am missing that I can check, @brandond? I have created a gist with the full k3s journal:

https://gist.github.com/alborotogarcia/546cf2fbc14932dd250bea1de887fcab

It fails only on the Raspberry Pi 4B, not on the other arm64 devices I have. But this error trace makes the Raspberry Pi node unresponsive after a while.

alborotogarcia commented 3 years ago

I'm attaching some logs from local-path-provisioner:

storage          csi-attacher-5dbfd9bf46-hsf47                    1/1     Running            3          58m     10.42.0.15      kpi4     <none>           <none>
kube-system      local-path-provisioner-5ff76fc89d-m2t6q          1/1     Running            7          64m     10.42.0.4       kpi4     <none>           <none>
cert-manager     cert-manager-cainjector-69d885bf55-65bhk         1/1     Running            9          63m     10.42.0.8       kpi4     <none>           <none>

k logs -n kube-system pod/local-path-provisioner-5ff76fc89d-m2t6q
I0707 02:44:33.468680       1 leaderelection.go:242] attempting to acquire leader lease  kube-system/rancher.io-local-path...
I0707 02:44:51.171328       1 leaderelection.go:252] successfully acquired lease kube-system/rancher.io-local-path
I0707 02:44:51.172856       1 controller.go:773] Starting provisioner controller rancher.io/local-path_local-path-provisioner-5ff76fc89d-m2t6q_42229385-ddd5-4822-8054-5ea804c390ad!
I0707 02:44:51.174361       1 event.go:281] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"rancher.io-local-path", UID:"eac70bb8-86cc-43bb-aab6-da1d1ee38b9e", APIVersion:"v1", ResourceVersion:"10857", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' local-path-provisioner-5ff76fc89d-m2t6q_42229385-ddd5-4822-8054-5ea804c390ad became leader
I0707 02:44:51.273181       1 controller.go:822] Started provisioner controller rancher.io/local-path_local-path-provisioner-5ff76fc89d-m2t6q_42229385-ddd5-4822-8054-5ea804c390ad!
I0707 02:56:44.138114       1 leaderelection.go:288] failed to renew lease kube-system/rancher.io-local-path: failed to tryAcquireOrRenew context deadline exceeded
F0707 02:56:44.138184       1 controller.go:851] leaderelection lost
brandond commented 3 years ago

Are you running Longhorn on a Pi 4? Are you using MMC/SD, or external storage? I suspect that you don't have sufficient disk IO throughput to support the control-plane, longhorn, and workloads.
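One way to sanity-check disk throughput on the node (a sketch, assuming fio and sysstat are installed; the sizes and runtimes are illustrative, not thresholds):

# random-write IOPS/latency on the disk backing the k3s data dir
sudo fio --name=k3s-datastore --directory=/var/lib/rancher --rw=randwrite --bs=4k --size=128M --ioengine=libaio --iodepth=4 --direct=1 --runtime=30 --time_based
# watch device utilization and await while the control plane and Longhorn are busy
iostat -x 5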

alborotogarcia commented 3 years ago

Apparently the Raspberry Pi 4 no longer becomes unresponsive (it used to fail within about 24 hours) now that avahi is disabled, but the error still persists, @brandond:

kube-system      sealed-secrets-controller-6ff6764757-9kvxt       1/1     Running            3          38h   10.42.8.4       knode2   <none>           <none>
kube-system      traefik-97b44b794-r8pj8                          1/1     Running            0          38h   10.42.8.3       knode2   <none>           <none>
ml               yatai-service-6d598d979c-wxlsd                   1/1     Running            10         20h   10.42.8.31      knode2   <none>           <none>
argo-rollouts    argo-rollouts-58dc89bcfb-zfn9c                   1/1     Running            0          14h   10.42.4.30      knode1   <none>           <none>
storage          csi-provisioner-578dc5df8d-wbvn6                 1/1     Running            81         38h   10.42.4.12      knode1   <none>           <none>
storage          csi-resizer-85fbd8459d-k2m6t                     1/1     Running            119        38h   10.42.4.14      knode1   <none>           <none>
storage          csi-attacher-5dbfd9bf46-zdcwn                    1/1     Running            78         38h   10.42.4.10      knode1   <none>           <none>
storage          csi-snapshotter-94bbf64bd-kbpr6                  1/1     Running            105        38h   10.42.8.15      knode2   <none>           <none>
storage          csi-resizer-85fbd8459d-2kqh7                     1/1     Running            84         38h   10.42.8.13      knode2   <none>           <none>
storage          csi-provisioner-578dc5df8d-hxnj9                 1/1     Running            89         38h   10.42.4.11      knode1   <none>           <none>
storage          csi-provisioner-578dc5df8d-9mmd2                 1/1     Running            99         38h   10.42.8.14      knode2   <none>           <none>
storage          csi-snapshotter-94bbf64bd-kcx59                  1/1     Running            86         38h   10.42.0.20      kpi4     <none>           <none>
storage          csi-attacher-5dbfd9bf46-k4sh5                    1/1     Running            111        38h   10.42.8.12      knode2   <none>           <none>
storage          csi-snapshotter-94bbf64bd-wtzn6                  1/1     Running            90         38h   10.42.4.13      knode1   <none>           <none>
storage          csi-attacher-5dbfd9bf46-96g6m                    1/1     Running            80         38h   10.42.8.11      knode2   <none>           <none>
storage          csi-resizer-85fbd8459d-q54dm                     1/1     Running            75         38h   10.42.0.19      kpi4     <none>           <none>
argo             workflow-controller-cb97dc8d5-fxfjv              1/1     Running            233        38h   10.42.4.17      knode1   <none>           <none>
kube-system      local-path-provisioner-5ff76fc89d-ktq9t          1/1     Running            337        38h   10.42.0.4       kpi4     <none>           <none>
cert-manager     cert-manager-cainjector-69d885bf55-k48dq         0/1     CrashLoopBackOff   375        38h   10.42.8.2       knode2   <none>           <none>

Regarding storage, you're right that I intend to use Longhorn on my k3s cluster, but the Raspberry Pi 4B uses an SSD as its main disk on a USB 3 port:

root@kpi4:/home/pi#  lshw -class disk -class storage
  *-usb                     
       description: Mass storage device
       product: AS2115
       vendor: ASMedia
       physical id: 1
       bus info: usb@2:1
       logical name: scsi0
       version: 0.01
       serial: 00000000000000000000
       capabilities: usb-3.00 scsi emulated
       configuration: driver=usb-storage speed=5000Mbit/s
     *-disk
          description: SCSI Disk
          product: 2115
          vendor: ASMT
          physical id: 0.0.0
          bus info: scsi@0:0.0.0
          logical name: /dev/sda
          version: 0
          serial: 00000000000000000000
          size: 223GiB (240GB)
          capabilities: partitioned partitioned:dos
          configuration: ansiversion=6 logicalsectorsize=512 sectorsize=512 signature=6393f405

pi@kpi4:~ $  cat /etc/fstab 
proc            /proc           proc    defaults          0       0
PARTUUID=6393f405-01  /boot           vfat    defaults          0       2
PARTUUID=6393f405-02  /               ext4    defaults,noatime  0       1

Disk /dev/sda: 223.6 GiB, 240057409536 bytes, 468862128 sectors
Disk model: 2115            
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x6393f405

Device     Boot  Start       End   Sectors   Size Id Type
/dev/sda1         8192    532479    524288   256M  c W95 FAT32 (LBA)
/dev/sda2       532480 468862127 468329648 223.3G 83 Linux

root@kpi4:/home/pi#  hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
    Model Number:       KingDian P10 240GB                      
    Serial Number:      2018071900719       
    Firmware Revision:  R0522A0 
    Media Serial Num:   
    Media Manufacturer: 
    Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
    Used: unknown (minor revision code 0x0110) 
    Supported: 9 8 7 6 5 
    Likely used: 9
Configuration:
    Logical     max current
    cylinders   16383   16383
    heads       16  16
    sectors/track   63  63
    --
    CHS current addressable sectors:    16514064
    LBA    user addressable sectors:   268435455
    LBA48  user addressable sectors:   468862128
    Logical  Sector size:                   512 bytes
    Physical Sector size:                   512 bytes
    Logical Sector-0 offset:                  0 bytes
    device size with M = 1024*1024:      228936 MBytes
    device size with M = 1000*1000:      240057 MBytes (240 GB)
    cache/buffer size  = unknown
    Form Factor: 2.5 inch
    Nominal Media Rotation Rate: Solid State Device
Capabilities:
    LBA, IORDY(can be disabled)
    Queue depth: 32
    Standby timer values: spec'd by Standard, no device specific minimum
    R/W multiple sector transfer: Max = 1   Current = 1
    Advanced power management level: 128
    DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
         Cycle time: min=120ns recommended=120ns
    PIO: pio0 pio1 pio2 pio3 pio4 
         Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
    Enabled Supported:
       *    SMART feature set
            Security Mode feature set
       *    Power Management feature set
       *    Write cache
       *    Look-ahead
       *    Host Protected Area feature set
       *    WRITE_BUFFER command
       *    READ_BUFFER command
       *    DOWNLOAD_MICROCODE
       *    Advanced Power Management feature set
            SET_MAX security extension
       *    48-bit Address feature set
       *    Mandatory FLUSH_CACHE
       *    FLUSH_CACHE_EXT
       *    SMART error logging
       *    SMART self-test
       *    General Purpose Logging feature set
       *    WRITE_{DMA|MULTIPLE}_FUA_EXT
       *    WRITE_UNCORRECTABLE_EXT command
       *    {READ,WRITE}_DMA_EXT_GPL commands
       *    Segmented DOWNLOAD_MICROCODE
       *    Gen1 signaling speed (1.5Gb/s)
       *    Gen2 signaling speed (3.0Gb/s)
       *    Gen3 signaling speed (6.0Gb/s)
       *    Native Command Queueing (NCQ)
       *    Phy event counters
       *    READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
            DMA Setup Auto-Activate optimization
       *    Software settings preservation
       *    SANITIZE feature set
       *    BLOCK_ERASE_EXT command
       *    DOWNLOAD MICROCODE DMA command
       *    WRITE BUFFER DMA command
       *    READ BUFFER DMA command
       *    Data Set Management TRIM supported (limit 8 blocks)
       *    Deterministic read ZEROs after TRIM
Security: 
    Master password revision code = 65534
        supported
    not enabled
    not locked
    not frozen
    not expired: security count
        supported: enhanced erase
    2min for SECURITY ERASE UNIT. 2min for ENHANCED SECURITY ERASE UNIT.

Do you think it would help to switch it to an agent node?

brandond commented 3 years ago

I've never seen that message before, and it appears to be coming from systemd... so I'm not sure what the deal is. I have a couple Raspberry Pi 4s that I run as k3s servers as well (on Ubuntu, with external SSD), but I haven't personally tried Longhorn on them. It says it's successfully entering the dead state, which I'm not sure is a failure - is it possible that the log level just got turned up somewhere in your systemd config?
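If it is just verbosity, the configured level can be checked and lowered without a reboot (a sketch; it assumes systemd's default LogLevel=info, at which these mount-unit "Succeeded"/"dead" messages are emitted):

# check what is configured for PID 1
grep -i 'LogLevel' /etc/systemd/system.conf
# lower the manager's log level at runtime
sudo systemd-analyze set-log-level notice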

alborotogarcia commented 3 years ago

The same error appeared to me on prebuilt kernels, at least when bootstrapping a k3s cluster; once I add Longhorn it becomes more frequent, probably due to IO throughput as you say, @brandond. I enabled debug mode by rebuilding the kernel. As far as I remember, CONFIG_RT_GROUP_SCHED and CONFIG_CGROUP_HUGETLB were missing from the built-in kernel, which is why I rebuilt it. Could you check whether Longhorn works well for you? And if so, please tell me which SSD you use :) I've spent the last week on this and I'm running out of ideas. I was thinking of using an SD card for the OS and the SSD for the Longhorn data dir, but micro SD cards are painfully slow.
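For the record, those config options can be checked on a running kernel without rebuilding (a sketch; it assumes the kernel exposes /proc/config.gz, as in the check-config output above):

zgrep -E 'CONFIG_RT_GROUP_SCHED|CONFIG_CGROUP_HUGETLB' /proc/config.gz
# or let k3s run its full capability check
k3s check-config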

alborotogarcia commented 3 years ago

I've just realized that one of my nodes (knode2) has hit the inotify watcher limit after opening Sublime Text, @brandond.

[screenshot]
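The current limits can be inspected and raised roughly like this (a sketch; 524288 is just a commonly used larger value, not a k3s-specific recommendation):

# current inotify limits
cat /proc/sys/fs/inotify/max_user_watches /proc/sys/fs/inotify/max_user_instances
# raise the watch limit now and persist it across reboots
sudo sysctl fs.inotify.max_user_watches=524288
echo 'fs.inotify.max_user_watches=524288' | sudo tee /etc/sysctl.d/99-inotify.conf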

In addition, there are some kernel diagnostic messages that I hadn't checked:

jetson32@knode2:~$  journalctl -k |tail -n35
Jul 08 22:07:32 knode2 kernel: IPv6: eth0: IPv6 duplicate address fe80::d06b:afff:fec6:db30 detected!
Jul 08 22:07:40 knode2 kernel: cni0: port 23(veth3ba1b017) entered disabled state
Jul 08 22:07:40 knode2 kernel: device veth3ba1b017 left promiscuous mode
Jul 08 22:07:40 knode2 kernel: cni0: port 23(veth3ba1b017) entered disabled state
Jul 08 22:07:49 knode2 kernel: scsi host2: iSCSI Initiator over TCP/IP
Jul 08 22:07:49 knode2 kernel: scsi 2:0:0:0: RAID              IET      Controller       0001 PQ: 0 ANSI: 5
Jul 08 22:07:49 knode2 kernel: scsi 2:0:0:1: Direct-Access     IET      VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
Jul 08 22:07:49 knode2 kernel: sd 2:0:0:1: [sda] 4194304 512-byte logical blocks: (2.15 GB/2.00 GiB)
Jul 08 22:07:49 knode2 kernel: sd 2:0:0:1: [sda] Write Protect is off
Jul 08 22:07:49 knode2 kernel: sd 2:0:0:1: [sda] Mode Sense: 69 00 10 08
Jul 08 22:07:49 knode2 kernel: sd 2:0:0:1: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
Jul 08 22:07:57 knode2 kernel: sd 2:0:0:1: [sda] Attached SCSI disk
Jul 08 22:09:28 knode2 kernel: sd 2:0:0:1: [sda] Synchronizing SCSI cache
Jul 08 22:10:19 knode2 kernel: scsi host2: iSCSI Initiator over TCP/IP
Jul 08 22:10:19 knode2 kernel: scsi 2:0:0:0: RAID              IET      Controller       0001 PQ: 0 ANSI: 5
Jul 08 22:10:19 knode2 kernel: scsi 2:0:0:1: Direct-Access     IET      VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
Jul 08 22:10:19 knode2 kernel: sd 2:0:0:1: [sda] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
Jul 08 22:10:19 knode2 kernel: sd 2:0:0:1: [sda] Write Protect is off
Jul 08 22:10:19 knode2 kernel: sd 2:0:0:1: [sda] Mode Sense: 69 00 10 08
Jul 08 22:10:19 knode2 kernel: sd 2:0:0:1: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
Jul 08 22:10:19 knode2 kernel: sd 2:0:0:1: [sda] Attached SCSI disk
Jul 08 22:12:19 knode2 kernel: EXT4-fs (sda): mounted filesystem with ordered data mode. Opts: (null)
Jul 08 22:12:20 knode2 kernel: IPVS: Creating netns size=1928 id=26
Jul 08 22:12:20 knode2 kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Jul 08 22:12:20 knode2 kernel: IPv6: ADDRCONF(NETDEV_UP): vethe2d3af3a: link is not ready
Jul 08 22:12:20 knode2 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethe2d3af3a: link becomes ready
Jul 08 22:12:20 knode2 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jul 08 22:12:20 knode2 kernel: cni0: port 23(vethe2d3af3a) entered blocking state
Jul 08 22:12:20 knode2 kernel: cni0: port 23(vethe2d3af3a) entered disabled state
Jul 08 22:12:20 knode2 kernel: device vethe2d3af3a entered promiscuous mode
Jul 08 22:12:20 knode2 kernel: cni0: port 23(vethe2d3af3a) entered blocking state
Jul 08 22:12:20 knode2 kernel: cni0: port 23(vethe2d3af3a) entered forwarding state
Jul 08 22:12:20 knode2 kernel: IPv6: eth0: IPv6 duplicate address fe80::d845:e7ff:fe7d:5790 detected!
Jul 08 22:15:08 knode2 kernel: t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0x4fc0ff000, fsynr=0x20003, cb=0, sid=86(0x56 - PCIE0), pgd=855adf003, pud=855adf003, pmd=7e4268003, pte=0
Jul 08 22:20:14 knode2 kernel: t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu1, iova=0x4fc86f000, fsynr=0x380003, cb=0, sid=86(0x56 - PCIE0), pgd=855adf003, pud=855adf003, pmd=839d8e003, pte=0
pi@kpi4:~ $  journalctl -k |tail -n 30
Jul 08 22:04:38 kpi4 kernel: cni0: port 17(veth18d8e7dc) entered disabled state
Jul 08 22:04:38 kpi4 kernel: device veth18d8e7dc entered promiscuous mode
Jul 08 22:04:38 kpi4 kernel: cni0: port 17(veth18d8e7dc) entered blocking state
Jul 08 22:04:38 kpi4 kernel: cni0: port 17(veth18d8e7dc) entered forwarding state
Jul 08 22:04:44 kpi4 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethc50adaea: link becomes ready
Jul 08 22:04:44 kpi4 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jul 08 22:04:44 kpi4 kernel: cni0: port 18(vethc50adaea) entered blocking state
Jul 08 22:04:44 kpi4 kernel: cni0: port 18(vethc50adaea) entered disabled state
Jul 08 22:04:44 kpi4 kernel: device vethc50adaea entered promiscuous mode
Jul 08 22:04:44 kpi4 kernel: cni0: port 18(vethc50adaea) entered blocking state
Jul 08 22:04:44 kpi4 kernel: cni0: port 18(vethc50adaea) entered forwarding state
Jul 08 22:04:45 kpi4 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jul 08 22:04:45 kpi4 kernel: cni0: port 19(vethcc1f3d40) entered blocking state
Jul 08 22:04:45 kpi4 kernel: cni0: port 19(vethcc1f3d40) entered disabled state
Jul 08 22:04:45 kpi4 kernel: device vethcc1f3d40 entered promiscuous mode
Jul 08 22:04:45 kpi4 kernel: cni0: port 19(vethcc1f3d40) entered blocking state
Jul 08 22:04:45 kpi4 kernel: cni0: port 19(vethcc1f3d40) entered forwarding state
Jul 08 22:05:32 kpi4 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth6891ad26: link becomes ready
Jul 08 22:05:32 kpi4 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jul 08 22:05:32 kpi4 kernel: cni0: port 20(veth6891ad26) entered blocking state
Jul 08 22:05:32 kpi4 kernel: cni0: port 20(veth6891ad26) entered disabled state
Jul 08 22:05:32 kpi4 kernel: device veth6891ad26 entered promiscuous mode
Jul 08 22:05:32 kpi4 kernel: cni0: port 20(veth6891ad26) entered blocking state
Jul 08 22:05:32 kpi4 kernel: cni0: port 20(veth6891ad26) entered forwarding state
Jul 08 22:05:32 kpi4 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jul 08 22:05:32 kpi4 kernel: cni0: port 21(veth969f545b) entered blocking state
Jul 08 22:05:32 kpi4 kernel: cni0: port 21(veth969f545b) entered disabled state
Jul 08 22:05:32 kpi4 kernel: device veth969f545b entered promiscuous mode
Jul 08 22:05:32 kpi4 kernel: cni0: port 21(veth969f545b) entered blocking state
Jul 08 22:05:32 kpi4 kernel: cni0: port 21(veth969f545b) entered forwarding state

jetson16@knode1:~$  journalctl -k |tail -n 30
Jul 08 22:10:07 knode1 kernel: device veth47620c56 entered promiscuous mode
Jul 08 22:10:07 knode1 kernel: cni0: port 29(veth47620c56) entered blocking state
Jul 08 22:10:07 knode1 kernel: cni0: port 29(veth47620c56) entered forwarding state
Jul 08 22:10:07 knode1 kernel: IPv6: eth0: IPv6 duplicate address fe80::c2d:a4ff:fe62:f965 detected!
Jul 08 22:10:09 knode1 kernel: scsi host2: iSCSI Initiator over TCP/IP
Jul 08 22:10:09 knode1 kernel: scsi 2:0:0:0: RAID              IET      Controller       0001 PQ: 0 ANSI: 5
Jul 08 22:10:09 knode1 kernel: scsi 2:0:0:1: Direct-Access     IET      VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
Jul 08 22:10:09 knode1 kernel: sd 2:0:0:1: [sda] 4194304 512-byte logical blocks: (2.15 GB/2.00 GiB)
Jul 08 22:10:09 knode1 kernel: sd 2:0:0:1: [sda] Write Protect is off
Jul 08 22:10:09 knode1 kernel: sd 2:0:0:1: [sda] Mode Sense: 69 00 10 08
Jul 08 22:10:09 knode1 kernel: sd 2:0:0:1: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
Jul 08 22:10:09 knode1 kernel: sd 2:0:0:1: [sda] Attached SCSI disk
Jul 08 22:12:06 knode1 kernel: EXT4-fs (sda): mounted filesystem with ordered data mode. Opts: (null)
Jul 08 22:12:06 knode1 kernel: IPVS: Creating netns size=1928 id=35
Jul 08 22:12:06 knode1 kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Jul 08 22:12:06 knode1 kernel: IPv6: ADDRCONF(NETDEV_UP): vethe8d73c70: link is not ready
Jul 08 22:12:06 knode1 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethe8d73c70: link becomes ready
Jul 08 22:12:06 knode1 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jul 08 22:12:06 knode1 kernel: cni0: port 30(vethe8d73c70) entered blocking state
Jul 08 22:12:06 knode1 kernel: cni0: port 30(vethe8d73c70) entered disabled state
Jul 08 22:12:06 knode1 kernel: device vethe8d73c70 entered promiscuous mode
Jul 08 22:12:06 knode1 kernel: cni0: port 30(vethe8d73c70) entered blocking state
Jul 08 22:12:06 knode1 kernel: cni0: port 30(vethe8d73c70) entered forwarding state
Jul 08 22:12:06 knode1 kernel: IPv6: eth0: IPv6 duplicate address fe80::cc79:a7ff:fe05:29d9 detected!
Jul 08 22:12:42 knode1 kernel: cni0: port 22(veth1cefc6c4) entered disabled state
Jul 08 22:12:42 knode1 kernel: device veth1cefc6c4 left promiscuous mode
Jul 08 22:12:42 knode1 kernel: cni0: port 22(veth1cefc6c4) entered disabled state
Jul 08 22:13:35 knode1 kernel: cni0: port 23(vethd53af24c) entered disabled state
Jul 08 22:13:35 knode1 kernel: device vethd53af24c left promiscuous mode
Jul 08 22:13:35 knode1 kernel: cni0: port 23(vethd53af24c) entered disabled state
heiderich commented 2 years ago

@alborotogarcia Were you able to solve the problem?

alborotogarcia commented 2 years ago

@heiderich in the end I think it got solved by doing a clean Debian install (better to try Debian 11 Bullseye) and setting the Pi 4 up as a worker node. You shouldn't have many problems; nowadays it's even easier to use an external SSD as the main filesystem on a Raspberry Pi 4. That said, I still notice that under high workloads it's the node that becomes the most unresponsive, so mind your workload balancing and collect metrics.
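For completeness, converting the Pi from server to worker amounts to uninstalling the server and reinstalling as an agent pointed at the new server (a sketch, assuming the standard install/uninstall scripts; <server> and <token> are placeholders):

# on the Pi, remove the old server install
/usr/local/bin/k3s-uninstall.sh
# reinstall as an agent joined to the new server
curl -sfL https://get.k3s.io | K3S_URL=https://<server>:6443 K3S_TOKEN=<token> sh -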