the "root cause" seems to be
grpc: addrConn.createTransport failed to connect to {/var/lib/k0s/run/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial unix /var/lib/k0s/run/containerd.sock: connect: no such file or directory". Reconnecting... component=kubelet
If you look at the processes, I'd guess that containerd is not really running?
Before kubelet gets into the crash loop, do you see anything in the logs saying why containerd cannot start?
nope, tried to paste as much log as possible
hi. same here on Fedora 33: node not coming up after reboot (/var/lib/k0s/run/containerd.sock: no such file; log attached from start to stop) k0s.txt
Having the same issue on a Raspberry Pi 4. When creating a new cluster it works, at least for a while. After scheduling some pods, it started crashing like this.
Feels like it could also be a race condition: an RPi 4 (without much cooling, and running the master along with the worker) can get a bit slow.
In the logs from @kstych I see the following:
time="2020-12-19 15:53:08" level=info msg="W1219 15:53:08.863977 2860 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {/var/lib/k0s/run/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial unix /var/lib/k0s/run/containerd.sock: connect: no such file or directory\". Reconnecting..." component=kubelet
time="2020-12-19 15:53:09" level=info msg="Shutting down pid 2860" component=kubelet
time="2020-12-19 15:53:09" level=info msg="Shutting down pid 2718" component=containerd
time="2020-12-19 15:53:10" level=info msg="E1219 15:53:10.979982 2662 resource_quota_controller.go:408] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request" component=kube-controller-manager
time="2020-12-19 15:53:12" level=info msg="W1219 15:53:12.213435 2662 garbagecollector.go:642] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]" component=kube-controller-manager
time="2020-12-19 15:53:14" level=info msg="Shutting down pid 2718" component=containerd
And naturally after containerd is down and kubelet thus busted, it's a quick slide downhill.
I do not see containerd logging anything useful as to the reason why it's shutting down.
> hi. same here on Fedora 33: node not coming up after reboot (/var/lib/k0s/run/containerd.sock: no such file; log attached from start to stop) k0s.txt
Hi @kstych. Can you verify if `/var/lib/k0s/run/containerd.sock` exists?
Alternatively, can you see if the following flags provide more output that we can use for debugging?
k0s worker --token-file <file> --debug --logging containerd=debug
hi @trawler sure, please find it attached here. Running a single-node cluster now, same result. The install works fine the first time; everything comes up in about 2 min.
After stopping k0s (CTRL+C) and rebooting, I run the same command and the node becomes NotReady (unreachable).
Command: `k0s server -c ${HOME}/.k0s/k0s.yaml --enable-worker --debug --logging containerd=debug`. Also, the file does not exist, as in the error: /var/lib/k0s/run/containerd.sock
@kstych Are the logs only from the time after the reboot?
So based on the logs it looks like the order of things happening is:
time="2020-12-21 14:20:45" level=info msg="Started succesfully, go nuts" component=containerd
"time=\"2020-12-21T14:20:48.198133019Z\" level=debug msg=\"garbage collected\" d=785.660158ms" component=containerd
time="2020-12-21 14:21:01" level=info msg="W1221 14:21:01.670147 1755 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {/var/lib/k0s/run/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial unix /var/lib/k0s/run/containerd.sock: connect: no such file or directory\". Reconnecting..." component=kubelet
...
time=\"2020-12-21T14:22:27.879723672Z\" level=warning msg=\"cleaning up after shim disconnected\" id=032766e608a5729f0b27a4478f84034761b3c3e5383d58ca7dc12a090607c561 namespace=k8s.io" component=containerd
...
time="2020-12-21 14:22:28" level=info msg="time=\"2020-12-21T14:22:28.503107439Z\" level=warning msg=\"cleanup warnings time=\\\"2020-12-21T14:22:28Z\\\" level=info msg=\\\"starting signal loop\\\" namespace=k8s.io pid=1984\\n\"" component=containerd
time="2020-12-21 14:22:28" level=info msg="time=\"2020-12-21T14:22:28.523903086Z\" level=debug msg=\"event published\" ns=k8s.io topic=/tasks/exit type=containerd.events.TaskExit" component=containerd
time="2020-12-21 14:22:28" level=info msg="time=\"2020-12-21T14:22:28.523936035Z\" level=debug msg=\"event published\" ns=k8s.io topic=/tasks/delete type=containerd.events.TaskDelete" component=containerd
So it seems like containerd itself _might_ be up and running, with just the socket missing. I cannot find anything in the containerd log entries that hints at it being broken in any way. 🤔
So what would be interesting to see (quick checks sketched below):
- does `/var/lib/k0s/run/containerd.sock` actually exist?
- is containerd really listening on it?
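Something like this should answer both questions, assuming the default k0s paths:

```sh
# does the socket file exist?
ls -l /var/lib/k0s/run/containerd.sock

# is containerd actually listening on a unix socket? (ss -x lists unix sockets)
sudo ss -xlpn | grep containerd
```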
hi @jnummelin yes, I can see the containerd process running, but there is no sock file.
This is the only matching file across the filesystem:
[root@k8s /]# find . | grep containerd.sock
./var/lib/k0s/run/containerd.sock.ttrpc
after cleaning up everything
cd /var/lib ; rm -rf calico cni k0s kubelet
cd ~ ; rm -rf .k0s .kube
and re-running the single-node command, now there is a containerd.sock:
[root@k8s ~]# cd /
[root@k8s /]# find . | grep containerd.sock
./var/lib/k0s/run/containerd.sock.ttrpc
./var/lib/k0s/run/containerd.sock
wait for the pods
[root@k8s /]# kubectl get node,pods -A
NAME STATUS ROLES AGE VERSION
node/k8s.kstych.com Ready <none> 2m17s v1.19.4
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/calico-kube-controllers-5f6546844f-ttnfz 1/1 Running 0 2m40s
kube-system pod/calico-node-4w2cm 1/1 Running 0 69s
kube-system pod/coredns-5c98d7d4d8-5tp4d 1/1 Running 0 2m46s
kube-system pod/konnectivity-agent-dwbc5 1/1 Running 0 2m12s
kube-system pod/kube-proxy-ptvtj 1/1 Running 0 2m17s
then reboot and run the same command
after a while the node is NotReady and there is no containerd.sock:
[root@k8s /]# kubectl get node,pods -A
NAME STATUS ROLES AGE VERSION
node/k8s.kstych.com NotReady <none> 7m32s v1.19.4
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/calico-kube-controllers-5f6546844f-ttnfz 1/1 Running 0 7m55s
kube-system pod/calico-node-4w2cm 1/1 Running 0 6m24s
kube-system pod/coredns-5c98d7d4d8-5tp4d 1/1 Running 0 8m1s
kube-system pod/konnectivity-agent-dwbc5 1/1 Running 0 7m27s
kube-system pod/kube-proxy-ptvtj 1/1 Running 0 7m32s
[root@k8s /]# find . | grep containerd.sock
./var/lib/k0s/run/containerd.sock.ttrpc
@kstych do you have k0s starting as a systemd unit or something? Could you check a couple more things:
- `cat /proc/<containerd-pid>/cmdline`: it should have `--address=/var/lib/k0s/run/containerd.sock`
- `netstat -a -l -x`: this shows us the active unix sockets

This is really puzzling: why is it able to create and listen on the ttrpc socket but not the normal one? 🤔 And why does this manifest only after reboot? Is your /var/lib path somehow mounted differently on/during boot?
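If a `ctr` binary is available (an assumption on my part; it is not necessarily shipped alongside k0s), you could also try talking to the socket directly:

```sh
# ping containerd over the configured socket; this fails fast if nothing is listening there
ctr --address /var/lib/k0s/run/containerd.sock version
```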
I wonder if it could be something like SELinux, AppArmor or the like that's preventing containerd from creating the unix socket?
(pushed wrong button) 🤦
hi @jnummelin, SELinux is off, the firewall is off, and there is a single ext4 / partition. I am running the command in a screen session each time, as root.
The post-reboot netstat output is attached (no sock file after reboot); the commands are the same.
Also, after a reboot CTRL+C does not stop the command (the first time it does, but it leaves other processes running). After a reboot, pressing CTRL+C just keeps going like this:
INFO[2020-12-22 18:46:06] Shutting down pid 1655 component=containerd
INFO[2020-12-22 18:46:11] Shutting down pid 1655 component=containerd
INFO[2020-12-22 18:46:16] Shutting down pid 1655 component=containerd
INFO[2020-12-22 18:46:21] Shutting down pid 1655 component=containerd
INFO[2020-12-22 18:46:26] Shutting down pid 1655 component=containerd
^CINFO[2020-12-22 18:46:31] Shutting down pid 1655 component=containerd
[root@k8s ~]# cat /proc/<first-run-containerd-pid>/cmdline
/var/lib/k0s/bin/containerd--root=/var/lib/k0s/containerd--state=/var/lib/k0s/run/containerd--address=/var/lib/k0s/run/containerd.sock--log-level=info--config=/etc/k0s/containerd.toml
[root@k8s /]# cat /proc/<reboot-containerd-pid>/cmdline
/var/lib/k0s/bin/containerd--root=/var/lib/k0s/containerd--state=/var/lib/k0s/run/containerd--address=/var/lib/k0s/run/containerd.sock--log-level=info--config=/etc/k0s/containerd.toml
[root@k8s /]# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,noexec,size=8138164k,nr_inodes=2034541,mode=755,inode64)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,inode64)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,size=3263456k,nr_inodes=819200,mode=755,inode64)
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
none on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
none on /sys/kernel/tracing type tracefs (rw,relatime)
/dev/sda3 on / type ext4 (rw,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=15686)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
nfsd on /proc/fs/nfsd type nfsd (rw,relatime)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,size=8158636k,nr_inodes=409600,inode64)
/dev/sda2 on /boot type ext4 (rw,relatime)
/dev/sda1 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=1631724k,nr_inodes=407931,mode=700,uid=1000,gid=1000,inode64)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,nosuid,nodev,noexec,relatime)
tracefs on /sys/kernel/debug/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
Confirming both of my test cases exhibit the same behavior on 0.9.1 rc1. Scenario: manually killing/replacing the containerd process with the log level set to debug:
root@ubuntu:~# /var/lib/k0s/bin/containerd --root=/var/lib/k0s/containerd --state=/var/lib/k0s/run/containerd --address=/var/lib/k0s/run/containerd.sock --log-level=debug --config=/etc/k0s/containerd.toml
INFO[2020-12-23T06:17:31.282915843Z] starting containerd revision=269548fa27e0089a8b8278fc4fc781d7f65a939b version=v1.4.3
INFO[2020-12-23T06:17:31.316922600Z] loading plugin "io.containerd.content.v1.content"... type=io.containerd.content.v1
INFO[2020-12-23T06:17:31.316991706Z] loading plugin "io.containerd.snapshotter.v1.aufs"... type=io.containerd.snapshotter.v1
INFO[2020-12-23T06:17:31.321878156Z] loading plugin "io.containerd.snapshotter.v1.btrfs"... type=io.containerd.snapshotter.v1
INFO[2020-12-23T06:17:31.322265753Z] skip loading plugin "io.containerd.snapshotter.v1.btrfs"... error="path /var/lib/k0s/containerd/io.containerd.snapshotter.v1.btrfs (xfs) must be a btrfs filesystem to be used with the btrfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
INFO[2020-12-23T06:17:31.322304350Z] loading plugin "io.containerd.snapshotter.v1.devmapper"... type=io.containerd.snapshotter.v1
WARN[2020-12-23T06:17:31.322328449Z] failed to load plugin io.containerd.snapshotter.v1.devmapper error="devmapper not configured"
INFO[2020-12-23T06:17:31.322342183Z] loading plugin "io.containerd.snapshotter.v1.native"... type=io.containerd.snapshotter.v1
INFO[2020-12-23T06:17:31.322366042Z] loading plugin "io.containerd.snapshotter.v1.overlayfs"... type=io.containerd.snapshotter.v1
INFO[2020-12-23T06:17:31.322445232Z] loading plugin "io.containerd.snapshotter.v1.zfs"... type=io.containerd.snapshotter.v1
INFO[2020-12-23T06:17:31.322651908Z] skip loading plugin "io.containerd.snapshotter.v1.zfs"... error="path /var/lib/k0s/containerd/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
INFO[2020-12-23T06:17:31.322670913Z] loading plugin "io.containerd.metadata.v1.bolt"... type=io.containerd.metadata.v1
WARN[2020-12-23T06:17:31.322689279Z] could not use snapshotter devmapper in metadata plugin error="devmapper not configured"
INFO[2020-12-23T06:17:31.322697789Z] metadata content store policy set policy=shared
Okay, maybe konnectivity-server is throwing us for a loop?
level=info msg="Error: failed to run the master server: failed to get uds listener: failed to listen(unix) name /var/lib/k0s/run/konnectivity-server/konnectivity-server.sock: listen unix /var/lib/k0s/run/konnectivity-server/konnectivity-server.sock: bind: address already in use" component=konnectivity
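One quick thing to check here: for unix sockets, `bind: address already in use` usually just means the socket file still exists on disk. So (assuming the default path from the error above):

```sh
# a stale socket file left over from before the reboot would explain 'address already in use'
ls -l /var/lib/k0s/run/konnectivity-server/
```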
So this is interesting because konnectivity is attempting to listen on these ports:
Server port set to 0." component=konnectivity
Agent port set to 8132." component=konnectivity
Admin port set to 8133." component=konnectivity
Health port set to 8092." component=konnectivity
but the host reports:
root@ubuntu:~# netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:2379 0.0.0.0:* LISTEN 2753/etcd
tcp 0 0 192.168.1.124:2380 0.0.0.0:* LISTEN 2753/etcd
tcp 0 0 127.0.0.1:10257 0.0.0.0:* LISTEN 2775/kube-controlle
tcp 0 0 127.0.0.1:10259 0.0.0.0:* LISTEN 2774/kube-scheduler
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 890/sshd: /usr/sbin
tcp6 0 0 :::10251 :::* LISTEN 2774/kube-scheduler
tcp6 0 0 :::6443 :::* LISTEN 2772/kube-apiserver
tcp6 0 0 :::10252 :::* LISTEN 2775/kube-controlle
tcp6 0 0 :::22 :::* LISTEN 890/sshd: /usr/sbin
tcp6 0 0 :::9443 :::* LISTEN 2777/k0s
udp 0 0 192.168.1.124:68 0.0.0.0:* 711/systemd-network
.... and digging further, it appears containerd.sock is our higher-level issue:
Dec 23 06:54:49 ubuntu k0s[2741]: time="2020-12-23 06:54:49" level=info msg="I1223 06:54:49.689717 10500 container_manager_linux.go:279] Creating Container Manager object based on Node Config: {RuntimeCgroupsName:/system.slice/containerd.service SystemCgroupsName: KubeletCgroupsName:/system.slice/containerd.service ContainerRuntime:remote CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:cgroupfs KubeletRootDir:/var/lib/k0s/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName:system.slice SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.15} GracePeriod:0s MinReclaim:<nil>} {Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalTopologyManagerScope:container ExperimentalCPUManagerReconcilePeriod:10s ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms ExperimentalTopologyManagerPolicy:none}" component=kubelet
Dec 23 06:54:49 ubuntu k0s[2741]: time="2020-12-23 06:54:49" level=info msg="I1223 06:54:49.690172 10500 topology_manager.go:120] [topologymanager] Creating topology manager with none policy per container scope" component=kubelet
Dec 23 06:54:49 ubuntu k0s[2741]: time="2020-12-23 06:54:49" level=info msg="I1223 06:54:49.690193 10500 container_manager_linux.go:310] [topologymanager] Initializing Topology Manager with none policy and container-level scope" component=kubelet
Dec 23 06:54:49 ubuntu k0s[2741]: time="2020-12-23 06:54:49" level=info msg="I1223 06:54:49.690200 10500 container_manager_linux.go:315] Creating device plugin manager: true" component=kubelet
Dec 23 06:54:49 ubuntu k0s[2741]: time="2020-12-23 06:54:49" level=info msg="I1223 06:54:49.690340 10500 remote_runtime.go:62] parsed scheme: \"\"" component=kubelet
Dec 23 06:54:49 ubuntu k0s[2741]: time="2020-12-23 06:54:49" level=info msg="I1223 06:54:49.690351 10500 remote_runtime.go:62] scheme \"\" not registered, fallback to default scheme" component=kubelet
Dec 23 06:54:49 ubuntu k0s[2741]: time="2020-12-23 06:54:49" level=info msg="I1223 06:54:49.691045 10500 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/k0s/run/containerd.sock <nil> 0 <nil>}] <nil> <nil>}" component=kubelet
Dec 23 06:54:49 ubuntu k0s[2741]: time="2020-12-23 06:54:49" level=info msg="I1223 06:54:49.691062 10500 clientconn.go:948] ClientConn switching balancer to \"pick_first\"" component=kubelet
Dec 23 06:54:49 ubuntu k0s[2741]: time="2020-12-23 06:54:49" level=info msg="I1223 06:54:49.691141 10500 remote_image.go:50] parsed scheme: \"\"" component=kubelet
Dec 23 06:54:49 ubuntu k0s[2741]: time="2020-12-23 06:54:49" level=info msg="I1223 06:54:49.691155 10500 remote_image.go:50] scheme \"\" not registered, fallback to default scheme" component=kubelet
Dec 23 06:54:49 ubuntu k0s[2741]: time="2020-12-23 06:54:49" level=info msg="I1223 06:54:49.691166 10500 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/k0s/run/containerd.sock <nil> 0 <nil>}] <nil> <nil>}" component=kubelet
Dec 23 06:54:49 ubuntu k0s[2741]: time="2020-12-23 06:54:49" level=info msg="I1223 06:54:49.691170 10500 clientconn.go:948] ClientConn switching balancer to \"pick_first\"" component=kubelet
Dec 23 06:54:49 ubuntu k0s[2741]: time="2020-12-23 06:54:49" level=info msg="I1223 06:54:49.691196 10500 kubelet.go:273] Watching apiserver" component=kubelet
Dec 23 06:54:49 ubuntu k0s[2741]: time="2020-12-23 06:54:49" level=info msg="W1223 06:54:49.691676 10500 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {/var/lib/k0s/run/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial unix /var/lib/k0s/run/containerd.sock: connect: no such file or directory\". Reconnecting..." component=kubelet
Dec 23 06:54:49 ubuntu k0s[2741]: time="2020-12-23 06:54:49" level=info msg="E1223 06:54:49.692689 10500 remote_runtime.go:86] Version from runtime service failed: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/lib/k0s/run/containerd.sock: connect: no such file or directory\"" component=kubelet
Dec 23 06:54:49 ubuntu k0s[2741]: time="2020-12-23 06:54:49" level=info msg="W1223 06:54:49.692751 10500 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {/var/lib/k0s/run/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial unix /var/lib/k0s/run/containerd.sock: connect: no such file or directory\". Reconnecting..." component=kubelet
Dec 23 06:54:49 ubuntu k0s[2741]: time="2020-12-23 06:54:49" level=info msg="E1223 06:54:49.692880 10500 kuberuntime_manager.go:202] Get runtime version failed: get remote runtime typed version failed: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/lib/k0s/run/containerd.sock: connect: no such file or directory\"" component=kubelet
Dec 23 06:54:49 ubuntu k0s[2741]: time="2020-12-23 06:54:49" level=info msg="F1223 06:54:49.693011 10500 server.go:269] failed to run Kubelet: failed to create kubelet: get remote runtime typed version failed: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/lib/k0s/run/containerd.sock: connect: no such file or directory\"" component=kubelet
Dec 23 06:54:49 ubuntu k0s[2741]: time="2020-12-23 06:54:49" level=info msg="goroutine 1 [running]:" component=kubelet
The missing containerd.sock here definitely seems to be the top-level culprit. I wonder if a reboot makes k0s/containerd go down "too hard", and thus something (maybe the socket file itself) is left lingering. That's what the `bind: address already in use` from konnectivity kinda hints at.
One possible workaround to try is to remove everything under `/var/lib/k0s/run` after reboot and before k0s is started.
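In other words, something like this after each reboot (paths and flags copied from the commands earlier in the thread; adjust as needed):

```sh
# only safe to run while k0s and its child processes are down
rm -rf /var/lib/k0s/run
k0s server -c ${HOME}/.k0s/k0s.yaml --enable-worker
```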
hi @jnummelin, in fact I just tried that and was going to report that it works (after reboot, delete the /var/lib/k0s/run folder and then restart).
I just wanted to know whether it is safe? No useful files to keep here?
It is safe, if you do it when k0s and the related processes are not running. There are only socket files, pid files and containerd state, which are "ephemeral" and can be deleted on reboot.
Of course we need to come up with a proper solution for this.
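On systemd-based hosts, one way to automate that cleanup in the meantime (my own suggestion, not something k0s ships) is a tmpfiles.d entry that removes the directory only at boot:

```
# /etc/tmpfiles.d/k0s-run.conf
# 'R' removes the path recursively; the '!' suffix restricts the line to boot time
R! /var/lib/k0s/run
```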
Also, I'm seriously starting to think this is also a "bug" on the containerd side. It's kinda unexpected that it gets up and running but fails to listen on the configured socket. And nothing in the logs says it's not operational.
Added the following to my systemd unit for when I test again later on:
ExecStartPre=-/usr/bin/rm -rf /var/lib/k0s/run
Definitely not a pretty solution, as stopping/starting/restarting should not have that effect in non-host-reboot scenarios.
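For context, a minimal sketch of such a unit; the binary path and k0s flags here are assumptions, only the ExecStartPre line is from above:

```ini
[Unit]
Description=k0s server
After=network-online.target

[Service]
# clear ephemeral runtime state (sockets, pid files) left over from a hard shutdown;
# the leading '-' makes a failure of this step non-fatal
ExecStartPre=-/usr/bin/rm -rf /var/lib/k0s/run
ExecStart=/usr/local/bin/k0s server --enable-worker
Restart=on-failure

[Install]
WantedBy=multi-user.target
```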
Do we have a quick read on the technical details of swapping in a BYO CRI-O instead of containerd, so I can compare their behaviors?
I have crio.sock at /run/crio/crio.sock. I guess I found the `k0s worker --cri-socket` flag, but I'm currently testing the all-in-one-node `k0s server --enable-worker` method.
not cri-o, but docker: https://docs.k0sproject.io/v0.9.0/custom-cri-runtime/
Yep, gotcha. I just tracked down the supported flags.
Here we can set the CRI socket on the CLI via `k0s worker --cri-socket remote:unix:///run/crio/crio.sock`, but the subcommand `k0s server` does not support that flag.
Sad day, I'm not situated to test a supported topology right now.
> But the subcommand `k0s server` does not support that flag. Sad day, I'm not situated to test a supported topology right now.
That's a valid point. I opened #579 to track this feature request.
`--cri-socket` is supported now.
k0s now uses `/run` as the state dir, which fixes this (/run is a tmpfs, so stale socket and pid files are cleared automatically at boot).
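For anyone landing here, a hypothetical worker invocation with an external CRI-O, combining the flags mentioned above (the token file path is assumed):

```sh
k0s worker --cri-socket remote:unix:///run/crio/crio.sock --token-file /path/to/token
```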
What happened? The worker started and showed up as a node in the master. Then I added a crontab, rebooted, and it's not coming up anymore; see https://gist.githubusercontent.com/matti/f24e0f0080298e79d7c2c9e4500b5a89/raw/ae42398487051ccb7c3bd48c1ca5f153c6545cea/k0s-worker.txt