rancher / k3os

Purpose-built OS for Kubernetes, fully managed by Kubernetes.
https://k3os.io
Apache License 2.0

Failed to mkdir /usr/lib64 read-only filesystem: Rancher logging deployed to K3OS cluster fails to start pods: #490

Open philipsparrow opened 4 years ago

philipsparrow commented 4 years ago

Version (k3OS / kernel) v0.10.1 5.0.0-43-generic

Architecture x86_64

Describe the bug
Enabling Syslog logging (in Rancher) deploys pods that fail to start because of K3OS read-only filesystems. The rancher-logging-fluentd-linux pods get the error:

Error: failed to generate container "<containerID>" spec: failed to mkdir "/usr/lib64": mkdir /usr/lib64: read-only file system

This happens because the pods have a hostPath mount of /usr/lib64, which doesn't exist in K3OS, so I think Kubernetes tries to create the directory.
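To see in advance which hostPath mounts would trigger this kind of mkdir attempt, one could compare a pod spec against the node's filesystem. The following is only a minimal sketch: the function name `missing_host_paths` and the example spec are hypothetical, and the real rancher-logging DaemonSet spec may differ. It relies only on the standard `volumes[].hostPath.path` field of a Kubernetes pod spec.

```python
import os
from typing import Callable, Dict, List


def missing_host_paths(
    pod_spec: Dict, exists: Callable[[str], bool] = os.path.exists
) -> List[str]:
    """Return hostPath volume paths in pod_spec that are absent on this host."""
    missing = []
    for volume in pod_spec.get("volumes", []):
        host_path = volume.get("hostPath")
        if host_path is None:
            continue  # not a hostPath volume (configMap, emptyDir, ...)
        path = host_path.get("path", "")
        if path and not exists(path):
            missing.append(path)
    return missing


if __name__ == "__main__":
    # Hypothetical pod spec fragment, for illustration only.
    example_spec = {
        "volumes": [
            {"name": "lib64", "hostPath": {"path": "/usr/lib64"}},
            {"name": "config", "configMap": {"name": "fluentd-config"}},
        ]
    }
    for path in missing_host_paths(example_spec):
        print(f"hostPath {path} does not exist on this node")
```

Run on the node itself, with the `spec.template.spec` section of the DaemonSet (for example from `kubectl get ds -o json`) passed in as a dict, this would list every path that has to be created before the container can start.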

To Reproduce
Add a K3OS cluster to Rancher server and enable Syslog logging on the cluster (any configuration will suffice). The DaemonSet pods fail to start because Kubernetes cannot mkdir /usr/lib64 on a read-only filesystem.

Expected behavior kubectl get pods -n=cattle-logging should show pods at Ready status.

Actual behavior

kubectl get pods -n=cattle-logging    
NAME                                         READY   STATUS                 RESTARTS   AGE
rancher-logging-log-aggregator-linux-q6nsc   0/1     CreateContainerError   0          61m
rancher-logging-fluentd-linux-4scwh          1/2     CreateContainerError   0          61m
kubectl describe pod -n=cattle-logging rancher-logging-log-aggregator-linux-5mxcs
...
Events:
  Type     Reason     Age                   From                         Message
  ----     ------     ----                  ----                         -------
  Normal   Scheduled  <unknown>             default-scheduler            Successfully assigned cattle-logging/rancher-logging-log-aggregator-linux-5mxcs to phil-k3os-master-0
  Normal   Pulling    57m                   kubelet, phil-k3os-master-0  Pulling image "rancher/log-aggregator:v0.1.6"
  Normal   Pulled     57m                   kubelet, phil-k3os-master-0  Successfully pulled image "rancher/log-aggregator:v0.1.6"
  Warning  Failed     57m                   kubelet, phil-k3os-master-0  Error: failed to generate container "dd5971950fe6b445cd5ef8335fb93b73844f5ae2c6efeae7329c31c99a5ea111" spec: failed to mkdir "/usr/libexec/kubernetes/kubelet-plugins/volume/exec": mkdir /usr/libexec/kubernetes: read-only file system
...

Additional context
I think a related issue has recently been resolved by https://github.com/rancher/k3os/pull/447. That issue is for the rancher-logging-log-aggregator-linux pods getting the error:

Error: failed to generate container "<containerID>" spec: failed to mkdir "/usr/libexec/kubernetes/kubelet-plugins/volume/exec": mkdir /usr/libexec/kubernetes: read-only file system

I'm guessing that a similar fix is required here: either create an empty /usr/lib64 directory or symlink /usr/lib to /usr/lib64.

zimme commented 4 years ago

From what I can read it seems as though /lib should be symlinked to either /lib64 or /lib32 and the same for /usr/lib to /usr/lib64 or /usr/lib32

philipsparrow commented 4 years ago

> From what I can read it seems as though /lib should be symlinked to either /lib64 or /lib32 and the same for /usr/lib to /usr/lib64 or /usr/lib32

That seems fair. If it needs to be done the same way as you did yours, then I'll need to look into why the path was /usr/src/image/.

kdjsfgodsfg commented 4 years ago

Hi everyone,

I actually have the same difficulties trying to deploy fluentd on a single-node k3s cluster.

I deployed Rancher and k3s in an air-gapped environment, imported my k3s cluster into Rancher, and pull images through a private repository. I had no problem deploying personal apps or things like WordPress, Fluentd-aggregator, or Longhorn.

But enabling logging from the Rancher UI does not work:

kubectl get pods -n cattle-logging

NAME                                         READY   STATUS                 RESTARTS   AGE
rancher-logging-log-aggregator-linux-5gkjc   1/1     Running                1          123m
rancher-logging-fluentd-linux-5f52b          1/2     CreateContainerError   0          19m

kubectl describe pod rancher-logging-fluentd-linux
Warning Failed 6m57s kubelet, k3s-node.mydomain.com Error: failed to generate container "8c4f16a28392d86f548f6a344abbac6b4edf2c70769cb1f34931be90575dde80" spec: failed to mkdir "/usr/lib64": mkdir /usr/lib64: read-only file system

Also, even when I ssh into this node, I can't manually create the directory /usr/lib64:

mkdir: cannot create directory ‘/usr/lib64’: Read-only file system

Trying to change permissions on the /usr/ directory does not work either. When I run:

sudo chmod -R 777 ./usr/
chmod: changing permissions of './usr/src/linux-headers-5.0.0-43-generic/tools/objtool/subcmd-config.o': Read-only file system

Same goes for every file in the /usr/ directory.

Going through different forums, I was able to find some information, but I don't really know if this part is relevant:

k3s-node[/]$ fdisk -l
Disk /dev/loop1: 56.55 MiB, 59281408 bytes, 115784 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/loop2: 223.3 MiB, 234135552 bytes, 457296 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/sda: 20 GiB, 21474836480 bytes, 41943040 sectors
Disk model: Virtual disk
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 860D0536-5838-48BB-ADEC-6B50975FC623

Device     Start      End  Sectors Size Type
/dev/sda1   2048    98303    96256  47M EFI System
/dev/sda2  98304 41943006 41844703  20G Linux filesystem

But when I run a filesystem check, /dev/sda1 returns an error:

k3s-node[/home/rancher]$ fsck.ext4 /dev/sda1
e2fsck 1.45.5 (07-Jan-2020)
ext2fs_open2: Bad magic number in super-block
fsck.ext4: Superblock invalid, trying backup blocks...
fsck.ext4: Bad magic number in super-block while trying to open /dev/sda1

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

/dev/sda1 contains a vfat file system labelled 'K3OS_GRUB'
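The "Bad magic number" message is what e2fsck reports when it cannot find the ext2/ext3/ext4 superblock magic, which is expected on a partition that the last line identifies as vfat. As a rough illustration of that check (not e2fsck's actual code), the sketch below reads the 16-bit magic value 0xEF53 stored 56 bytes into the primary superblock, which starts 1024 bytes into the device. The function name `has_ext_magic` and the in-memory test images are hypothetical.

```python
import io
import struct
from typing import BinaryIO

EXT_SUPERBLOCK_OFFSET = 1024  # primary superblock starts 1024 bytes into the device
EXT_MAGIC_OFFSET = 56         # offset of the s_magic field inside the superblock
EXT_MAGIC = 0xEF53            # magic shared by ext2, ext3 and ext4


def has_ext_magic(device: BinaryIO) -> bool:
    """Return True if the stream carries the ext2/3/4 superblock magic."""
    device.seek(EXT_SUPERBLOCK_OFFSET + EXT_MAGIC_OFFSET)
    raw = device.read(2)
    if len(raw) < 2:
        return False  # stream too short to hold a superblock
    (magic,) = struct.unpack("<H", raw)  # stored little-endian
    return magic == EXT_MAGIC


if __name__ == "__main__":
    # Fake in-memory "devices", for illustration only.
    ext_like = bytearray(2048)
    ext_like[1080:1082] = struct.pack("<H", EXT_MAGIC)
    print(has_ext_magic(io.BytesIO(bytes(ext_like))))  # image with the magic set
    print(has_ext_magic(io.BytesIO(bytes(2048))))      # all-zero image
```

Pointing the same function at a real block device would require opening it in binary mode with sufficient privileges.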

Nothing unusual for the other partitions:

k3s-node[/home/rancher]$ fsck.ext4 /dev/sda
e2fsck 1.45.5 (07-Jan-2020)
/dev/sda is in use.
e2fsck: Cannot continue, aborting.

Restoring superblocks does not correct the bug.

Thanks for your help. I will keep you posted on my progress.

Version k3OS : v0.10.2

Version K3s : v1.17.6+k3s1

Version Rancher : v2.4.4

dweomer commented 4 years ago

Folks, /usr is a read-only mount of a squashfs appended to the /k3os/system/k3os/current/k3os executable. It would be nice for such use-cases if it was a lowerdir of an overlayfs but that isn't currently the case. However, the rootfs is writable so setting up such might be an adventure to your liking?
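For anyone who wants to confirm this on their own node, here is a minimal sketch that looks up which mount a path lives on and whether it is mounted read-only, using the text format of /proc/mounts. The helper names `find_mount` and `is_read_only` are hypothetical, and the parser ignores the octal escapes (such as \040 for spaces) that /proc/mounts uses in mount points.

```python
import os
from typing import NamedTuple, Optional, Tuple


class Mount(NamedTuple):
    mount_point: str
    fs_type: str
    options: Tuple[str, ...]


def find_mount(path: str, mounts_text: str) -> Optional[Mount]:
    """Return the mount entry with the longest mount point containing path."""
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        mount_point, fs_type = fields[1], fields[2]
        options = tuple(fields[3].split(","))
        prefix = mount_point.rstrip("/") + "/"
        if (path.rstrip("/") + "/").startswith(prefix):
            # ">=" lets a later mount on the same point shadow an earlier one.
            if best is None or len(mount_point) >= len(best.mount_point):
                best = Mount(mount_point, fs_type, options)
    return best


def is_read_only(path: str, mounts_text: str) -> bool:
    """Return True if the mount that contains path carries the "ro" option."""
    mount = find_mount(path, mounts_text)
    return mount is not None and "ro" in mount.options


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as handle:
            text = handle.read()
        print(find_mount("/usr/lib64", text))
        print(is_read_only("/usr/lib64", text))
```

On a node laid out as described above, the lookup for /usr/lib64 would be expected to resolve to the /usr squashfs mount with the `ro` option, while paths on the writable rootfs would not.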

As for /usr/lib(32|64) I believe this is a glibc-ism whereas k3OS userspace is based on Alpine and therefore musl-c. My concern here is that the runtimes we are attempting to install rely on glibc which will absolutely break.

All of this said, I would like rancher/rancher syslog to work =) So, I will look into what can be done.