lxc / lxc

LXC - Linux Containers
https://linuxcontainers.org/lxc

lxc-start fails to start with cgroups/cgfsng error setting up limits for devices #4138

Open juliangilbey opened 2 years ago

juliangilbey commented 2 years ago


Required information

    --- Control groups ---
    Cgroups: enabled
    Cgroup namespace: enabled

    Cgroup v1 mount points:
    /sys/fs/cgroup/net_cls

    Cgroup v2 mount points:
    /sys/fs/cgroup

    Cgroup v1 systemd controller: missing
    Cgroup v1 freezer controller: missing
    Cgroup v1 clone_children flag: enabled
    Cgroup device: enabled
    Cgroup sched: enabled
    Cgroup cpu account: enabled
    Cgroup memory controller: enabled
    Cgroup cpuset: enabled

    --- Misc ---
    Veth pair device: enabled, loaded
    Macvlan: enabled, not loaded
    Vlan: enabled, not loaded
    Bridges: enabled, loaded
    Advanced netfilter: enabled, loaded
    CONFIG_NF_NAT_IPV4: missing
    CONFIG_NF_NAT_IPV6: missing
    CONFIG_IP_NF_TARGET_MASQUERADE: enabled, not loaded
    CONFIG_IP6_NF_TARGET_MASQUERADE: enabled, not loaded
    CONFIG_NETFILTER_XT_TARGET_CHECKSUM: enabled, not loaded
    CONFIG_NETFILTER_XT_MATCH_COMMENT: enabled, not loaded
    FUSE (for use with lxcfs): enabled, loaded

    --- Checkpoint/Restore ---
    checkpoint restore: enabled
    CONFIG_FHANDLE: enabled
    CONFIG_EVENTFD: enabled
    CONFIG_EPOLL: enabled
    CONFIG_UNIX_DIAG: enabled
    CONFIG_INET_DIAG: enabled
    CONFIG_PACKET_DIAG: enabled
    CONFIG_NETLINK_DIAG: enabled
    File capabilities:

    Note : Before booting a new kernel, you can check its configuration
    usage : CONFIG=/path/to/config /usr/bin/lxc-checkconfig

   * `uname -a`:  `Linux euler 5.17.0-1-amd64 #1 SMP PREEMPT Debian 5.17.3-1 (2022-04-18) x86_64 GNU/Linux`  (this is a stock Debian kernel, linux-image-5.17.0-1-amd64, version 5.17.3-1)
   * `cat /proc/self/cgroup`

    1:net_cls:/
    0::/user.slice/user-1000.slice/session-8.scope

   * `cat /proc/1/mounts`

    sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
    proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
    udev /dev devtmpfs rw,nosuid,relatime,size=32810568k,nr_inodes=8202642,mode=755,inode64 0 0
    devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
    tmpfs /run tmpfs rw,nosuid,nodev,noexec,relatime,size=6576420k,mode=755,inode64 0 0
    /dev/nvme0n1p2 / ext4 rw,noatime 0 0
    securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
    tmpfs /dev/shm tmpfs rw,nosuid,nodev,inode64 0 0
    tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k,inode64 0 0
    cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
    pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
    efivarfs /sys/firmware/efi/efivars efivarfs rw,nosuid,nodev,noexec,relatime 0 0
    bpf /sys/fs/bpf bpf rw,nosuid,nodev,noexec,relatime,mode=700 0 0
    systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=29,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=18248 0 0
    hugetlbfs /dev/hugepages hugetlbfs rw,relatime,pagesize=2M 0 0
    mqueue /dev/mqueue mqueue rw,nosuid,nodev,noexec,relatime 0 0
    debugfs /sys/kernel/debug debugfs rw,nosuid,nodev,noexec,relatime 0 0
    tracefs /sys/kernel/tracing tracefs rw,nosuid,nodev,noexec,relatime 0 0
    configfs /sys/kernel/config configfs rw,nosuid,nodev,noexec,relatime 0 0
    fusectl /sys/fs/fuse/connections fusectl rw,nosuid,nodev,noexec,relatime 0 0
    ramfs /run/credentials/systemd-sysusers.service ramfs ro,nosuid,nodev,noexec,relatime,mode=700 0 0
    nfsd /proc/fs/nfsd nfsd rw,relatime 0 0
    /dev/mapper/euler--vm-var /var ext4 rw,relatime 0 0
    /dev/mapper/euler--vm-tmp /tmp ext4 rw,relatime 0 0
    /dev/nvme0n1p1 /boot/efi vfat rw,noatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro 0 0
    /dev/mapper/data_crypt /storage/data ext4 rw,relatime,errors=remount-ro 0 0
    /dev/mapper/euler--vm-home_crypt /home ext4 rw,relatime 0 0
    binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,nosuid,nodev,noexec,relatime 0 0
    sunrpc /run/rpc_pipefs rpc_pipefs rw,relatime 0 0
    lxcfs /var/lib/lxcfs fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
    /dev/mapper/euler-backup_crypt /storage/backup ext4 rw,relatime,errors=remount-ro 0 0
    none /sys/fs/cgroup/net_cls cgroup rw,relatime,net_cls 0 0
    tmpfs /run/user/1000 tmpfs rw,nosuid,nodev,relatime,size=6576416k,nr_inodes=1644104,mode=700,uid=1000,gid=1000,inode64 0 0
    gvfsd-fuse /run/user/1000/gvfs fuse.gvfsd-fuse rw,nosuid,nodev,relatime,user_id=1000,group_id=1000 0 0
    portal /run/user/1000/doc fuse.portal rw,nosuid,nodev,relatime,user_id=1000,group_id=1000 0 0
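The mount list above shows a cgroup v1 `net_cls` hierarchy mounted next to the unified `cgroup2` hierarchy. For anyone comparing their own host, here is a minimal Python sketch that classifies the cgroup mounts in `/proc/<pid>/mounts`-style text. The function name `cgroup_layout` and the labels `hybrid`, `unified`, `legacy` and `none` are my own choices for illustration, not anything LXC prints, and the sketch says nothing about whether the mixed layout is actually the cause of this bug.

```python
def cgroup_layout(mounts_text):
    """Classify cgroup mounts found in /proc/<pid>/mounts-style text.

    Returns (kind, v1_mountpoints, v2_mountpoints). The kind labels are
    illustrative names chosen for this sketch.
    """
    v1, v2 = [], []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mountpoint, fstype = fields[1], fields[2]
        if fstype == "cgroup2":
            v2.append(mountpoint)
        elif fstype == "cgroup":
            v1.append(mountpoint)
    if v1 and v2:
        kind = "hybrid"
    elif v2:
        kind = "unified"
    elif v1:
        kind = "legacy"
    else:
        kind = "none"
    return kind, v1, v2


# Three entries copied from the mount list in this report.
SAMPLE = """\
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
none /sys/fs/cgroup/net_cls cgroup rw,relatime,net_cls 0 0
tmpfs /run tmpfs rw,nosuid,nodev,noexec,relatime,size=6576420k,mode=755,inode64 0 0
"""

if __name__ == "__main__":
    print(cgroup_layout(SAMPLE))
```

On a live system the same function could be fed the contents of `/proc/1/mounts`, read with `open("/proc/1/mounts").read()`.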


# Issue description

(A longer description of this appears at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1010469)

I created an LXC container using: `lxc-create -n debian-sid-amd64 -t download -- -d debian -r sid -a amd64` and then attempted to start it.  This failed, and the [log file](https://github.com/lxc/lxc/files/8864713/lxc.log) contained these error messages:

    lxc-start debian-sid-amd64 20220608194712.148 ERROR cgfsng - cgroups/cgfsng.c:cg_legacy_set_data:2675 - No such file or directory - Failed to setup limits for the "devices" controller. The controller seems to be unused by "cgfsng" cgroup driver or not enabled on the cgroup hierarchy
    lxc-start debian-sid-amd64 20220608194712.148 ERROR cgfsng - cgroups/cgfsng.c:cgfsng_setup_limits_legacy:2742 - No such file or directory - Failed to set "devices.deny" to "a"
    lxc-start debian-sid-amd64 20220608194712.148 ERROR start - start.c:lxc_spawn:1890 - Failed to setup legacy device cgroup controller limits
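To pull the failing function names out of a longer TRACE log, a small parser can help. The sketch below splits a log line into program, container, timestamp, level, category, source location and message. The field layout is inferred only from the three lines quoted above, so it is an assumption rather than a documented format, and `parse_lxc_log_line` is a hypothetical helper name.

```python
import re

# Layout inferred from the quoted log lines (an assumption, not a spec):
# <prog> <container> <YYYYMMDDhhmmss.mmm> <LEVEL> <category> - <file>:<func>:<line> - <message>
LOG_LINE = re.compile(
    r"^(?P<prog>\S+) (?P<container>\S+) (?P<timestamp>\d{14}\.\d{3}) "
    r"(?P<level>[A-Z]+)\s+(?P<category>\S+) - "
    r"(?P<file>[^:\s]+):(?P<func>[^:\s]+):(?P<line>\d+) - (?P<message>.*)$"
)


def parse_lxc_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["line"] = int(record["line"])  # source line number as an integer
    return record


# Two of the lines from this report, copied verbatim.
SAMPLE_LINES = [
    'lxc-start debian-sid-amd64 20220608194712.148 ERROR cgfsng - '
    'cgroups/cgfsng.c:cgfsng_setup_limits_legacy:2742 - '
    'No such file or directory - Failed to set "devices.deny" to "a"',
    'lxc-start debian-sid-amd64 20220608194712.148 ERROR start - '
    'start.c:lxc_spawn:1890 - '
    'Failed to setup legacy device cgroup controller limits',
]

if __name__ == "__main__":
    for raw in SAMPLE_LINES:
        rec = parse_lxc_log_line(raw)
        print(rec["level"], rec["file"], rec["func"], rec["line"])
```

Filtering the parsed records on `level == "ERROR"` would then list just the failing call sites.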

This is really bizarre.  It was a clean install of the Debian `lxc` package.  I managed to make the container start by adding these two lines to the config:

    lxc.cgroup.devices.allow =
    lxc.cgroup.devices.deny =
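For anyone applying the same workaround to several containers, here is a minimal Python sketch that appends the two lines to a config's text only if they are not already present. The helper name `add_workaround` is hypothetical. My understanding, which I have not verified against the LXC source for this version, is that an empty value clears the `lxc.cgroup.devices.*` entries pulled in earlier from `common.conf`, which is why the sketch appends at the end of the file, after the `lxc.include` line.

```python
# The two workaround lines from this report.
WORKAROUND_LINES = (
    "lxc.cgroup.devices.allow =",
    "lxc.cgroup.devices.deny =",
)


def _normalise(line):
    """Return (key, value) with surrounding whitespace removed, or None."""
    key, sep, value = line.partition("=")
    return (key.strip(), value.strip()) if sep else None


def add_workaround(config_text):
    """Return config_text with any missing workaround lines appended."""
    present = {_normalise(line) for line in config_text.splitlines()}
    missing = [l for l in WORKAROUND_LINES if _normalise(l) not in present]
    if not missing:
        return config_text  # nothing to do, keep the text unchanged
    # Make sure the existing text ends with a newline before appending.
    if config_text and not config_text.endswith("\n"):
        config_text += "\n"
    return config_text + "\n".join(missing) + "\n"


if __name__ == "__main__":
    sample = (
        "lxc.include = /usr/share/lxc/config/common.conf\n"
        "lxc.arch = linux64\n"
    )
    print(add_workaround(sample), end="")
```

Running the function twice gives the same text as running it once, so it is safe to apply repeatedly. If my reading of the empty-value behaviour is right, the container then runs without the device restrictions that `common.conf` would normally set.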

Eventually I reinstalled my entire system and the containers started without needing these extra settings (though I then noticed #4128, which may or may not be related).  However, the problem has just started again; these two lines in the config file allow me to start the container.

I haven't the foggiest idea what package interactions or hardware issues might cause an error like this.

# Steps to reproduce

I don't know how to reproduce it.  I've tried on the same machine using a Debian Live USB and it works fine, and I've tried on a different machine and it works fine.  So there's something really weird going on here.  I'm happy to try any experiments that you would like me to (within reason!) to help locate the cause of this issue.

# Information to attach

 - [x] any relevant kernel output (`dmesg`)

    [23703.812695] audit: type=1400 audit(1654718985.729:67): apparmor="STATUS" operation="profileload" profile="/usr/bin/lxc-start" name="lxc-debian-sid-amd64</var/lib/lxc>" pid=46718 comm="apparmor_parser"
    [23703.850079] lxcbr0: port 1(vethCikr75) entered blocking state
    [23703.850086] lxcbr0: port 1(vethCikr75) entered disabled state
    [23703.850233] device vethCikr75 entered promiscuous mode
    [23703.850658] lxcbr0: port 1(vethCikr75) entered blocking state
    [23703.850665] lxcbr0: port 1(vethCikr75) entered forwarding state
    [23703.851624] eth0: renamed from vethvjA4al
    [23703.875776] lxcbr0: port 1(vethCikr75) entered disabled state
    [23703.878180] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
    [23703.878262] IPv6: ADDRCONF(NETDEV_CHANGE): vethCikr75: link becomes ready
    [23703.878445] lxcbr0: port 1(vethCikr75) entered blocking state
    [23703.878450] lxcbr0: port 1(vethCikr75) entered forwarding state
    [23704.150382] audit: type=1400 audit(1654718986.065:68): apparmor="STATUS" operation="profileremove" profile="/usr/bin/lxc-start" name="lxc-debian-sid-amd64</var/lib/lxc>" pid=46831 comm="apparmor_parser"
    [23704.208203] lxcbr0: port 1(vethCikr75) entered disabled state
    [23704.209143] device vethCikr75 left promiscuous mode
    [23704.209150] lxcbr0: port 1(vethCikr75) entered disabled state

 - [x] container log (The <log> file from running `lxc-start -n <c> -l TRACE -o <logfile> `)  [lxc.log](https://github.com/lxc/lxc/files/8864713/lxc.log)
 - [x] the container's configuration file

    # Template used to create this container: /usr/share/lxc/templates/lxc-download
    # Parameters passed to the template: -d debian -r sid -a amd64
    # For additional config options, please look at lxc.container.conf(5)

    # Uncomment the following line to support nesting containers:
    #lxc.include = /usr/share/lxc/config/nesting.conf
    # (Be aware this has security implications)

    # Distribution configuration
    lxc.include = /usr/share/lxc/config/common.conf
    lxc.arch = linux64

    # Container specific configuration
    lxc.apparmor.profile = generated
    lxc.apparmor.allow_nesting = 1
    lxc.rootfs.path = dir:/var/lib/lxc/debian-sid-amd64/rootfs
    lxc.uts.name = debian-sid-amd64

    # Network configuration
    lxc.net.0.type = veth
    lxc.net.0.link = lxcbr0
    lxc.net.0.flags = up


The config files in `/usr/share/lxc/config` are the standard Debian ones.

Moghul commented 1 year ago

I'm having the same problem, and the two lines you provided also let me start the container and attach to it. However, the container now doesn't get an IPv4 address, so I can't run my normal workflow. Have you made any progress on this?

It's particularly frustrating because from my perspective all I did before was stop the LXC and take a snapshot. Nothing that should break anything.

Edit: It turned out to be a problem with my host machine. I had removed a VPN some time earlier, and for some reason that kept the container from starting again. Restarting did not work; I had to fully shut down my PC for it to be fixed.