lxc / lxcfs

FUSE filesystem for LXC
https://linuxcontainers.org/lxcfs

System in container can take CPUs on host offline #629

Closed niansa closed 6 months ago

niansa commented 6 months ago

Required information

Issue description

Apparently the system in the container is able to take CPUs on the host offline, and bring them back online, by writing to /sys/devices/system/cpu/cpu*/online.

Steps to reproduce

  1. Create an Ubuntu 24.04 (for example) instance
  2. Open a shell in the instance
  3. cd to /sys/devices/system/cpu/cpu0/
  4. As root, write 0 into the file online
  5. Log out of the shell
  6. Check htop or any other tool and you'll see the core is offline on the host system
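Step 6 can also be checked without htop. The following is a minimal read-only sketch, not part of the original report: it prints the state of every CPU that exposes an online file. The function name and the optional directory argument are hypothetical, added only so the function can be pointed at a test directory. In the transcripts later in this thread the listing starts at cpu1, so cpu0 may have no online file on such hosts.

```shell
# Print "<cpu> <state>" for every CPU that exposes an "online" file.
# The directory argument is a hypothetical convenience for testing;
# it defaults to the sysfs path named in the report.
list_cpu_states() {
  local cpu_dir="${1:-/sys/devices/system/cpu}"
  local f
  for f in "$cpu_dir"/cpu[0-9]*/online; do
    [ -r "$f" ] || continue   # glob did not match, or file is unreadable
    printf '%s %s\n' "$(basename "$(dirname "$f")")" "$(cat "$f")"
  done
}
```

Running list_cpu_states on the host before and after step 4 should show the affected CPU change from 1 to 0 if the write reached the host.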

Information to attach

[6669597.241904] smpboot: CPU 1 is now offline
Name: ubuntu
Status: RUNNING
Type: container
Architecture: x86_64
PID: 3742945
Created: 2024/03/14 09:18 CET
Last Used: 2024/03/14 09:27 CET

Resources:
  Processes: 14
  CPU usage:
    CPU usage (in seconds): 11
  Memory usage:
    Memory (current): 307.64MiB
  Network usage:
    eth0:
      Type: broadcast
      State: UP
      Host interface: veth53b86c1b
      MAC address: 00:16:3e:13:72:ee
      MTU: 1500
      Bytes received: 58.28MB
      Bytes sent: 5.81MB
      Packets received: 40871
      Packets sent: 22334
      IP addresses:
        inet:  10.41.89.171/24 (global)
        inet6: fd42:97dc:6ffa:abeb:216:3eff:fe13:72ee/64 (global)
        inet6: fe80::216:3eff:fe13:72ee/64 (link)
    lo:
      Type: loopback
      State: UP
      MTU: 65536
      Bytes received: 5.79kB
      Bytes sent: 5.79kB
      Packets received: 44
      Packets sent: 44
      IP addresses:
        inet:  127.0.0.1/8 (local)
        inet6: ::1/128 (local)

Log:
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Ubuntu noble amd64 (20240313_07:42)
  image.os: Ubuntu
  image.release: noble
  image.requirements.cgroup: v2
  image.serial: "20240313_07:42"
  image.type: squashfs
  image.variant: default
  limits.cpu: 10-17
  limits.cpu.priority: "3"
  limits.memory: 8GiB
  limits.processes: "5120"
  volatile.base_image: 2699b561cdee3145a56b8a990965456f7524febb37473994618c9d2fb56aaf6c
  volatile.cloud-init.instance-id: ddfb179d-3369-4e20-bacf-f0bdf3b93ff7
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.incubr0.host_name: veth53b86c1b
  volatile.incubr0.hwaddr: 00:16:3e:13:72:ee
  volatile.incubr0.name: eth0
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: RUNNING
  volatile.last_state.ready: "false"
  volatile.uuid: 9c72b2c7-9129-4c8d-acdf-02c652d4e312
  volatile.uuid.generation: 9c72b2c7-9129-4c8d-acdf-02c652d4e312
devices:
  incubr0:
    limits.egress: 500Mbit
    limits.ingress: 500Mbit
    nictype: bridged
    parent: incubr0
    type: nic
  root:
    path: /
    pool: hdd
    type: disk
ephemeral: false
profiles:
- restricted
stateful: false
description: ""
time="2024-03-14T08:22:12+01:00" level=warning msg="AppArmor support has been disabled because of lack of kernel support"
time="2024-03-14T08:22:12+01:00" level=warning msg=" - AppArmor support has been disabled, Disabled because of lack of kernel support"
time="2024-03-14T08:56:18+01:00" level=warning msg="IPv4 bridge netfilter not enabled. Instances using the bridge will not be able to connect to the forward listen IPs" driver=bridge err="br_netfilter kernel module not loaded" network=incubr0 project=default
time="2024-03-14T08:56:31+01:00" level=warning msg="IPv4 bridge netfilter not enabled. Instances using the bridge will not be able to connect to the forward listen IPs" driver=bridge err="br_netfilter kernel module not loaded" network=incubr0 project=default
time="2024-03-14T09:00:49+01:00" level=warning msg="IPv4 bridge netfilter not enabled. Instances using the bridge will not be able to connect to the forward listen IPs" driver=bridge err="br_netfilter kernel module not loaded" network=incubr0 project=default
time="2024-03-14T09:09:06+01:00" level=warning msg="The backing filesystem doesn't support quotas, skipping set quota" driver=dir path=/var/lib/incus/storage-pools/hdd/containers/ubuntu pool=hdd size=12000000000 volID=1
stgraber commented 6 months ago
stgraber@castiana:~$ incus launch images:ubuntu/22.04 u1
Launching u1
stgraber@castiana:~$ incus config show --expanded u1
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Ubuntu jammy amd64 (20240313_07:42)
  image.os: Ubuntu
  image.release: jammy
  image.serial: "20240313_07:42"
  image.type: squashfs
  image.variant: default
  volatile.base_image: c507eb16939310bcb78fbaf9af3bf8668535ec112e2c3012964f7f04dc31773f
  volatile.cloud-init.instance-id: 8db80f25-edfb-45bd-a1e6-d32192c3bddf
  volatile.eth0.host_name: vethe7254546
  volatile.eth0.hwaddr: 00:16:3e:ce:9e:83
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: RUNNING
  volatile.uuid: 75ee8fe3-0b90-4a94-a8ec-453357bd5496
  volatile.uuid.generation: 75ee8fe3-0b90-4a94-a8ec-453357bd5496
devices:
  eth0:
    name: eth0
    network: incusbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""
stgraber@castiana:~$ incus exec u1 bash
root@u1:~# grep /sys/devices/system/cpu/cpu*/online
root@u1:~# grep "" /sys/devices/system/cpu/cpu*/online
/sys/devices/system/cpu/cpu1/online:1
/sys/devices/system/cpu/cpu10/online:1
/sys/devices/system/cpu/cpu11/online:1
/sys/devices/system/cpu/cpu2/online:1
/sys/devices/system/cpu/cpu3/online:1
/sys/devices/system/cpu/cpu4/online:1
/sys/devices/system/cpu/cpu5/online:1
/sys/devices/system/cpu/cpu6/online:1
/sys/devices/system/cpu/cpu7/online:1
/sys/devices/system/cpu/cpu8/online:1
/sys/devices/system/cpu/cpu9/online:1
root@u1:~# for i in /sys/devices/system/cpu/cpu*/online; do echo 0 > $i; done
bash: /sys/devices/system/cpu/cpu1/online: Permission denied
bash: /sys/devices/system/cpu/cpu10/online: Permission denied
bash: /sys/devices/system/cpu/cpu11/online: Permission denied
bash: /sys/devices/system/cpu/cpu2/online: Permission denied
bash: /sys/devices/system/cpu/cpu3/online: Permission denied
bash: /sys/devices/system/cpu/cpu4/online: Permission denied
bash: /sys/devices/system/cpu/cpu5/online: Permission denied
bash: /sys/devices/system/cpu/cpu6/online: Permission denied
bash: /sys/devices/system/cpu/cpu7/online: Permission denied
bash: /sys/devices/system/cpu/cpu8/online: Permission denied
bash: /sys/devices/system/cpu/cpu9/online: Permission denied
root@u1:~# 
stgraber commented 6 months ago

Can you show:

  - cat /proc/self/uid_map
  - cat /proc/self/gid_map
  - cat /proc/self/mountinfo

All from within your container.

niansa commented 6 months ago
nils@illia:~$ sudo incus launch images:ubuntu/22.04 u1 --profile restricted
Launching u1
nils@illia:~$ sudo incus config show --expanded u1
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Ubuntu jammy amd64 (20240313_07:42)
  image.os: Ubuntu
  image.release: jammy
  image.serial: "20240313_07:42"
  image.type: squashfs
  image.variant: default
  limits.cpu: 10-17
  limits.cpu.priority: "3"
  limits.memory: 8GiB
  limits.processes: "5120"
  volatile.base_image: c507eb16939310bcb78fbaf9af3bf8668535ec112e2c3012964f7f04dc31773f
  volatile.cloud-init.instance-id: 97409100-afcb-498a-9ab7-3239af526911
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.incubr0.host_name: vethd6e94cbc
  volatile.incubr0.hwaddr: 00:16:3e:bf:14:88
  volatile.incubr0.name: eth0
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: RUNNING
  volatile.uuid: cc8df9ed-e715-493e-9ef1-392734838bb6
  volatile.uuid.generation: cc8df9ed-e715-493e-9ef1-392734838bb6
devices:
  incubr0:
    limits.egress: 500Mbit
    limits.ingress: 500Mbit
    nictype: bridged
    parent: incubr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- restricted
stateful: false
description: ""
nils@illia:~$ su oincus exec u1 bash
su: user oincus does not exist or the user entry does not contain all the required fields
nils@illia:~$ sudo incus exec u1 bash
root@u1:~# grep "" /sys/devices/system/cpu/cpu*/online
/sys/devices/system/cpu/cpu1/online:1
/sys/devices/system/cpu/cpu10/online:1
/sys/devices/system/cpu/cpu11/online:1
/sys/devices/system/cpu/cpu12/online:1
/sys/devices/system/cpu/cpu13/online:1
/sys/devices/system/cpu/cpu14/online:1
/sys/devices/system/cpu/cpu15/online:1
/sys/devices/system/cpu/cpu16/online:1
/sys/devices/system/cpu/cpu17/online:1
/sys/devices/system/cpu/cpu18/online:1
/sys/devices/system/cpu/cpu19/online:1
/sys/devices/system/cpu/cpu2/online:1
/sys/devices/system/cpu/cpu3/online:1
/sys/devices/system/cpu/cpu4/online:1
/sys/devices/system/cpu/cpu5/online:1
/sys/devices/system/cpu/cpu6/online:1
/sys/devices/system/cpu/cpu7/online:1
/sys/devices/system/cpu/cpu8/online:1
/sys/devices/system/cpu/cpu9/online:1
root@u1:~# for i in /sys/devices/system/cpu/cpu*/online; do echo 0 > $i; done
root@u1:~# grep "" /sys/devices/system/cpu/cpu*/online
/sys/devices/system/cpu/cpu1/online:0
/sys/devices/system/cpu/cpu10/online:0
/sys/devices/system/cpu/cpu11/online:0
/sys/devices/system/cpu/cpu12/online:0
/sys/devices/system/cpu/cpu13/online:0
/sys/devices/system/cpu/cpu14/online:0
/sys/devices/system/cpu/cpu15/online:0
/sys/devices/system/cpu/cpu16/online:0
/sys/devices/system/cpu/cpu17/online:0
/sys/devices/system/cpu/cpu18/online:0
/sys/devices/system/cpu/cpu19/online:0
/sys/devices/system/cpu/cpu2/online:0
/sys/devices/system/cpu/cpu3/online:0
/sys/devices/system/cpu/cpu4/online:0
/sys/devices/system/cpu/cpu5/online:0
/sys/devices/system/cpu/cpu6/online:0
/sys/devices/system/cpu/cpu7/online:0
/sys/devices/system/cpu/cpu8/online:0
/sys/devices/system/cpu/cpu9/online:0
root@u1:~# cat /proc/self/uid_map
         0    1000000 1000000000
root@u1:~# cat /proc/self/gid_map
         0    1000000 1000000000
root@u1:~# cat /proc/self/mountinfo
1193 923 9:127 /var/lib/incus/storage-pools/default/containers/u1/rootfs / rw,relatime,idmapped shared:402 master:1 - ext4 /dev/md127 rw,seclabel
1194 1193 0:75 / /dev rw,relatime shared:441 - tmpfs none rw,seclabel,size=492k,mode=755,uid=1000000,gid=1000000,inode64
1195 1193 0:76 / /proc rw,nosuid,nodev,noexec,relatime shared:618 - proc proc rw
1196 1193 0:77 / /sys rw,relatime shared:628 - sysfs sysfs rw,seclabel
1197 1194 0:5 /fuse /dev/fuse rw,nosuid,relatime shared:484 master:2 - devtmpfs udev rw,seclabel,size=32795920k,nr_inodes=8198980,mode=755,inode64
1198 1194 0:5 /net/tun /dev/net/tun rw,nosuid,relatime shared:513 master:2 - devtmpfs udev rw,seclabel,size=32795920k,nr_inodes=8198980,mode=755,inode64
1199 1196 0:28 / /sys/firmware/efi/efivars rw,nosuid,nodev,noexec,relatime shared:629 master:12 - efivarfs efivarfs rw
1200 1196 0:32 / /sys/fs/fuse/connections rw,nosuid,nodev,noexec,relatime shared:630 master:20 - fusectl fusectl rw
1201 1196 0:27 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:631 master:11 - pstore pstore rw,seclabel
1202 1196 0:33 / /sys/kernel/config rw,nosuid,nodev,noexec,relatime shared:632 master:21 - configfs configfs rw
1203 1196 0:7 / /sys/kernel/debug rw,nosuid,nodev,noexec,relatime shared:633 master:18 - debugfs debugfs rw,seclabel
1204 1196 0:6 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:634 master:8 - securityfs securityfs rw
1205 1196 0:12 / /sys/kernel/tracing rw,nosuid,nodev,noexec,relatime shared:635 master:19 - tracefs tracefs rw,seclabel
1206 1195 0:45 / /proc/sys/fs/binfmt_misc rw,nosuid,nodev,noexec,relatime shared:619 master:66 - binfmt_misc binfmt_misc rw
1207 1194 0:18 / /dev/mqueue rw,nosuid,nodev,noexec,relatime shared:572 master:17 - mqueue mqueue rw,seclabel
1208 1194 0:62 / /dev/incus rw,relatime shared:607 master:362 - tmpfs tmpfs rw,seclabel,size=100k,mode=755,inode64
1209 1194 0:61 /u1 /dev/.incus-mounts rw,relatime master:354 - tmpfs tmpfs rw,seclabel,size=100k,mode=711,inode64
1210 1196 0:26 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:636 - cgroup2 none rw,seclabel
1211 1195 0:60 /proc/cpuinfo /proc/cpuinfo rw,nosuid,nodev,relatime shared:620 master:346 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
1212 1195 0:60 /proc/diskstats /proc/diskstats rw,nosuid,nodev,relatime shared:621 master:346 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
1213 1195 0:60 /proc/loadavg /proc/loadavg rw,nosuid,nodev,relatime shared:622 master:346 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
1214 1195 0:60 /proc/meminfo /proc/meminfo rw,nosuid,nodev,relatime shared:623 master:346 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
1215 1195 0:60 /proc/slabinfo /proc/slabinfo rw,nosuid,nodev,relatime shared:624 master:346 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
1216 1195 0:60 /proc/stat /proc/stat rw,nosuid,nodev,relatime shared:625 master:346 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
1217 1195 0:60 /proc/swaps /proc/swaps rw,nosuid,nodev,relatime shared:626 master:346 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
1218 1195 0:60 /proc/uptime /proc/uptime rw,nosuid,nodev,relatime shared:627 master:346 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
1219 1196 0:60 /sys/devices/system/cpu /sys/devices/system/cpu rw,nosuid,nodev,relatime shared:637 master:346 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
1220 1194 0:5 /full /dev/full rw,nosuid,relatime shared:609 master:2 - devtmpfs udev rw,seclabel,size=32795920k,nr_inodes=8198980,mode=755,inode64
1221 1194 0:5 /null /dev/null rw,nosuid,relatime shared:610 master:2 - devtmpfs udev rw,seclabel,size=32795920k,nr_inodes=8198980,mode=755,inode64
1222 1194 0:5 /random /dev/random rw,nosuid,relatime shared:611 master:2 - devtmpfs udev rw,seclabel,size=32795920k,nr_inodes=8198980,mode=755,inode64
1223 1194 0:5 /tty /dev/tty rw,nosuid,relatime shared:612 master:2 - devtmpfs udev rw,seclabel,size=32795920k,nr_inodes=8198980,mode=755,inode64
1224 1194 0:5 /urandom /dev/urandom rw,nosuid,relatime shared:613 master:2 - devtmpfs udev rw,seclabel,size=32795920k,nr_inodes=8198980,mode=755,inode64
1225 1194 0:5 /zero /dev/zero rw,nosuid,relatime shared:614 master:2 - devtmpfs udev rw,seclabel,size=32795920k,nr_inodes=8198980,mode=755,inode64
1226 1194 0:78 / /dev/pts rw,nosuid,noexec,relatime shared:615 - devpts devpts rw,seclabel,gid=1000005,mode=620,ptmxmode=666,max=1024
1227 1194 0:78 /ptmx /dev/ptmx rw,nosuid,noexec,relatime shared:616 - devpts devpts rw,seclabel,gid=1000005,mode=620,ptmxmode=666,max=1024
1228 1194 0:78 /0 /dev/console rw,nosuid,noexec,relatime shared:617 - devpts devpts rw,seclabel,gid=1000005,mode=620,ptmxmode=666,max=1024
1229 1195 0:75 /.lxc-boot-id /proc/sys/kernel/random/boot_id ro,nosuid,nodev,noexec,relatime shared:441 - tmpfs none rw,seclabel,size=492k,mode=755,uid=1000000,gid=1000000,inode64
924 1194 0:79 / /dev/shm rw,nosuid,nodev shared:608 - tmpfs tmpfs rw,seclabel,uid=1000000,gid=1000000,inode64
925 1193 0:80 / /run rw,nosuid,nodev shared:638 - tmpfs tmpfs rw,seclabel,size=13128852k,nr_inodes=819200,mode=755,uid=1000000,gid=1000000,inode64
926 925 0:81 / /run/lock rw,nosuid,nodev,noexec,relatime shared:639 - tmpfs tmpfs rw,seclabel,size=5120k,uid=1000000,gid=1000000,inode64
root@u1:~# exit
nils@illia:~$ sudo incus profile show restricted
config:
  limits.cpu: 10-17
  limits.cpu.priority: "3"
  limits.memory: 8GiB
  limits.processes: "5120"
description: Default Incus profile
devices:
  incubr0:
    limits.egress: 500Mbit
    limits.ingress: 500Mbit
    nictype: bridged
    parent: incubr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: restricted
used_by:
- /1.0/instances/ubuntu
- /1.0/instances/u1
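
For readers following along, the uid_map and gid_map lines above use the format "<ID inside namespace> <ID on host> <length>", so root in this container is UID 1000000 on the host. A rough sketch of that translation follows; the function name is hypothetical and the map file is passed in explicitly.

```shell
# Translate an ID inside a user namespace to the host ID, using a file in
# the /proc/<pid>/uid_map format: "<ns_start> <host_start> <length>".
# Prints nothing and returns non-zero if the ID is not mapped.
ns_to_host_id() {
  awk -v id="$1" '
    id >= $1 && id < $1 + $3 { print $2 + (id - $1); found = 1; exit }
    END { exit !found }
  ' "$2"
}

# Example: ns_to_host_id 0 /proc/self/uid_map
```

With the map shown above, container UID 0 translates to host UID 1000000. That is an unprivileged host user, so a direct write to the host's sysfs would normally be refused.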
niansa commented 6 months ago

I feel like it might be related to the limits.cpu: 10-17 or the 1219 1196 0:60 /sys/devices/system/cpu /sys/devices/system/cpu rw,nosuid,nodev,relatime shared:637 master:346 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other mount entry

Edit: Unsetting limits.cpu does not seem to fix this issue
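
To see which paths inside the container are served by lxcfs rather than by the kernel's own sysfs or procfs, the mountinfo output can be filtered by filesystem type. This is a small sketch with a hypothetical function name; it assumes the standard mountinfo layout, where field 5 is the mount point and the filesystem type follows the lone "-" separator.

```shell
# List mount points backed by lxcfs from a mountinfo-format file.
# Field 5 is the mount point; the filesystem type is the first field
# after the lone "-" separator, which sits at field 7 or later.
lxcfs_mount_points() {
  awk '{
    for (i = 7; i <= NF; i++)
      if ($i == "-") {
        if ($(i + 1) == "fuse.lxcfs") print $5
        break
      }
  }' "${1:-/proc/self/mountinfo}"
}
```

On the mountinfo posted above this would list the /proc files and /sys/devices/system/cpu, which means writes to the CPU online files go to the lxcfs daemon first, not directly to the kernel.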

stgraber commented 6 months ago

I'm going to check on the Debian kernel now.

stgraber commented 6 months ago
root@d12:~# incus admin init --auto
root@d12:~# incus launch images:ubuntu/22.04 u1
Launching u1
root@d12:~# incus exec u1 bash                
root@u1:~# cat /proc/self/mountinfo 
361 268 8:2 /var/lib/incus/storage-pools/default/containers/u1/rootfs / rw,relatime,idmapped shared:143 master:1 - ext4 /dev/sda2 rw
362 361 0:49 / /dev rw,relatime shared:144 - tmpfs none rw,size=492k,mode=755,uid=1000000,gid=1000000,inode64
363 361 0:50 / /proc rw,nosuid,nodev,noexec,relatime shared:197 - proc proc rw
364 361 0:51 / /sys rw,relatime shared:207 - sysfs sysfs rw
365 362 0:5 /fuse /dev/fuse rw,nosuid,relatime shared:145 master:2 - devtmpfs udev rw,size=1982000k,nr_inodes=495500,mode=755,inode64
366 362 0:5 /net/tun /dev/net/tun rw,nosuid,relatime shared:146 master:2 - devtmpfs udev rw,size=1982000k,nr_inodes=495500,mode=755,inode64
367 364 0:28 / /sys/firmware/efi/efivars rw,nosuid,nodev,noexec,relatime shared:208 master:11 - efivarfs efivarfs rw
368 364 0:33 / /sys/fs/fuse/connections rw,nosuid,nodev,noexec,relatime shared:209 master:20 - fusectl fusectl rw
369 364 0:27 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:210 master:10 - pstore pstore rw
370 364 0:32 / /sys/kernel/config rw,nosuid,nodev,noexec,relatime shared:211 master:19 - configfs configfs rw
371 364 0:7 / /sys/kernel/debug rw,nosuid,nodev,noexec,relatime shared:212 master:15 - debugfs debugfs rw
372 364 0:6 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:213 master:8 - securityfs securityfs rw
373 364 0:12 / /sys/kernel/tracing rw,nosuid,nodev,noexec,relatime shared:214 master:18 - tracefs tracefs rw
381 363 0:52 / /proc/sys/fs/binfmt_misc rw,nosuid,nodev,noexec,relatime shared:198 master:141 - binfmt_misc binfmt_misc rw
382 362 0:19 / /dev/mqueue rw,nosuid,nodev,noexec,relatime shared:147 master:17 - mqueue mqueue rw
383 362 0:47 / /dev/incus rw,relatime shared:148 master:136 - tmpfs tmpfs rw,size=100k,mode=755,inode64
384 362 0:46 /u1 /dev/.incus-mounts rw,relatime master:130 - tmpfs tmpfs rw,size=100k,mode=711,inode64
385 364 0:26 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:215 - cgroup2 none rw
386 363 0:45 /proc/cpuinfo /proc/cpuinfo rw,nosuid,nodev,relatime shared:199 master:125 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
387 363 0:45 /proc/diskstats /proc/diskstats rw,nosuid,nodev,relatime shared:200 master:125 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
388 363 0:45 /proc/loadavg /proc/loadavg rw,nosuid,nodev,relatime shared:201 master:125 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
389 363 0:45 /proc/meminfo /proc/meminfo rw,nosuid,nodev,relatime shared:202 master:125 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
390 363 0:45 /proc/slabinfo /proc/slabinfo rw,nosuid,nodev,relatime shared:203 master:125 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
391 363 0:45 /proc/stat /proc/stat rw,nosuid,nodev,relatime shared:204 master:125 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
392 363 0:45 /proc/swaps /proc/swaps rw,nosuid,nodev,relatime shared:205 master:125 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
393 363 0:45 /proc/uptime /proc/uptime rw,nosuid,nodev,relatime shared:206 master:125 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
394 364 0:45 /sys/devices/system/cpu /sys/devices/system/cpu rw,nosuid,nodev,relatime shared:216 master:125 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
395 362 0:5 /full /dev/full rw,nosuid,relatime shared:150 master:2 - devtmpfs udev rw,size=1982000k,nr_inodes=495500,mode=755,inode64
396 362 0:5 /null /dev/null rw,nosuid,relatime shared:151 master:2 - devtmpfs udev rw,size=1982000k,nr_inodes=495500,mode=755,inode64
397 362 0:5 /random /dev/random rw,nosuid,relatime shared:152 master:2 - devtmpfs udev rw,size=1982000k,nr_inodes=495500,mode=755,inode64
398 362 0:5 /tty /dev/tty rw,nosuid,relatime shared:163 master:2 - devtmpfs udev rw,size=1982000k,nr_inodes=495500,mode=755,inode64
399 362 0:5 /urandom /dev/urandom rw,nosuid,relatime shared:192 master:2 - devtmpfs udev rw,size=1982000k,nr_inodes=495500,mode=755,inode64
400 362 0:5 /zero /dev/zero rw,nosuid,relatime shared:193 master:2 - devtmpfs udev rw,size=1982000k,nr_inodes=495500,mode=755,inode64
401 362 0:53 / /dev/pts rw,nosuid,noexec,relatime shared:194 - devpts devpts rw,gid=1000005,mode=620,ptmxmode=666,max=1024
402 362 0:53 /ptmx /dev/ptmx rw,nosuid,noexec,relatime shared:195 - devpts devpts rw,gid=1000005,mode=620,ptmxmode=666,max=1024
403 362 0:53 /0 /dev/console rw,nosuid,noexec,relatime shared:196 - devpts devpts rw,gid=1000005,mode=620,ptmxmode=666,max=1024
375 363 0:49 /.lxc-boot-id /proc/sys/kernel/random/boot_id ro,nosuid,nodev,noexec,relatime shared:144 - tmpfs none rw,size=492k,mode=755,uid=1000000,gid=1000000,inode64
269 362 0:54 / /dev/shm rw,nosuid,nodev shared:149 - tmpfs tmpfs rw,uid=1000000,gid=1000000,inode64
270 361 0:55 / /run rw,nosuid,nodev shared:217 - tmpfs tmpfs rw,size=800328k,nr_inodes=819200,mode=755,uid=1000000,gid=1000000,inode64
271 270 0:56 / /run/lock rw,nosuid,nodev,noexec,relatime shared:218 - tmpfs tmpfs rw,size=5120k,uid=1000000,gid=1000000,inode64
root@u1:~# grep "" /sys/devices/system/cpu/cpu*/online
/sys/devices/system/cpu/cpu1/online:1
/sys/devices/system/cpu/cpu2/online:1
/sys/devices/system/cpu/cpu3/online:1
root@u1:~# echo 0 > /sys/devices/system/cpu/cpu1/online 
bash: /sys/devices/system/cpu/cpu1/online: Permission denied
root@u1:~# echo 0 > /sys/devices/system/cpu/cpu2/online 
bash: /sys/devices/system/cpu/cpu2/online: Permission denied
root@u1:~# echo 0 > /sys/devices/system/cpu/cpu3/online 
bash: /sys/devices/system/cpu/cpu3/online: Permission denied
root@u1:~# grep cpu /proc/self/mountinfo
386 363 0:45 /proc/cpuinfo /proc/cpuinfo rw,nosuid,nodev,relatime shared:199 master:125 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
394 364 0:45 /sys/devices/system/cpu /sys/devices/system/cpu rw,nosuid,nodev,relatime shared:216 master:125 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
root@u1:~# 
exit
root@d12:~# uname -a
Linux d12 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux
root@d12:~# 

No such luck here.

stgraber commented 6 months ago

I also replicated your limits for good measure, still no luck with it.

niansa commented 6 months ago

I wonder if it's related to my previous upgrade from Debian 11 to 12. Perhaps there are still old configs from Debian 11 causing issues here

stgraber commented 6 months ago

What packages are you using for Incus? I've tried reproducing things here using the Zabbly stable ones.

stgraber commented 6 months ago

In general, what's really weird is that LXCFS, which handles /sys/devices/system/cpu, doesn't handle write requests, at least not on my system.

stgraber commented 6 months ago

So when tracing lxcfs while writing into the CPU online file, I just see it processing a read request on the file and never even attempt a write, instead just returning the permission error immediately to the user.

niansa commented 6 months ago

> What packages are you using for Incus? I've tried reproducing things here using the Zabbly stable ones.

Same here.

> So when tracing lxcfs while writing into the CPU online file, I just see it processing a read request on the file and never even attempt a write

Let me attempt this on my system real quick!

niansa commented 6 months ago

Okay, here is something really weird. This is an strace of lxcfs when taking CPU 9 offline from within the container:

[pid 3668986] <... read resumed>"-\0\0\0\1\0\0\0B3\1\0\0\0\0\0\16\0\0\0\0\0\0\0@B\17\0@B\17\0"..., 1052672) = 45
[pid 3668986] newfstatat(AT_FDCWD, "/sys/devices/system/cpu/cpu9", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
[pid 3668986] writev(7, [{iov_base="\220\0\0\0\0\0\0\0B3\1\0\0\0\0\0", iov_len=16}, {iov_base="\22\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=128}], 2) = 144
[pid 3536941] <... read resumed>"/\0\0\0\1\0\0\0D3\1\0\0\0\0\0\22\0\0\0\0\0\0\0@B\17\0@B\17\0"..., 1052672) = 47
[pid 3536941] newfstatat(AT_FDCWD, "/sys/devices/system/cpu/cpu9/online", {st_mode=S_IFREG|0644, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
[pid 3668986] read(7,  <unfinished ...>
[pid 3536941] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu9/online", O_RDONLY|O_CLOEXEC) = 8
[pid 3536941] newfstatat(8, "", {st_mode=S_IFREG|0644, st_size=4096, ...}, AT_EMPTY_PATH) = 0
[pid 3536941] read(8, "0\n", 4096)      = 2
[pid 3536941] read(8, "", 4096)         = 0
[pid 3536941] close(8)                  = 0
[pid 3536941] writev(7, [{iov_base="\220\0\0\0\0\0\0\0D3\1\0\0\0\0\0", iov_len=16}, {iov_base="J\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=128}], 2) = 144
[pid 3536941] read(7,  <unfinished ...>
[pid 3528136] <... read resumed>"0\0\0\0\16\0\0\0F3\1\0\0\0\0\0J\0\0\0\0\0\0\0@B\17\0@B\17\0"..., 1052672) = 48
[pid 3528136] newfstatat(AT_FDCWD, "/sys/devices/system/cpu/cpu9/online", {st_mode=S_IFREG|0644, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
[pid 3528136] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu9/online", O_RDONLY|O_CLOEXEC) = 8
[pid 3528136] newfstatat(8, "", {st_mode=S_IFREG|0644, st_size=4096, ...}, AT_EMPTY_PATH) = 0
[pid 3528136] read(8, "0\n", 4096)      = 2
[pid 3528136] read(8, "", 4096)         = 0
[pid 3528136] close(8)                  = 0
[pid 3528136] writev(7, [{iov_base=" \0\0\0\0\0\0\0F3\1\0\0\0\0\0", iov_len=16}, {iov_base="\0\36\0X\326\177\0\0\1\0\0\0\0\0\0\0", iov_len=16}], 2) = 32
[pid 3528136] read(7,  <unfinished ...>
[pid 3536942] <... read resumed>"@\0\0\0\31\0\0\0H3\1\0\0\0\0\0J\0\0\0\0\0\0\0@B\17\0@B\17\0"..., 1052672) = 64
[pid 3536942] writev(7, [{iov_base="\20\0\0\0\0\0\0\0H3\1\0\0\0\0\0", iov_len=16}], 1) = 16
[pid 3536942] read(7,  <unfinished ...>
[pid 3668987] <... read resumed>"R\0\0\0\20\0\0\0J3\1\0\0\0\0\0J\0\0\0\0\0\0\0@B\17\0@B\17\0"..., 1052672) = 82
[pid 3668987] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu9/online", O_WRONLY|O_CLOEXEC) = 8
[pid 3668987] pwrite64(8, "0\n", 2, 0)  = 2
[pid 3668987] close(8)                  = 0
[pid 3668987] writev(7, [{iov_base="\30\0\0\0\0\0\0\0J3\1\0\0\0\0\0", iov_len=16}, {iov_base="\2\0\0\0\0\0\0\0", iov_len=8}], 2) = 24
[pid 3668987] read(7,  <unfinished ...>
[pid 3528135] <... read resumed>"@\0\0\0\31\0\0\0L3\1\0\0\0\0\0J\0\0\0\0\0\0\0@B\17\0@B\17\0"..., 1052672) = 64
[pid 3528135] writev(7, [{iov_base="\20\0\0\0\0\0\0\0L3\1\0\0\0\0\0", iov_len=16}], 1) = 16
[pid 3528135] read(7,  <unfinished ...>
[pid 3710923] <... read resumed>"@\0\0\0\22\0\0\0N3\1\0\0\0\0\0J\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1052672) = 64
[pid 3710923] writev(7, [{iov_base="\20\0\0\0\0\0\0\0N3\1\0\0\0\0\0", iov_len=16}], 1) = 16
[pid 3710923] read(7, 

It's just passing the request through?
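
One way to read the trace: each buffer that lxcfs reads from fd 7 starts with a FUSE request header, whose first two 32-bit little-endian fields are the request length and the opcode. In the buffer "R\0\0\0\20\0\0\0", 'R' is 82 (matching the read's return value) and octal 20 is opcode 16. The lookup below is a partial sketch with a hypothetical function name; the opcode numbers are the ones I believe the kernel's linux/fuse.h defines, and only those that appear in this trace are listed.

```shell
# Map a FUSE opcode number to its name (partial table; values as defined
# in the kernel's linux/fuse.h to the best of my knowledge).
fuse_opcode_name() {
  case "$1" in
    1)  echo FUSE_LOOKUP ;;
    14) echo FUSE_OPEN ;;
    16) echo FUSE_WRITE ;;
    18) echo FUSE_RELEASE ;;
    25) echo FUSE_FLUSH ;;
    *)  echo "UNKNOWN($1)" ;;
  esac
}

# strace prints bytes as octal escapes; a leading 0 makes the shell
# treat the literal as octal, so \20 becomes $((020)), i.e. 16.
# Example: fuse_opcode_name "$((020))"
```

Read this way, the request handled by pid 3668987 is a FUSE_WRITE, and it is followed immediately by openat(..., O_WRONLY) and pwrite64 on the host's sysfs file.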

stgraber commented 6 months ago

Yeah, sure looks like it, so why isn't it doing that here then...

stgraber commented 6 months ago

@niansa I've relocated the issue to lxcfs so we can have the right folks looking into this. I've also privately pinged a few of the usual suspects.

Could you confirm the exact arguments for lxcfs as seen in ps faux | grep lxcfs?

niansa commented 6 months ago
nils@illia:~$ sudo ps faux | grep lxcfs
root     3528133  0.0  0.0 527208  7320 ?        Ssl  08:22   0:01 /opt/incus/bin/lxcfs /var/lib/incus-lxcfs
stgraber commented 6 months ago

Okay, so same thing I have... Very very confused as to why I'm not hitting this code path here.

niansa commented 6 months ago

@stgraber I'd agree to granting you supervised access to the host via tmate, if you'd like. Maybe it helps you to get a better insight.

stgraber commented 6 months ago

@niansa I bet I know the difference, do you happen to have a system without apparmor?

niansa commented 6 months ago

Huh, yeah, looks pretty disabled:

nils@illia:~$ cat /sys/module/apparmor/parameters/enabled
N
nils@illia:~$ sudo aa-status
[sudo] password for nils: 
apparmor module is loaded.
apparmor filesystem is not mounted.

No idea why though.
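
For anyone wanting to check whether their own host is in the same situation, the parameter file shown above can be tested in a script. This is only a sketch with a hypothetical function name; it reports on the kernel module parameter and says nothing about whether a given container actually has a profile applied.

```shell
# Report AppArmor state from the module parameter file used above.
# Prints "enabled", "disabled", or "unavailable" (file missing/unreadable).
apparmor_state() {
  local f="${1:-/sys/module/apparmor/parameters/enabled}"
  if [ ! -r "$f" ]; then
    echo unavailable
  elif [ "$(cat "$f")" = "Y" ]; then
    echo enabled
  else
    echo disabled
  fi
}
```

On the host in this report it would print "disabled", consistent with the "AppArmor support has been disabled because of lack of kernel support" warning in the first log.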

stgraber commented 6 months ago

Right, so that's why it wouldn't work for me. Our default apparmor profile prevents it completely as an extra safety net.

@mihalicyn is looking at a fix for this, basically removing any kind of write support from lxcfs as that should never have been allowed to begin with.

We have LXCFS 6.0 scheduled for release next week so we're not going to rush a bugfix release of 5.0.x right now for this, but I will be cherry-picking the fix directly into the Zabbly packages so that you can get it that way at least.

niansa commented 6 months ago

Thank you!

stgraber commented 6 months ago

Closing as the fix has been merged now.