zabbly / incus

Incus package repository
Apache License 2.0

Incus isn't correctly merging large uid/gid ranges together #55

Open dontlaugh opened 2 months ago

dontlaugh commented 2 months ago

Full disclosure: I don't recall editing /etc/subuid or /etc/subgid for a long time. The last time I touched them on this machine was probably years ago, when setting up LXD.

In the time since, I have run the incus migration scripts and everything has been fine. Today containers failed to start. I rebooted my computer, and here is my entire shell session since then. It shows me trying to start some containers and checking the logs, which pointed me towards /etc/subuid and /etc/subgid.

shell session debugging starting containers

```
coleman@augustus /home/coleman
0 % df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           3.2G  2.3M  3.2G   1% /run
/dev/nvme0n1p3  462G  283G  176G  62% /
tmpfs            16G  280K   16G   1% /dev/shm
tmpfs           5.0M   16K  5.0M   1% /run/lock
efivarfs        128K   21K  103K  17% /sys/firmware/efi/efivars
/dev/nvme0n1p1  487M  448M   39M  93% /boot/efi
/dev/nvme0n1p3  462G  283G  176G  62% /home
tmpfs           3.2G  160K  3.2G   1% /run/user/1000
tmpfs           100K     0  100K   0% /var/lib/incus/shmounts
tmpfs           100K     0  100K   0% /var/lib/incus/guestapi
/dev/nvme0n1p3  462G  283G  176G  62% /var/lib/incus/storage-pools/default

coleman@augustus /home/coleman
0 % i ls
+---------+---------+------+------+-----------+-----------+
|  NAME   |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS |
+---------+---------+------+------+-----------+-----------+
| stopper | STOPPED |      |      | CONTAINER | 0         |
+---------+---------+------+------+-----------+-----------+

coleman@augustus /home/coleman
0 % i start stopper
Error: Failed to run: /opt/incus/bin/incusd forkstart stopper /var/lib/incus/containers /run/incus/stopper/lxc.conf: exit status 1
Try `incus info --show-log stopper` for more info

coleman@augustus /home/coleman
1 % incus info --show-log stopper
Name: stopper
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2024/09/07 18:35 EDT
Last Used: 2024/09/07 18:40 EDT

Log:

lxc stopper 20240907224024.186 ERROR idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:245 - newuidmap failed to write mapping "newuidmap: write to uid_map failed: Invalid argument": newuidmap 3857 0 1000000 1000000000 0 1001000000 1000000000
lxc stopper 20240907224024.186 ERROR start - ../src/lxc/start.c:lxc_spawn:1795 - Failed to set up id mapping.
lxc stopper 20240907224024.186 ERROR lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:837 - Received container state "ABORTING" instead of "RUNNING"
lxc stopper 20240907224024.187 ERROR start - ../src/lxc/start.c:__lxc_start:2114 - Failed to spawn container "stopper"
lxc stopper 20240907224024.187 WARN start - ../src/lxc/start.c:lxc_abort:1037 - No such process - Failed to send SIGKILL via pidfd 17 for process 3857
lxc 20240907224024.227 ERROR af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20240907224024.227 ERROR commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"

coleman@augustus /home/coleman
0 % i launch images:ubuntu/22.04
Launching the instance
Error: Failed instance creation: Failed to run: /opt/incus/bin/incusd forkstart able-weasel /var/lib/incus/containers /run/incus/able-weasel/lxc.conf: exit status 1

coleman@augustus /home/coleman
1 % incus info --show-log able-weasel
Name: able-weasel
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2024/09/07 18:43 EDT
Last Used: 2024/09/07 18:43 EDT

Log:

lxc able-weasel 20240907224348.346 ERROR idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:245 - newuidmap failed to write mapping "newuidmap: write to uid_map failed: Invalid argument": newuidmap 5083 0 1000000 1000000000 0 1001000000 1000000000
lxc able-weasel 20240907224348.346 ERROR start - ../src/lxc/start.c:lxc_spawn:1795 - Failed to set up id mapping.
lxc able-weasel 20240907224348.346 ERROR lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:837 - Received container state "ABORTING" instead of "RUNNING"
lxc able-weasel 20240907224348.347 ERROR start - ../src/lxc/start.c:__lxc_start:2114 - Failed to spawn container "able-weasel"
lxc able-weasel 20240907224348.347 WARN start - ../src/lxc/start.c:lxc_abort:1037 - No such process - Failed to send SIGKILL via pidfd 17 for process 5083
lxc 20240907224348.387 ERROR af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20240907224348.387 ERROR commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"

coleman@augustus /home/coleman
0 % groups
coleman adm cdrom sudo dip plugdev lpadmin lxd sambashare incus-admin

coleman@augustus /home/coleman
0 % sudo usermod -aG incus coleman
[sudo] password for coleman:

coleman@augustus /home/coleman
0 % cat /etc/subuid
coleman:100000:65536
root:1000000:1000000000
root:1001000000:1000000000

coleman@augustus /home/coleman
0 % cat /etc/subgid
coleman:100000:65536
root:1000000:1000000000
root:1001000000:1000000000

coleman@augustus /home/coleman
0 % sudo kak /etc/subgid

coleman@augustus /home/coleman
0 % sudo kak /etc/subuid

coleman@augustus /home/coleman
0 % i launch images:ubuntu/22.04 maptest
Launching maptest
Error: Failed instance creation: Failed to run: /opt/incus/bin/incusd forkstart maptest /var/lib/incus/containers /run/incus/maptest/lxc.conf: exit status 1

coleman@augustus /home/coleman
1 % incus info --show-log maptest
Name: maptest
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2024/09/07 18:53 EDT
Last Used: 2024/09/07 18:53 EDT

Log:

lxc maptest 20240907225346.676 ERROR idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:245 - newuidmap failed to write mapping "newuidmap: uid range [0-1000000000) -> [1001000000-2001000000) not allowed": newuidmap 5832 0 1000000 1000000000 0 1001000000 1000000000
lxc maptest 20240907225346.676 ERROR start - ../src/lxc/start.c:lxc_spawn:1795 - Failed to set up id mapping.
lxc maptest 20240907225346.676 ERROR lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:837 - Received container state "ABORTING" instead of "RUNNING"
lxc maptest 20240907225346.676 ERROR start - ../src/lxc/start.c:__lxc_start:2114 - Failed to spawn container "maptest"
lxc maptest 20240907225346.676 WARN start - ../src/lxc/start.c:lxc_abort:1037 - No such process - Failed to send SIGKILL via pidfd 17 for process 5832
lxc 20240907225346.710 ERROR af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20240907225346.710 ERROR commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"

coleman@augustus /home/coleman
0 % sudo systemctl restart incus

coleman@augustus /home/coleman
0 % incus info --show-log maptest2
Error: Failed to fetch instance "maptest2" in project "default": Instance not found

coleman@augustus /home/coleman
1 % i launch images:ubuntu/22.04 maptest2
Launching maptest2
```

I got containers to start by removing the second entry for root in both /etc/subuid and /etc/subgid:

% cat /etc/subuid
coleman:100000:65536
root:1000000:1000000000
root:1001000000:1000000000  # deleted this

% cat /etc/subgid
coleman:100000:65536
root:1000000:1000000000
root:1001000000:1000000000  # and deleted this, too

I was running Zabbly stable, and then naively (instead of looking at logs like I did here after reboot), I upgraded to daily, which I'd been meaning to do anyway.

Is there any chance that this package manipulates /etc/subuid or /etc/subgid? The usermod --add-subuids call does look suspicious here. Is there an off-by-one error in the script? https://github.com/zabbly/incus/blob/fb7f1890dc79c3b4448a7bad3dcd489458e59633/debian/incus-base.postinst#L46-L47

Is this multiple-range setup even a valid config?
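
For what it's worth, the arithmetic on the two root entries can be checked directly. The sketch below only parses the /etc/subuid contents pasted above and compares where the first range ends with where the second begins; it says nothing about what the postinst script does. The names `SubidRange` and `parse_subid` are made up for illustration.

```python
from typing import NamedTuple


class SubidRange(NamedTuple):
    owner: str
    start: int
    count: int

    @property
    def end(self) -> int:
        # Exclusive end: the first id that is NOT part of this range.
        return self.start + self.count


def parse_subid(text: str) -> list[SubidRange]:
    """Parse owner:start:count lines as pasted in this issue."""
    ranges = []
    for line in text.splitlines():
        # Dropping "# ..." only tolerates the annotations added in this post.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        owner, start, count = line.split(":")
        ranges.append(SubidRange(owner, int(start), int(count)))
    return ranges


SUBUID = """\
coleman:100000:65536
root:1000000:1000000000
root:1001000000:1000000000
"""

first, second = [r for r in parse_subid(SUBUID) if r.owner == "root"]
gap = second.start - first.end
print(first.end, second.start, gap)  # 1001000000 1001000000 0
```

A gap of 0 means the second range starts exactly where the first one ends, so the two entries are adjacent rather than overlapping.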

dontlaugh commented 2 months ago

Ahh, check this out. I added 1 to the start of the second range for root, and incus faithfully started a container.

coleman@augustus /home/coleman 
0 % cat /etc/subgid
coleman:100000:65536
root:1000000:1000000000
root:1001000001:1000000000

coleman@augustus /home/coleman 
0 % cat /etc/subuid
coleman:100000:65536
root:1000000:1000000000
root:1001000001:1000000000

stgraber commented 2 months ago

root:1000000:1000000000
root:1001000000:1000000000

lxc able-weasel 20240907224348.346 ERROR    idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:245 - newuidmap failed to write mapping "newuidmap: write to uid_map failed: Invalid argument": newuidmap 5083 0 1000000 1000000000 0 1001000000 1000000000

That's interesting. Here we can see Incus / LXC attempting to set up two maps that both start at uid 0. That obviously isn't going to work.
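
To make the conflict concrete, here is a small sketch that splits the logged newuidmap command line into (inside id, host id, count) triples, which is the argument layout newuidmap documents, and looks for overlaps on each side. The helper names are invented for this example and are not taken from LXC or Incus.

```python
from itertools import combinations

# Command line copied from the able-weasel log line quoted above.
LOGGED = "newuidmap 5083 0 1000000 1000000000 0 1001000000 1000000000"


def parse_newuidmap(cmdline):
    """Return (pid, [(inside_start, host_start, count), ...])."""
    argv = cmdline.split()
    pid = int(argv[1])
    numbers = [int(a) for a in argv[2:]]
    if len(numbers) % 3 != 0:
        raise ValueError("expected triples of inside-id, host-id, count")
    return pid, [tuple(numbers[i:i + 3]) for i in range(0, len(numbers), 3)]


def overlaps(a_start, a_count, b_start, b_count):
    """True if the half-open ranges [start, start + count) intersect."""
    return a_start < b_start + b_count and b_start < a_start + a_count


def conflicts(extents, column):
    """Pairs of extents overlapping in one column (0 = inside, 1 = host)."""
    return [
        (a, b)
        for a, b in combinations(extents, 2)
        if overlaps(a[column], a[2], b[column], b[2])
    ]


pid, extents = parse_newuidmap(LOGGED)
print(len(conflicts(extents, 0)))  # 1: both extents claim inside ids from 0
print(len(conflicts(extents, 1)))  # 0: the host-side ranges are disjoint
```

The host-side ranges are fine, but both extents claim container ids starting at 0. The kernel does not accept overlapping lines in uid_map, which fits the "Invalid argument" in the log.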

Instead, I'd have expected either just the first map to be picked (the old LXD behavior) or the two maps to be merged together, giving that instance 2000000000 uids/gids.
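
A rough sketch of what that merged result would look like, assuming adjacent or overlapping host ranges are simply coalesced and container-side ids are handed out consecutively from 0. This is illustrative Python with made-up function names, not the actual Incus idmap code.

```python
def merge_ranges(ranges):
    """Coalesce (start, count) host ranges that touch or overlap."""
    merged = []
    for start, count in sorted(ranges):
        if merged and merged[-1][0] + merged[-1][1] >= start:
            prev_start, prev_count = merged[-1]
            end = max(prev_start + prev_count, start + count)
            merged[-1] = (prev_start, end - prev_start)
        else:
            merged.append((start, count))
    return merged


def build_map(host_ranges):
    """Assign container-side ids consecutively, starting at 0."""
    inside = 0
    extents = []
    for start, count in merge_ranges(host_ranges):
        extents.append((inside, start, count))
        inside += count
    return extents


# The two root entries from /etc/subuid in the report.
root_ranges = [(1000000, 1000000000), (1001000000, 1000000000)]
print(build_map(root_ranges))  # [(0, 1000000, 2000000000)]

# With the second range shifted by one, the ranges no longer touch, so
# two extents remain, but they no longer both start at inside id 0.
shifted = [(1000000, 1000000000), (1001000001, 1000000000)]
print(build_map(shifted))
```

With the reported entries this yields a single extent of 2000000000 ids starting at host uid 1000000.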