lxc / incus

Powerful system container and virtual machine manager
https://linuxcontainers.org/incus
Apache License 2.0

All Incus Containers fail to start (again) on Void Linux + do not get an IP #1259

Closed acidvegas closed 1 month ago

acidvegas commented 1 month ago

Currently running Void Linux. A few versions back there was an issue with cgroups on Void Linux expecting systemd, for which @stgraber gave me a solution: run the following:

mkdir /sys/fs/cgroup/systemd
mount -t cgroup -o none,name=systemd systemd /sys/fs/cgroup/systemd

This was placed in my /etc/rc.local file to run on startup, along with CGROUP='unified', which fixed the issues I was previously having with Incus.

But now, after updating my system, all of my Incus containers on all of my servers are failing to start, leaving all of my services completely down and inaccessible once again.

Anytime I try to start a container I get:

[services@r320-1 ~]$ incus start elasticsearch-container
Error: Failed to run: /usr/libexec/incus/incusd forkstart elasticsearch-container /var/lib/incus/containers /run/incus/elasticsearch-container/lxc.conf: exit status 1
Try `incus info --show-log elasticsearch-container` for more info

I should mention that the only machines affected are ones I previously ran lxd-to-incus on. I have a server that was set up with Incus only and never went through the LXD-to-Incus migration, and that one starts my containers perfectly fine.

So this is most likely related to leftover state from LXD, maybe the interface name?

Here is the output of some commands to help with debugging:

08:53:47 brandon@r320-1 ~ : uname -a
Linux r320-1 6.6.52_1 #1 SMP PREEMPT_DYNAMIC Sat Sep 21 15:47:36 UTC 2024 x86_64 GNU/Linux
08:54:08 brandon@r320-1 ~ : incus --version
6.5
08:48:07 root@r320-1 ~ : cat /etc/subuid
brandon:100000:65536
services:165536:65536
root:1000000:65536
monroe:231072:65536
08:48:12 root@r320-1 ~ : cat /etc/subgid
brandon:100000:65536
services:165536:65536
root:1000000:65536
monroe:231072:65536
08:48:39 services@r320-1 ~ : incus info --show-log elasticsearch-container
Name: elasticsearch-container
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2024/02/29 15:06 EST
Last Used: 2024/09/26 20:48 EDT
Log:
lxc elasticsearch-container 20240927004848.490 INFO     lxccontainer - ../src/lxc/lxccontainer.c:do_lxcapi_start:997 - Set process title to [lxc monitor] /var/lib/incus/containers elasticsearch-container
lxc elasticsearch-container 20240927004848.491 INFO     start - ../src/lxc/start.c:lxc_check_inherited:325 - Closed inherited fd 4
lxc elasticsearch-container 20240927004848.491 INFO     start - ../src/lxc/start.c:lxc_check_inherited:325 - Closed inherited fd 5
lxc elasticsearch-container 20240927004848.491 INFO     start - ../src/lxc/start.c:lxc_check_inherited:325 - Closed inherited fd 16
lxc elasticsearch-container 20240927004848.491 INFO     lsm - ../src/lxc/lsm/lsm.c:lsm_init_static:38 - Initialized LSM security driver nop
lxc elasticsearch-container 20240927004848.491 INFO     conf - ../src/lxc/conf.c:run_script_argv:340 - Executing script "/proc/1014/exe callhook /var/lib/incus "default" "elasticsearch-container" start" for container "elasticsearch-container"
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:807 - Processing "[all]"
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:807 - Processing "reject_force_umount  # comment this to allow umount -f;  not recommended"
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:524 - Set seccomp rule to reject force umounts
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:524 - Set seccomp rule to reject force umounts
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:524 - Set seccomp rule to reject force umounts
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:807 - Processing "[all]"
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:807 - Processing "kexec_load errno 38"
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[246:kexec_load] action[327718:errno] arch[0]
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[246:kexec_load] action[327718:errno] arch[1073741827]
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[246:kexec_load] action[327718:errno] arch[1073741886]
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:807 - Processing "open_by_handle_at errno 38"
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[304:open_by_handle_at] action[327718:errno] arch[0]
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[304:open_by_handle_at] action[327718:errno] arch[1073741827]
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[304:open_by_handle_at] action[327718:errno] arch[1073741886]
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:807 - Processing "init_module errno 38"
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[175:init_module] action[327718:errno] arch[0]
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[175:init_module] action[327718:errno] arch[1073741827]
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[175:init_module] action[327718:errno] arch[1073741886]
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:807 - Processing "finit_module errno 38"
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[313:finit_module] action[327718:errno] arch[0]
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[313:finit_module] action[327718:errno] arch[1073741827]
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[313:finit_module] action[327718:errno] arch[1073741886]
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:807 - Processing "delete_module errno 38"
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[176:delete_module] action[327718:errno] arch[0]
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[176:delete_module] action[327718:errno] arch[1073741827]
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[176:delete_module] action[327718:errno] arch[1073741886]
lxc elasticsearch-container 20240927004848.534 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:1017 - Merging compat seccomp contexts into main context
lxc elasticsearch-container 20240927004848.534 INFO     start - ../src/lxc/start.c:lxc_init:881 - Container "elasticsearch-container" is initialized
lxc elasticsearch-container 20240927004848.534 INFO     cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_monitor_create:1383 - The monitor process uses "lxc.monitor.elasticsearch-container" as cgroup
lxc elasticsearch-container 20240927004848.548 INFO     cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_payload_create:1491 - The container process uses "lxc.payload.elasticsearch-container" as inner and "lxc.payload.elasticsearch-container" as limit cgroup
lxc elasticsearch-container 20240927004848.558 INFO     start - ../src/lxc/start.c:lxc_spawn:1762 - Cloned CLONE_NEWUSER
lxc elasticsearch-container 20240927004848.558 INFO     start - ../src/lxc/start.c:lxc_spawn:1762 - Cloned CLONE_NEWNS
lxc elasticsearch-container 20240927004848.558 INFO     start - ../src/lxc/start.c:lxc_spawn:1762 - Cloned CLONE_NEWPID
lxc elasticsearch-container 20240927004848.558 INFO     start - ../src/lxc/start.c:lxc_spawn:1762 - Cloned CLONE_NEWUTS
lxc elasticsearch-container 20240927004848.558 INFO     start - ../src/lxc/start.c:lxc_spawn:1762 - Cloned CLONE_NEWIPC
lxc elasticsearch-container 20240927004848.563 INFO     conf - ../src/lxc/conf.c:lxc_map_ids:3603 - Caller maps host root. Writing mapping directly
lxc elasticsearch-container 20240927004848.563 NOTICE   utils - ../src/lxc/utils.c:lxc_drop_groups:1368 - Dropped supplimentary groups
lxc elasticsearch-container 20240927004848.564 WARN     cgfsng - ../src/lxc/cgroups/cgfsng.c:fchowmodat:1611 - No such file or directory - Failed to fchownat(18, memory.oom.group, 65536, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc elasticsearch-container 20240927004848.564 WARN     cgfsng - ../src/lxc/cgroups/cgfsng.c:fchowmodat:1611 - No such file or directory - Failed to fchownat(18, memory.reclaim, 65536, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc elasticsearch-container 20240927004848.565 INFO     start - ../src/lxc/start.c:do_start:1104 - Unshared CLONE_NEWNET
lxc elasticsearch-container 20240927004848.565 NOTICE   utils - ../src/lxc/utils.c:lxc_drop_groups:1368 - Dropped supplimentary groups
lxc elasticsearch-container 20240927004848.565 NOTICE   utils - ../src/lxc/utils.c:lxc_switch_uid_gid:1344 - Switched to gid 0
lxc elasticsearch-container 20240927004848.565 NOTICE   utils - ../src/lxc/utils.c:lxc_switch_uid_gid:1353 - Switched to uid 0
lxc elasticsearch-container 20240927004848.566 INFO     start - ../src/lxc/start.c:do_start:1204 - Unshared CLONE_NEWCGROUP
lxc elasticsearch-container 20240927004848.591 INFO     conf - ../src/lxc/conf.c:setup_utsname:875 - Set hostname to "elasticsearch-container"
lxc elasticsearch-container 20240927004848.597 INFO     network - ../src/lxc/network.c:lxc_setup_network_in_child_namespaces:4019 - Finished setting up network devices with caller assigned names
lxc elasticsearch-container 20240927004848.597 INFO     conf - ../src/lxc/conf.c:mount_autodev:1219 - Preparing "/dev"
lxc elasticsearch-container 20240927004848.597 INFO     conf - ../src/lxc/conf.c:mount_autodev:1280 - Prepared "/dev"
lxc elasticsearch-container 20240927004848.599 ERROR    cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_mount:2131 - No such file or directory - Failed to create cgroup at_mnt 24()
lxc elasticsearch-container 20240927004848.599 ERROR    conf - ../src/lxc/conf.c:lxc_mount_auto_mounts:851 - No such file or directory - Failed to mount "/sys/fs/cgroup"
lxc elasticsearch-container 20240927004848.599 ERROR    conf - ../src/lxc/conf.c:lxc_setup:4396 - Failed to setup remaining automatic mounts
lxc elasticsearch-container 20240927004848.599 ERROR    start - ../src/lxc/start.c:do_start:1272 - Failed to setup container "elasticsearch-container"
lxc elasticsearch-container 20240927004848.599 ERROR    sync - ../src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc elasticsearch-container 20240927004848.606 WARN     network - ../src/lxc/network.c:lxc_delete_network_priv:3631 - Failed to rename interface with index 0 from "eth0" to its initial name "veth623c49d3"
lxc elasticsearch-container 20240927004848.606 ERROR    start - ../src/lxc/start.c:__lxc_start:2107 - Failed to spawn container "elasticsearch-container"
lxc elasticsearch-container 20240927004848.606 ERROR    lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:877 - Received container state "ABORTING" instead of "RUNNING"
lxc elasticsearch-container 20240927004848.606 WARN     start - ../src/lxc/start.c:lxc_abort:1036 - No such process - Failed to send SIGKILL via pidfd 19 for process 2219
lxc elasticsearch-container 20240927004848.606 INFO     conf - ../src/lxc/conf.c:run_script_argv:340 - Executing script "/usr/libexec/incus/incusd callhook /var/lib/incus "default" "elasticsearch-container" stopns" for container "elasticsearch-container"
lxc elasticsearch-container 20240927004848.699 INFO     conf - ../src/lxc/conf.c:lxc_map_ids:3603 - Caller maps host root. Writing mapping directly
lxc elasticsearch-container 20240927004848.699 NOTICE   utils - ../src/lxc/utils.c:lxc_drop_groups:1368 - Dropped supplimentary groups
lxc 20240927004848.715 ERROR    af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20240927004848.715 ERROR    commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"
lxc elasticsearch-container 20240927004848.715 INFO     conf - ../src/lxc/conf.c:run_script_argv:340 - Executing script "/usr/libexec/incus/incusd callhook /var/lib/incus "default" "elasticsearch-container" stop" for container "elasticsearch-container"
08:51:03 brandon@r320-1 ~ : ls -lh /sys/fs/cgroup/
total 0
-r--r--r-- 1 root root 0 Sep 26 20:21 cgroup.controllers
-rw-r--r-- 1 root root 0 Sep 26 20:53 cgroup.max.depth
-rw-r--r-- 1 root root 0 Sep 26 20:53 cgroup.max.descendants
-rw-r--r-- 1 root root 0 Sep 26 20:53 cgroup.pressure
-rw-r--r-- 1 root root 0 Sep 26 20:53 cgroup.procs
-r--r--r-- 1 root root 0 Sep 26 20:53 cgroup.stat
-rw-r--r-- 1 root root 0 Sep 26 20:21 cgroup.subtree_control
-rw-r--r-- 1 root root 0 Sep 26 20:21 cgroup.threads
-rw-r--r-- 1 root root 0 Sep 26 20:53 cpu.pressure
-r--r--r-- 1 root root 0 Sep 26 20:21 cpuset.cpus.effective
-r--r--r-- 1 root root 0 Sep 26 20:53 cpuset.mems.effective
-r--r--r-- 1 root root 0 Sep 26 20:53 cpu.stat
-r--r--r-- 1 root root 0 Sep 26 20:53 cpu.stat.local
-rw-r--r-- 1 root root 0 Sep 26 20:53 io.cost.model
-rw-r--r-- 1 root root 0 Sep 26 20:53 io.cost.qos
-rw-r--r-- 1 root root 0 Sep 26 20:53 io.pressure
-rw-r--r-- 1 root root 0 Sep 26 20:53 io.prio.class
-r--r--r-- 1 root root 0 Sep 26 20:53 io.stat
-rw-r--r-- 1 root root 0 Sep 26 20:53 irq.pressure
drwxr-xr-x 2 root root 0 Sep 26 20:21 lxc.pivot
-r--r--r-- 1 root root 0 Sep 26 20:53 memory.numa_stat
-rw-r--r-- 1 root root 0 Sep 26 20:53 memory.pressure
--w------- 1 root root 0 Sep 26 20:21 memory.reclaim
-r--r--r-- 1 root root 0 Sep 26 20:53 memory.stat
-r--r--r-- 1 root root 0 Sep 26 20:53 misc.capacity
-r--r--r-- 1 root root 0 Sep 26 20:53 misc.current
dr-xr-xr-x 3 root root 0 Sep 26 20:20 systemd
08:53:14 brandon@r320-1 ~ : ls -lh /sys/fs/cgroup/systemd/
total 0
-rw-r--r-- 1 root root 0 Sep 26 20:53 cgroup.clone_children
-rw-r--r-- 1 root root 0 Sep 26 20:53 cgroup.procs
-r--r--r-- 1 root root 0 Sep 26 20:53 cgroup.sane_behavior
drwxr-xr-x 2 root root 0 Sep 26 20:21 lxc.pivot
-rw-r--r-- 1 root root 0 Sep 26 20:53 notify_on_release
-rw-r--r-- 1 root root 0 Sep 26 20:53 release_agent
-rw-r--r-- 1 root root 0 Sep 26 20:53 tasks
08:53:21 brandon@r320-1 ~ : cat /proc/self/mountinfo
21 28 0:20 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
22 28 0:21 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
23 28 0:5 / /dev rw,nosuid,noexec - devtmpfs devtmpfs rw,size=32838940k,nr_inodes=8209735,mode=755,inode64
24 23 0:22 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=000
25 23 0:23 / /dev/shm rw,nosuid,nodev,noexec - tmpfs tmpfs rw,inode64
26 28 0:24 / /run rw,nosuid,nodev,noexec - tmpfs tmpfs rw,mode=755,inode64
28 1 0:26 / / rw,relatime - zfs zroot/ROOT/void rw,xattr,posixacl,casesensitive
29 22 0:6 / /sys/kernel/security rw,relatime - securityfs securityfs rw
30 22 0:27 / /sys/firmware/efi/efivars rw,nosuid,nodev,noexec,relatime - efivarfs efivarfs rw
31 22 0:28 / /sys/fs/cgroup rw,relatime - cgroup2 cgroup2 rw,nsdelegate
32 28 0:29 / /home rw,relatime - zfs zroot/home rw,xattr,posixacl,casesensitive
33 28 8:65 / /boot/efi rw,relatime - vfat /dev/sde1 rw,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,utf8,errors=remount-ro
34 31 0:30 / /sys/fs/cgroup/systemd rw,relatime - cgroup systemd rw,name=systemd
35 28 0:31 / /var/lib/incus/shmounts rw,relatime shared:1 - tmpfs tmpfs rw,size=100k,mode=711,inode64
36 28 0:32 / /var/lib/incus/guestapi rw,relatime - tmpfs tmpfs rw,size=100k,mode=755,inode64
08:56:00 root@r320-1 ~ : ip a
4: lxdbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:16:3e:80:a9:6e brd ff:ff:ff:ff:ff:ff
    inet 10.144.16.1/24 scope global lxdbr0
       valid_lft forever preferred_lft forever

REDACTED OTHER INTERFACES HERE. eno1 is my ethernet

08:53:21 root@r320-1 ~ : svlogtail
2024-09-27T00:48:48.43118 daemon.notice: Sep 26 20:48:48 incus: time="2024-09-26T20:48:48-04:00" level=info msg="Starting instance" action=start created="2024-02-29 20:06:57.266272138 +0000 UTC" ephemeral=false instance=elasticsearch-container instanceType=container project=default stateful=false used="2024-09-27 00:36:38.855079368 +0000 UTC"
2024-09-27T00:48:48.44813 kern.info: [ 1685.291475] lxdbr0: port 1(veth8279396d) entered blocking state
2024-09-27T00:48:48.44823 kern.info: [ 1685.291484] lxdbr0: port 1(veth8279396d) entered disabled state
2024-09-27T00:48:48.44910 kern.info: [ 1685.291522] veth8279396d: entered allmulticast mode
2024-09-27T00:48:48.44915 kern.info: [ 1685.291609] veth8279396d: entered promiscuous mode
2024-09-27T00:48:48.58117 kern.info: [ 1685.423755] physxJAnpl: renamed from veth623c49d3
2024-09-27T00:48:48.59223 kern.info: [ 1685.435017] eth0: renamed from physxJAnpl
2024-09-27T00:48:48.59816 kern.info: [ 1685.440977] lxdbr0: port 1(veth8279396d) entered blocking state
2024-09-27T00:48:48.59820 kern.info: [ 1685.440986] lxdbr0: port 1(veth8279396d) entered forwarding state
2024-09-27T00:48:48.60752 daemon.notice: Sep 26 20:48:48 incus: time="2024-09-26T20:48:48-04:00" level=error msg="Failed starting instance" action=start created="2024-02-29 20:06:57.266272138 +0000 UTC" ephemeral=false instance=elasticsearch-container instanceType=container project=default stateful=false used="2024-09-27 00:36:38.855079368 +0000 UTC"
2024-09-27T00:48:48.65021 kern.info: [ 1685.492770] veth8279396d: left allmulticast mode
2024-09-27T00:48:48.65030 kern.info: [ 1685.492788] veth8279396d: left promiscuous mode
2024-09-27T00:48:48.65033 kern.info: [ 1685.492809] lxdbr0: port 1(veth8279396d) entered disabled state
2024-09-27T00:48:48.77648 daemon.notice: Sep 26 20:48:48 incus: time="2024-09-26T20:48:48-04:00" level=info msg="Shut down instance" action=stop created="2024-02-29 20:06:57.266272138 +0000 UTC" ephemeral=false instance=elasticsearch-container instanceType=container project=default stateful=false used="2024-09-27 00:48:48.523737493 +0000 UTC"
08:56:34 root@r320-1 ~ : incus config show elasticsearch-container
architecture: x86_64
config:
  boot.autostart: "true"
  image.architecture: amd64
  image.description: Debian bookworm amd64 (20240228_05:24)
  image.os: Debian
  image.release: bookworm
  image.serial: "20240228_05:24"
  image.type: squashfs
  image.variant: default
  limits.kernel.memlock: "9223372036854775807"
  limits.kernel.nofile: "65535"
  volatile.base_image: b9a12bf99efdac578271b4a3e616e8cd3dec33faa2baff7923d2d6ca79ed8993
  volatile.cloud-init.instance-id: 99f5702a-ee99-4334-9df2-badef9e21895
  volatile.eth0.hwaddr: 00:16:3e:37:87:ec
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":65536}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: STOPPED
  volatile.last_state.ready: "false"
  volatile.uuid: 7b26bed7-d58f-43c9-bab3-2ef911f410b7
  volatile.uuid.generation: 7b26bed7-d58f-43c9-bab3-2ef911f410b7
devices:
  elasticsearch-http-port:
    REDACTED
    type: proxy
  elasticsearch-trans-port:
    REDACTED
    type: proxy
  eth0:
    ipv4.address: REDACTED
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: elasticsearch-pool
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""
acidvegas commented 1 month ago

If it is related to leftover LXD issues, I would like to know the proper way to rename all the lxdbr0 interfaces on the machine and in the incus configs to incusbr0 for uniformity.

But I am at a loss on debugging this one.

I tried without the custom rc.local commands as well; same issue.

stgraber commented 1 month ago

Log suggests another cgroup issue.

Please show: cat /proc/self/mountinfo

acidvegas commented 1 month ago

Log suggests another cgroup issue.

Please show: cat /proc/self/mountinfo

I already did in my previous message:

08:53:21 brandon@r320-1 ~ : cat /proc/self/mountinfo
21 28 0:20 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
22 28 0:21 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
23 28 0:5 / /dev rw,nosuid,noexec - devtmpfs devtmpfs rw,size=32838940k,nr_inodes=8209735,mode=755,inode64
24 23 0:22 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=000
25 23 0:23 / /dev/shm rw,nosuid,nodev,noexec - tmpfs tmpfs rw,inode64
26 28 0:24 / /run rw,nosuid,nodev,noexec - tmpfs tmpfs rw,mode=755,inode64
28 1 0:26 / / rw,relatime - zfs zroot/ROOT/void rw,xattr,posixacl,casesensitive
29 22 0:6 / /sys/kernel/security rw,relatime - securityfs securityfs rw
30 22 0:27 / /sys/firmware/efi/efivars rw,nosuid,nodev,noexec,relatime - efivarfs efivarfs rw
31 22 0:28 / /sys/fs/cgroup rw,relatime - cgroup2 cgroup2 rw,nsdelegate
32 28 0:29 / /home rw,relatime - zfs zroot/home rw,xattr,posixacl,casesensitive
33 28 8:65 / /boot/efi rw,relatime - vfat /dev/sde1 rw,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,utf8,errors=remount-ro
34 31 0:30 / /sys/fs/cgroup/systemd rw,relatime - cgroup systemd rw,name=systemd
35 28 0:31 / /var/lib/incus/shmounts rw,relatime shared:1 - tmpfs tmpfs rw,size=100k,mode=711,inode64
36 28 0:32 / /var/lib/incus/guestapi rw,relatime - tmpfs tmpfs rw,size=100k,mode=755,inode64

Here are the two lines of interest:

31 22 0:28 / /sys/fs/cgroup rw,relatime - cgroup2 cgroup2 rw,nsdelegate
34 31 0:30 / /sys/fs/cgroup/systemd rw,relatime - cgroup systemd rw,name=systemd
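Those two entries can be pulled out of mountinfo mechanically. A small sketch, assuming the standard /proc/self/mountinfo layout where field 5 is the mount point:

```shell
# Print every mount point under /sys/fs/cgroup from mountinfo-formatted
# input (field 5 of /proc/self/mountinfo is the mount point). On a clean
# unified setup there is exactly one entry (the cgroup2 mount); any
# additional entry, like the legacy name=systemd cgroup v1 mount here,
# means the cgroup2 tree is over-mounted.
list_cgroup_mounts() {
    awk '$5 ~ "^/sys/fs/cgroup" { print $5 }' "${1:-/proc/self/mountinfo}"
}

list_cgroup_mounts
```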
duskmoss commented 1 month ago

I have CGROUP_MODE=unified in rc.conf (not local!) and no mounting of systemd in rc.local.

This is also what's currently recommended in the Void docs, which I'm working on updating, so if this doesn't work for you, I'd like to know.
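For anyone else landing here, the setting in question is a one-line change (a sketch; check the Void handbook for the current recommendation):

```shell
# /etc/rc.conf -- Void Linux runit configuration
# Mount only the unified (cgroup v2) hierarchy at boot; no manual
# name=systemd cgroup v1 mount in rc.local is needed with this setting.
CGROUP_MODE=unified
```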

acidvegas commented 1 month ago

> I have CGROUP_MODE=unified in rc.conf (not local!) and no mounting of systemd in rc.local.

> This is also what's currently recommended in the Void docs, which I'm working on updating, so if this doesn't work for you, I'd like to know.

OK, so when I do that it is working again, I guess. The only issue I'm still seeing is that containers do not get an IP when they start at boot. I have to incus restart the container, and then it gets an IP. Again, this only affects the servers I have run lxd-to-incus on in the past.
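In the meantime, a boot-time workaround might look something like this (a hedged sketch; the `-c n4` column codes and the empty-IPv4-field heuristic are assumptions to verify against your own `incus list` output):

```shell
# Read CSV lines of "name,ipv4" on stdin and print the names whose
# IPv4 field is empty, i.e. containers that came up without an address.
no_ip_containers() {
    awk -F, '$2 == "" { print $1 }'
}

# Restart every running container that has no IPv4 address. Guarded so
# the snippet is a no-op on hosts without the incus CLI installed.
if command -v incus >/dev/null 2>&1; then
    incus list -f csv -c n4 | no_ip_containers | while read -r name; do
        incus restart "$name"
    done
fi
```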

stgraber commented 1 month ago

Right, the over-mounting of the cgroup2 hierarchy is an invalid cgroup setup, so it's not unexpected that things get confused and fail. No idea why that would only affect a system that went through lxd-to-incus, given that this is a system-wide setting completely outside of Incus.

The network issue may be a similar problem. I don't see how the fact that the container was created under LXD matters for that. If all containers fail to get an IP, it suggests an issue with the system's network management tooling, either firewalling things off or interacting badly with the veth devices that get created for the containers on startup (this sounds most likely, given that an instance restart fixes it). It could also be some kind of race within the instance, in which case `incus console --show-log NAME` may be interesting.

Closing as there's so far no indication of an Incus bug here.