lxc / incus

Powerful system container and virtual machine manager
https://linuxcontainers.org/incus
Apache License 2.0

lxd-to-incus fails on void linux #625

Closed · acidvegas closed this 5 months ago

acidvegas commented 8 months ago
brandon@paloalto-svc : lxd-to-incus
=> Looking for source server
==> Detected: manual installation
=> Looking for target server
Error: No target server could be found

incus info and lxc info are both working; I didn't initialize Incus though (as per the documentation)

stgraber commented 8 months ago

What version of Incus is that?

I certainly remember implementing and testing void support in lxd-to-incus but maybe that change isn't in your version yet.

stgraber commented 8 months ago

https://github.com/lxc/incus/pull/511

acidvegas commented 8 months ago

> What version of Incus is that?
>
> I certainly remember implementing and testing void support in lxd-to-incus but maybe that change isn't in your version yet.

0.5.1, the latest version in the void repos. I did see this PR but made this issue because it seems the issue persists.

If there are any logs or information you need, lmk.

Thank you for the timely response.

acidvegas commented 8 months ago

Would seem as though the void repo is out of date

stgraber commented 8 months ago

https://github.com/lxc/incus/actions/runs/8304297262/artifacts/1331261213 should get you the static binaries for the current main branch here which should include the latest lxd-to-incus

acidvegas commented 8 months ago

> https://github.com/lxc/incus/actions/runs/8304297262/artifacts/1331261213 should get you the static binaries for the current main branch here which should include the latest lxd-to-incus

The world needs more developers like you bredda. Cheers

acidvegas commented 8 months ago

Looks like it is in for a PR. My apologies for not investigating this more: https://github.com/void-linux/void-packages/pull/49265

stgraber commented 8 months ago

No worries, glad it's working with the current version!

acidvegas commented 5 months ago
03:46:24 brandon@paloalto-dev-34 ~ : lxd-to-incus
=> Looking for source server
==> Detected: xbps
=> Looking for target server
==> Detected: xbps
=> Connecting to source server
=> Connecting to the target server
=> Checking server versions
==> Source version: 5.20
==> Target version: 0.6
=> Validating version compatibility
=> Checking that the source server isn't empty
=> Checking that the target server is empty
Error: Target server isn't empty (storage pools found), can't proceed with migration.

even after deleting the default zfs storage pool that was on incus, it then says:

 Error: Target server isn’t empty (networks found), can’t proceed with migration.

This is on incus 0.6

stgraber commented 5 months ago

With the current version of the code, the only way this would happen is if `incus storage list` does contain a storage pool.

Can you show:
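Presumably (the command list was stripped in this capture; reconstructed from the reply below, which runs exactly these):

incus project list
incus storage list
incus network list
incus profile show default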

acidvegas commented 5 months ago
05:52:15 root@blackhole ~ : incus project list
+-------------------+--------+----------+-----------------+-----------------+----------+---------------+-----------------------+---------+
|       NAME        | IMAGES | PROFILES | STORAGE VOLUMES | STORAGE BUCKETS | NETWORKS | NETWORK ZONES |      DESCRIPTION      | USED BY |
+-------------------+--------+----------+-----------------+-----------------+----------+---------------+-----------------------+---------+
| default (current) | YES    | YES      | YES             | YES             | YES      | YES           | Default Incus project | 2       |
+-------------------+--------+----------+-----------------+-----------------+----------+---------------+-----------------------+---------+
05:52:29 root@r620 ~ : incus storage list
+---------+--------+----------------------------------+-------------+---------+---------+
|  NAME   | DRIVER |              SOURCE              | DESCRIPTION | USED BY |  STATE  |
+---------+--------+----------------------------------+-------------+---------+---------+
| default | zfs    | /var/lib/incus/disks/default.img |             | 1       | CREATED |
+---------+--------+----------------------------------+-------------+---------+---------+
05:52:34 root@r620 ~ : incus network list
+----------+----------+---------+----------------+---------------------------+-------------+---------+---------+
|   NAME   |   TYPE   | MANAGED |      IPV4      |           IPV6            | DESCRIPTION | USED BY |  STATE  |
+----------+----------+---------+----------------+---------------------------+-------------+---------+---------+
| eno1     | physical | NO      |                |                           |             | 0       |         |
+----------+----------+---------+----------------+---------------------------+-------------+---------+---------+
| eno2     | physical | NO      |                |                           |             | 0       |         |
+----------+----------+---------+----------------+---------------------------+-------------+---------+---------+
| eno3     | physical | NO      |                |                           |             | 0       |         |
+----------+----------+---------+----------------+---------------------------+-------------+---------+---------+
| eno4     | physical | NO      |                |                           |             | 0       |         |
+----------+----------+---------+----------------+---------------------------+-------------+---------+---------+
| incusbr0 | bridge   | YES     | 10.25.101.1/24 | fd42:8419:29de:d411::1/64 |             | 1       | CREATED |
+----------+----------+---------+----------------+---------------------------+-------------+---------+---------+
| lxdbr0   | bridge   | NO      |                |                           |             | 0       |         |
+----------+----------+---------+----------------+---------------------------+-------------+---------+---------+
05:52:38 root@r620 ~ : incus profile show default
config: {}
description: Default Incus profile
devices:
  eth0:
    name: eth0
    network: incusbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: default
used_by: []

stgraber commented 5 months ago

Right, so that's indeed not a clean Incus server. Do:
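The exact commands were lost in this capture; a plausible reconstruction, detaching the two devices from the default profile and then deleting the leftover pool and network shown above, would be:

incus profile device remove default root
incus profile device remove default eth0
incus storage delete default
incus network delete incusbr0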

And then run lxd-to-incus again.

acidvegas commented 5 months ago

> lxd-to-incus

Thank you bredda. Just needed clarification if that was how I was supposed to proceed.

Last issue I am encountering on void, after doing that when I try to start the container I get:

[services@blackhole ~]$ incus start elasticsearch-container
Error: Error occurred when starting proxy device: Error: No such file or directory - Failed to safely open namespace file descriptor based on pidfd 3
Try `incus info --show-log elasticsearch-container` for more info

stgraber commented 5 months ago

Can you show `incus config show elasticsearch-container` and `uname -a`?

acidvegas commented 5 months ago
architecture: x86_64
config:
  boot.autostart: "true"
  image.architecture: amd64
  image.description: Debian bookworm amd64 (20240228_05:24)
  image.os: Debian
  image.release: bookworm
  image.serial: "20240228_05:24"
  image.type: squashfs
  image.variant: default
  limits.kernel.memlock: "9223372036854775807"
  limits.kernel.nofile: "65535"
  volatile.base_image: b9a12bf99efdac578271b4a3e616e8cd3dec33faa2baff7923d2d6ca79ed8993
  volatile.cloud-init.instance-id: c6a9f533-a1de-4f56-a66f-a62336684579
  volatile.eth0.hwaddr: 00:16:3e:8e:df:93
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":65536}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: STOPPED
  volatile.uuid: 3095c9e3-3c33-4291-bf4e-1bbab4156e22
  volatile.uuid.generation: 3095c9e3-3c33-4291-bf4e-1bbab4156e22
devices:
  elasticsearch-http-port:
    connect: tcp:10.109.174.63:9200
    listen: tcp:0.0.0.0:1338
    type: proxy
  elasticsearch-trans-port:
    connect: tcp:10.109.174.63:9300
    listen: tcp:0.0.0.0:1337
    type: proxy
  eth0:
    ipv4.address: 10.109.174.63
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: elasticsearch-pool
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""
Linux r320-2 6.6.32_1 #1 SMP PREEMPT_DYNAMIC Tue May 28 23:00:20 UTC 2024 x86_64 GNU/Linux

stgraber commented 5 months ago

There seems to be something going on with the Incus build on void, either because of the C library used or because of the kernel, which is breaking pidfds. That's effectively out of scope for us as it's a distro-specific issue, so something you may need to report to the void packager for Incus.

That said, given your config above, I'd recommend doing:
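The command block was stripped here; from the reply below it was presumably the following, setting both proxy devices to NAT mode (the second command is assumed by analogy with the first):

incus config device set elasticsearch-container elasticsearch-http-port nat=true
incus config device set elasticsearch-container elasticsearch-trans-port nat=true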

Which should take you away from using forkproxy (the bit that's using pidfds) and onto kernel-based firewalling (nftables or xtables) instead, which will be faster and should work just fine in your case.

acidvegas commented 5 months ago
09:57:29 root@r320-2 /home/acidvegas : incus config device set elasticsearch-container elasticsearch-http-port nat=true
Error: Invalid devices: Device validation failed for "elasticsearch-http-port": Cannot listen on wildcard address "0.0.0.0" when in nat mode

If this is something left over from a bad lxd-to-incus build, can I just rm these and re-add them maybe?

Luckily I only have to do the LXD to Incus transition one time haha.

stgraber commented 5 months ago

Ah, that's interesting, I thought we did support wildcard listen addresses in NAT mode. Do you have multiple IP addresses that you need those two proxy devices to listen on, on the host side?

If not, changing the 0.0.0.0 to the address you want on the host should do the trick.
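For example, with a hypothetical host address of 192.0.2.10, that would look like:

incus config device set elasticsearch-container elasticsearch-http-port listen=tcp:192.0.2.10:1338
incus config device set elasticsearch-container elasticsearch-trans-port listen=tcp:192.0.2.10:1337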

acidvegas commented 5 months ago

When using LXD I was just making it so incoming traffic on port 1337 would forward to port 9200 inside the container.

My IP may change at times, so that's why I was using 0.0.0.0.

stgraber commented 5 months ago

Okay, so yeah, you'd definitely benefit from forkproxy working properly.

I don't know much about void, but all our tests for setups like yours are passing fine so it's got to be something going on with void. Kernel is unlikely as that's not a kernel build option and your kernel is pretty recent, so something related to the C library would be my guess.

Do you know if your system is using musl or glibc?

acidvegas commented 5 months ago

glibc, yes.

acidvegas commented 5 months ago

I'm not sure if this helps:

Log:

lxc elasticsearch-container 20240605011615.690 INFO     lxccontainer - ../src/lxc/lxccontainer.c:do_lxcapi_start:997 - Set process title to [lxc monitor] /var/lib/incus/containers elasticsearch-container
lxc elasticsearch-container 20240605011615.691 INFO     start - ../src/lxc/start.c:lxc_check_inherited:325 - Closed inherited fd 4
lxc elasticsearch-container 20240605011615.691 INFO     start - ../src/lxc/start.c:lxc_check_inherited:325 - Closed inherited fd 5
lxc elasticsearch-container 20240605011615.691 INFO     start - ../src/lxc/start.c:lxc_check_inherited:325 - Closed inherited fd 6
lxc elasticsearch-container 20240605011615.691 INFO     start - ../src/lxc/start.c:lxc_check_inherited:325 - Closed inherited fd 16
lxc elasticsearch-container 20240605011615.691 INFO     lsm - ../src/lxc/lsm/lsm.c:lsm_init_static:38 - Initialized LSM security driver nop
lxc elasticsearch-container 20240605011615.691 INFO     conf - ../src/lxc/conf.c:run_script_argv:340 - Executing script "/proc/1057/exe callhook /var/lib/incus "default" "elasticsearch-container" start" for container "elasticsearch-container"
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:807 - Processing "[all]"
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:807 - Processing "reject_force_umount  # comment this to allow umount -f;  not recommended"
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:524 - Set seccomp rule to reject force umounts
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:524 - Set seccomp rule to reject force umounts
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:524 - Set seccomp rule to reject force umounts
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:807 - Processing "[all]"
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:807 - Processing "kexec_load errno 38"
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[246:kexec_load] action[327718:errno] arch[0]
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[246:kexec_load] action[327718:errno] arch[1073741827]
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[246:kexec_load] action[327718:errno] arch[1073741886]
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:807 - Processing "open_by_handle_at errno 38"
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[304:open_by_handle_at] action[327718:errno] arch[0]
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[304:open_by_handle_at] action[327718:errno] arch[1073741827]
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[304:open_by_handle_at] action[327718:errno] arch[1073741886]
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:807 - Processing "init_module errno 38"
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[175:init_module] action[327718:errno] arch[0]
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[175:init_module] action[327718:errno] arch[1073741827]
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[175:init_module] action[327718:errno] arch[1073741886]
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:807 - Processing "finit_module errno 38"
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[313:finit_module] action[327718:errno] arch[0]
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[313:finit_module] action[327718:errno] arch[1073741827]
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[313:finit_module] action[327718:errno] arch[1073741886]
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:807 - Processing "delete_module errno 38"
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[176:delete_module] action[327718:errno] arch[0]
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[176:delete_module] action[327718:errno] arch[1073741827]
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[176:delete_module] action[327718:errno] arch[1073741886]
lxc elasticsearch-container 20240605011615.731 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:1017 - Merging compat seccomp contexts into main context
lxc elasticsearch-container 20240605011615.731 INFO     start - ../src/lxc/start.c:lxc_init:881 - Container "elasticsearch-container" is initialized
lxc elasticsearch-container 20240605011615.732 INFO     cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_monitor_create:1383 - The monitor process uses "lxc.monitor.elasticsearch-container" as cgroup
lxc elasticsearch-container 20240605011615.756 INFO     cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_payload_create:1491 - The container process uses "lxc.payload.elasticsearch-container" as inner and "lxc.payload.elasticsearch-container" as limit cgroup
lxc elasticsearch-container 20240605011615.764 INFO     start - ../src/lxc/start.c:lxc_spawn:1762 - Cloned CLONE_NEWUSER
lxc elasticsearch-container 20240605011615.765 INFO     start - ../src/lxc/start.c:lxc_spawn:1762 - Cloned CLONE_NEWNS
lxc elasticsearch-container 20240605011615.765 INFO     start - ../src/lxc/start.c:lxc_spawn:1762 - Cloned CLONE_NEWPID
lxc elasticsearch-container 20240605011615.765 INFO     start - ../src/lxc/start.c:lxc_spawn:1762 - Cloned CLONE_NEWUTS
lxc elasticsearch-container 20240605011615.765 INFO     start - ../src/lxc/start.c:lxc_spawn:1762 - Cloned CLONE_NEWIPC
lxc elasticsearch-container 20240605011615.771 INFO     conf - ../src/lxc/conf.c:lxc_map_ids:3603 - Caller maps host root. Writing mapping directly
lxc elasticsearch-container 20240605011615.771 NOTICE   utils - ../src/lxc/utils.c:lxc_drop_groups:1368 - Dropped supplimentary groups
lxc elasticsearch-container 20240605011615.772 WARN     cgfsng - ../src/lxc/cgroups/cgfsng.c:fchowmodat:1611 - No such file or directory - Failed to fchownat(44, memory.oom.group, 65536, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc elasticsearch-container 20240605011615.772 WARN     cgfsng - ../src/lxc/cgroups/cgfsng.c:fchowmodat:1611 - No such file or directory - Failed to fchownat(44, memory.reclaim, 65536, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc elasticsearch-container 20240605011615.773 INFO     start - ../src/lxc/start.c:do_start:1104 - Unshared CLONE_NEWNET
lxc elasticsearch-container 20240605011615.773 NOTICE   utils - ../src/lxc/utils.c:lxc_drop_groups:1368 - Dropped supplimentary groups
lxc elasticsearch-container 20240605011615.773 NOTICE   utils - ../src/lxc/utils.c:lxc_switch_uid_gid:1344 - Switched to gid 0
lxc elasticsearch-container 20240605011615.773 NOTICE   utils - ../src/lxc/utils.c:lxc_switch_uid_gid:1353 - Switched to uid 0
lxc elasticsearch-container 20240605011615.773 INFO     start - ../src/lxc/start.c:do_start:1204 - Unshared CLONE_NEWCGROUP
lxc elasticsearch-container 20240605011615.806 INFO     conf - ../src/lxc/conf.c:setup_utsname:875 - Set hostname to "elasticsearch-container"
lxc elasticsearch-container 20240605011615.815 INFO     network - ../src/lxc/network.c:lxc_setup_network_in_child_namespaces:4019 - Finished setting up network devices with caller assigned names
lxc elasticsearch-container 20240605011615.815 INFO     conf - ../src/lxc/conf.c:mount_autodev:1219 - Preparing "/dev"
lxc elasticsearch-container 20240605011615.815 INFO     conf - ../src/lxc/conf.c:mount_autodev:1280 - Prepared "/dev"
lxc elasticsearch-container 20240605011615.816 INFO     conf - ../src/lxc/conf.c:lxc_fill_autodev:1317 - Populating "/dev"
lxc elasticsearch-container 20240605011615.816 INFO     conf - ../src/lxc/conf.c:lxc_fill_autodev:1405 - Populated "/dev"
lxc elasticsearch-container 20240605011615.816 INFO     conf - ../src/lxc/conf.c:lxc_transient_proc:3775 - Caller's PID is 1; /proc/self points to 1
lxc elasticsearch-container 20240605011615.816 INFO     conf - ../src/lxc/conf.c:lxc_setup_ttys:1072 - Finished setting up 0 /dev/tty<N> device(s)
lxc elasticsearch-container 20240605011615.817 INFO     conf - ../src/lxc/conf.c:setup_personality:1917 - Set personality to "0lx0"
lxc elasticsearch-container 20240605011615.817 NOTICE   conf - ../src/lxc/conf.c:lxc_setup:4469 - The container "elasticsearch-container" is set up
lxc elasticsearch-container 20240605011615.817 NOTICE   start - ../src/lxc/start.c:start:2194 - Exec'ing "/sbin/init"
lxc elasticsearch-container 20240605011615.818 NOTICE   start - ../src/lxc/start.c:post_start:2205 - Started "/sbin/init" with pid "2019"
lxc elasticsearch-container 20240605011615.818 NOTICE   start - ../src/lxc/start.c:signal_handler:446 - Received 17 from pid 2020 instead of container init 2019
lxc elasticsearch-container 20240605011615.859 INFO     error - ../src/lxc/error.c:lxc_error_set_and_log:31 - Child <2019> ended on error (255)
lxc elasticsearch-container 20240605011615.883 INFO     conf - ../src/lxc/conf.c:run_script_argv:340 - Executing script "/usr/libexec/incus/incusd callhook /var/lib/incus "default" "elasticsearch-container" stopns" for container "elasticsearch-container"
lxc elasticsearch-container 20240605011615.974 INFO     conf - ../src/lxc/conf.c:lxc_map_ids:3603 - Caller maps host root. Writing mapping directly
lxc elasticsearch-container 20240605011615.974 NOTICE   utils - ../src/lxc/utils.c:lxc_drop_groups:1368 - Dropped supplimentary groups
lxc elasticsearch-container 20240605011615.993 INFO     conf - ../src/lxc/conf.c:run_script_argv:340 - Executing script "/usr/libexec/incus/incusd callhook /var/lib/incus "default" "elasticsearch-container" stop" for container "elasticsearch-container"

Kind of boned right now. All my containers have been converted with lxd-to-incus; I just can't get them to start right now, so everything is halted.

acidvegas commented 5 months ago

Very unfamiliar with this territory @stgraber, any other logs or debug information that might help?

All my infrastructure is kind of stuck right now since it removed LXD already, so I am at a standstill on this one :(

stgraber commented 5 months ago

Can you do:

incus create images:alpine/edge a1
incus config device add a1 proxy1 proxy connect=tcp:0.0.0.0:9200 listen=tcp:0.0.0.0:1338
incus config device add a1 proxy2 proxy connect=tcp:0.0.0.0:9300 listen=tcp:0.0.0.0:1337
incus start a1

I've tested that here inside of a void container running on my Debian 12 system and that's working just fine, so if that's failing for you, then that would point towards a kernel issue.

acidvegas commented 5 months ago

> incus start a1

[brandon@blackhole ~]$ incus create images:alpine/edge a1
Creating a1
[brandon@blackhole ~]$ incus config device add a1 proxy1 proxy connect=tcp:0.0.0.0:9200 listen=tcp:0.0.0.0:1338
Device proxy1 added to a1
[brandon@blackhole ~]$ incus config device add a1 proxy2 proxy connect=tcp:0.0.0.0:9300 listen=tcp:0.0.0.0:1337
Device proxy2 added to a1
[brandon@blackhole ~]$ incus start a1
[brandon@blackhole ~]$ incus list
+-------------------------+---------+-----------------------+------+-----------+-----------+
|          NAME           |  STATE  |         IPV4          | IPV6 |   TYPE    | SNAPSHOTS |
+-------------------------+---------+-----------------------+------+-----------+-----------+
| a1                      | RUNNING | 10.109.174.173 (eth0) |      | CONTAINER | 0         |
+-------------------------+---------+-----------------------+------+-----------+-----------+
| elasticsearch-container | STOPPED |                       |      | CONTAINER | 0         |
+-------------------------+---------+-----------------------+------+-----------+-----------+

It looks like that ran no problem.

Side note: I still have my storage pool for the elastic container... so my data is OK, I hope.

+--------------------+--------+-------------------------------------------------+-------------+---------+---------+
|        NAME        | DRIVER |                     SOURCE                      | DESCRIPTION | USED BY |  STATE  |
+--------------------+--------+-------------------------------------------------+-------------+---------+---------+
| default            | dir    | /var/lib/incus/storage-pools/default            |             | 2       | CREATED |
+--------------------+--------+-------------------------------------------------+-------------+---------+---------+
| elasticsearch-pool | dir    | /var/lib/incus/storage-pools/elasticsearch-pool |             | 1       | CREATED |
+--------------------+--------+-------------------------------------------------+-------------+---------+---------+
| test-pool          | dir    | /var/lib/incus/storage-pools/test-pool          |             | 0       | CREATED |
+--------------------+--------+-------------------------------------------------+-------------+---------+---------+

That's so odd though. So what do you think, is there a better solution for the elastic container?

Any way I can clone the container and add the port forwards to the new cloned container, maybe?

stgraber commented 5 months ago

Can you try starting your container without those two devices, see if it starts up fine then or if it hits another problem?
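For instance, temporarily dropping the two proxy devices would be something like:

incus config device remove elasticsearch-container elasticsearch-http-port
incus config device remove elasticsearch-container elasticsearch-trans-port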

acidvegas commented 5 months ago

> Can you try starting your container without those two devices, see if it starts up fine then or if it hits another problem?

It ran without an error, but the container is still showing as STOPPED

[brandon@blackhole ~]$ incus start elasticsearch-container
[brandon@blackhole ~]$ incus list
+-------------------------+---------+------+------+-----------+-----------+
|          NAME           |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS |
+-------------------------+---------+------+------+-----------+-----------+
| a1                      | STOPPED |      |      | CONTAINER | 0         |
+-------------------------+---------+------+------+-----------+-----------+
| elasticsearch-container | STOPPED |      |      | CONTAINER | 0         |
+-------------------------+---------+------+------+-----------+-----------+
[brandon@blackhole root]$ incus config show elasticsearch-container
architecture: x86_64
config:
  boot.autostart: "true"
  image.architecture: amd64
  image.description: Debian bookworm amd64 (20240228_05:24)
  image.os: Debian
  image.release: bookworm
  image.serial: "20240228_05:24"
  image.type: squashfs
  image.variant: default
  limits.kernel.memlock: "9223372036854775807"
  limits.kernel.nofile: "65535"
  volatile.base_image: b9a12bf99efdac578271b4a3e616e8cd3dec33faa2baff7923d2d6ca79ed8993
  volatile.cloud-init.instance-id: c6a9f533-a1de-4f56-a66f-a62336684579
  volatile.eth0.hwaddr: 00:16:3e:8e:df:93
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":65536}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: STOPPED
  volatile.last_state.ready: "false"
  volatile.uuid: 3095c9e3-3c33-4291-bf4e-1bbab4156e22
  volatile.uuid.generation: 3095c9e3-3c33-4291-bf4e-1bbab4156e22
devices:
  eth0:
    ipv4.address: 10.109.174.61
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: elasticsearch-pool
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""
[brandon@blackhole root]$ incus info --show-log elasticsearch-container
Name: elasticsearch-container
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2024/02/29 15:07 EST
Last Used: 2024/06/06 10:48 EDT

Log:

lxc elasticsearch-container 20240606144807.980 INFO     lxccontainer - ../src/lxc/lxccontainer.c:do_lxcapi_start:997 - Set process title to [lxc monitor] /var/lib/incus/containers elasticsearch-container
lxc elasticsearch-container 20240606144807.981 INFO     start - ../src/lxc/start.c:lxc_check_inherited:325 - Closed inherited fd 4
lxc elasticsearch-container 20240606144807.981 INFO     start - ../src/lxc/start.c:lxc_check_inherited:325 - Closed inherited fd 5
lxc elasticsearch-container 20240606144807.981 INFO     start - ../src/lxc/start.c:lxc_check_inherited:325 - Closed inherited fd 6
lxc elasticsearch-container 20240606144807.981 INFO     start - ../src/lxc/start.c:lxc_check_inherited:325 - Closed inherited fd 16
lxc elasticsearch-container 20240606144807.981 INFO     lsm - ../src/lxc/lsm/lsm.c:lsm_init_static:38 - Initialized LSM security driver nop
lxc elasticsearch-container 20240606144807.981 INFO     conf - ../src/lxc/conf.c:run_script_argv:340 - Executing script "/proc/1021/exe callhook /var/lib/incus "default" "elasticsearch-container" start" for container "elasticsearch-container"
lxc elasticsearch-container 20240606144808.220 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:807 - Processing "[all]"
lxc elasticsearch-container 20240606144808.221 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:807 - Processing "reject_force_umount  # comment this to allow umount -f;  not recommended"
lxc elasticsearch-container 20240606144808.221 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:524 - Set seccomp rule to reject force umounts
lxc elasticsearch-container 20240606144808.221 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:524 - Set seccomp rule to reject force umounts
lxc elasticsearch-container 20240606144808.221 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:524 - Set seccomp rule to reject force umounts
lxc elasticsearch-container 20240606144808.221 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:807 - Processing "[all]"
lxc elasticsearch-container 20240606144808.222 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:807 - Processing "kexec_load errno 38"
lxc elasticsearch-container 20240606144808.222 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[246:kexec_load] action[327718:errno] arch[0]
lxc elasticsearch-container 20240606144808.222 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[246:kexec_load] action[327718:errno] arch[1073741827]
lxc elasticsearch-container 20240606144808.222 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[246:kexec_load] action[327718:errno] arch[1073741886]
lxc elasticsearch-container 20240606144808.222 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:807 - Processing "open_by_handle_at errno 38"
lxc elasticsearch-container 20240606144808.222 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[304:open_by_handle_at] action[327718:errno] arch[0]
lxc elasticsearch-container 20240606144808.222 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[304:open_by_handle_at] action[327718:errno] arch[1073741827]
lxc elasticsearch-container 20240606144808.222 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[304:open_by_handle_at] action[327718:errno] arch[1073741886]
lxc elasticsearch-container 20240606144808.223 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:807 - Processing "init_module errno 38"
lxc elasticsearch-container 20240606144808.223 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[175:init_module] action[327718:errno] arch[0]
lxc elasticsearch-container 20240606144808.223 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[175:init_module] action[327718:errno] arch[1073741827]
lxc elasticsearch-container 20240606144808.223 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[175:init_module] action[327718:errno] arch[1073741886]
lxc elasticsearch-container 20240606144808.223 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:807 - Processing "finit_module errno 38"
lxc elasticsearch-container 20240606144808.223 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[313:finit_module] action[327718:errno] arch[0]
lxc elasticsearch-container 20240606144808.223 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[313:finit_module] action[327718:errno] arch[1073741827]
lxc elasticsearch-container 20240606144808.224 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[313:finit_module] action[327718:errno] arch[1073741886]
lxc elasticsearch-container 20240606144808.224 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:807 - Processing "delete_module errno 38"
lxc elasticsearch-container 20240606144808.224 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[176:delete_module] action[327718:errno] arch[0]
lxc elasticsearch-container 20240606144808.224 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[176:delete_module] action[327718:errno] arch[1073741827]
lxc elasticsearch-container 20240606144808.224 INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[176:delete_module] action[327718:errno] arch[1073741886]
lxc elasticsearch-container 20240606144808.224 INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:1017 - Merging compat seccomp contexts into main context
lxc elasticsearch-container 20240606144808.224 INFO     start - ../src/lxc/start.c:lxc_init:881 - Container "elasticsearch-container" is initialized
lxc elasticsearch-container 20240606144808.231 INFO     cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_monitor_create:1383 - The monitor process uses "lxc.monitor.elasticsearch-container" as cgroup
lxc elasticsearch-container 20240606144808.345 INFO     cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_payload_create:1491 - The container process uses "lxc.payload.elasticsearch-container" as inner and "lxc.payload.elasticsearch-container" as limit cgroup
lxc elasticsearch-container 20240606144808.352 INFO     start - ../src/lxc/start.c:lxc_spawn:1762 - Cloned CLONE_NEWUSER
lxc elasticsearch-container 20240606144808.352 INFO     start - ../src/lxc/start.c:lxc_spawn:1762 - Cloned CLONE_NEWNS
lxc elasticsearch-container 20240606144808.352 INFO     start - ../src/lxc/start.c:lxc_spawn:1762 - Cloned CLONE_NEWPID
lxc elasticsearch-container 20240606144808.352 INFO     start - ../src/lxc/start.c:lxc_spawn:1762 - Cloned CLONE_NEWUTS
lxc elasticsearch-container 20240606144808.353 INFO     start - ../src/lxc/start.c:lxc_spawn:1762 - Cloned CLONE_NEWIPC
lxc elasticsearch-container 20240606144808.415 INFO     conf - ../src/lxc/conf.c:lxc_map_ids:3603 - Caller maps host root. Writing mapping directly
lxc elasticsearch-container 20240606144808.416 NOTICE   utils - ../src/lxc/utils.c:lxc_drop_groups:1368 - Dropped supplimentary groups
lxc elasticsearch-container 20240606144808.420 WARN     cgfsng - ../src/lxc/cgroups/cgfsng.c:fchowmodat:1611 - No such file or directory - Failed to fchownat(44, memory.oom.group, 65536, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc elasticsearch-container 20240606144808.420 WARN     cgfsng - ../src/lxc/cgroups/cgfsng.c:fchowmodat:1611 - No such file or directory - Failed to fchownat(44, memory.reclaim, 65536, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc elasticsearch-container 20240606144808.433 INFO     start - ../src/lxc/start.c:do_start:1104 - Unshared CLONE_NEWNET
lxc elasticsearch-container 20240606144808.434 NOTICE   utils - ../src/lxc/utils.c:lxc_drop_groups:1368 - Dropped supplimentary groups
lxc elasticsearch-container 20240606144808.434 NOTICE   utils - ../src/lxc/utils.c:lxc_switch_uid_gid:1344 - Switched to gid 0
lxc elasticsearch-container 20240606144808.434 NOTICE   utils - ../src/lxc/utils.c:lxc_switch_uid_gid:1353 - Switched to uid 0
lxc elasticsearch-container 20240606144808.435 INFO     start - ../src/lxc/start.c:do_start:1204 - Unshared CLONE_NEWCGROUP
lxc elasticsearch-container 20240606144808.666 INFO     conf - ../src/lxc/conf.c:setup_utsname:875 - Set hostname to "elasticsearch-container"
lxc elasticsearch-container 20240606144808.795 INFO     network - ../src/lxc/network.c:lxc_setup_network_in_child_namespaces:4019 - Finished setting up network devices with caller assigned names
lxc elasticsearch-container 20240606144808.795 INFO     conf - ../src/lxc/conf.c:mount_autodev:1219 - Preparing "/dev"
lxc elasticsearch-container 20240606144808.796 INFO     conf - ../src/lxc/conf.c:mount_autodev:1280 - Prepared "/dev"
lxc elasticsearch-container 20240606144808.807 INFO     conf - ../src/lxc/conf.c:lxc_fill_autodev:1317 - Populating "/dev"
lxc elasticsearch-container 20240606144808.809 INFO     conf - ../src/lxc/conf.c:lxc_fill_autodev:1405 - Populated "/dev"
lxc elasticsearch-container 20240606144808.809 INFO     conf - ../src/lxc/conf.c:lxc_transient_proc:3775 - Caller's PID is 1; /proc/self points to 1
lxc elasticsearch-container 20240606144808.811 INFO     conf - ../src/lxc/conf.c:lxc_setup_ttys:1072 - Finished setting up 0 /dev/tty<N> device(s)
lxc elasticsearch-container 20240606144808.815 INFO     conf - ../src/lxc/conf.c:setup_personality:1917 - Set personality to "0lx0"
lxc elasticsearch-container 20240606144808.816 NOTICE   conf - ../src/lxc/conf.c:lxc_setup:4469 - The container "elasticsearch-container" is set up
lxc elasticsearch-container 20240606144808.821 NOTICE   start - ../src/lxc/start.c:start:2194 - Exec'ing "/sbin/init"
lxc elasticsearch-container 20240606144808.827 NOTICE   start - ../src/lxc/start.c:post_start:2205 - Started "/sbin/init" with pid "2506"
lxc elasticsearch-container 20240606144808.828 NOTICE   start - ../src/lxc/start.c:signal_handler:446 - Received 17 from pid 2507 instead of container init 2506
lxc elasticsearch-container 20240606144808.115 INFO     error - ../src/lxc/error.c:lxc_error_set_and_log:31 - Child <2506> ended on error (255)
lxc elasticsearch-container 20240606144808.136 INFO     conf - ../src/lxc/conf.c:run_script_argv:340 - Executing script "/usr/libexec/incus/incusd callhook /var/lib/incus "default" "elasticsearch-container" stopns" for container "elasticsearch-container"
lxc elasticsearch-container 20240606144808.212 INFO     conf - ../src/lxc/conf.c:lxc_map_ids:3603 - Caller maps host root. Writing mapping directly
lxc elasticsearch-container 20240606144808.212 NOTICE   utils - ../src/lxc/utils.c:lxc_drop_groups:1368 - Dropped supplimentary groups
lxc elasticsearch-container 20240606144808.231 INFO     conf - ../src/lxc/conf.c:run_script_argv:340 - Executing script "/usr/libexec/incus/incusd callhook /var/lib/incus "default" "elasticsearch-container" stop" for container "elasticsearch-container"

stgraber commented 5 months ago

`incus console elasticsearch-container --show-log` may be useful.

I suspect the container instantly dying on startup is actually the root cause of your problems, as that would cause forkproxy (proxy devices) to attempt to connect to a container that died a few milliseconds beforehand, resulting in that error.

acidvegas commented 5 months ago

> incus console elasticsearch-container --show-log

[brandon@blackhole ~]$ incus console elasticsearch-container --show-log

Console log:

Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
[!!!!!!] Failed to mount API filesystems.
Exiting PID 1...

Weird, it's trying to look for a systemd mount... Void does not use systemd.

Cheers though... seems like we are getting somewhere on this, though this error message seems oddly familiar: https://github.com/lxc/lxc/issues/4072

I am wondering, do you think it's a cgroups issue with a difference between the host and container cgroups? I do see some solutions, but they seem to be for systemd-based systems.

stgraber commented 5 months ago

Yeah, void not using systemd is likely to be the issue because that means that the required systemd cgroup wouldn't exist.

Can you show:
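Presumably (the command list was stripped here; these are the commands run in the reply below):

grep cgroup /proc/self/mounts
cat /proc/self/cgroup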

stgraber commented 5 months ago

There are different ways around this one but it depends on what void may already have set up.

acidvegas commented 5 months ago
[brandon@blackhole ~]$ grep cgroup /proc/self/mounts
cgroup /sys/fs/cgroup tmpfs rw,relatime,mode=755,inode64 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/cpu cgroup rw,relatime,cpu 0 0
cgroup /sys/fs/cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,relatime,blkio 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,relatime,memory 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,relatime,devices 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,relatime,freezer 0 0
cgroup /sys/fs/cgroup/net_cls cgroup rw,relatime,net_cls 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,relatime,perf_event 0 0
cgroup /sys/fs/cgroup/net_prio cgroup rw,relatime,net_prio 0 0
cgroup /sys/fs/cgroup/hugetlb cgroup rw,relatime,hugetlb 0 0
cgroup /sys/fs/cgroup/pids cgroup rw,relatime,pids 0 0
cgroup /sys/fs/cgroup/rdma cgroup rw,relatime,rdma 0 0
cgroup /sys/fs/cgroup/misc cgroup rw,relatime,misc 0 0
cgroup2 /sys/fs/cgroup/unified cgroup2 rw,relatime,nsdelegate 0 0
[brandon@blackhole ~]$ cat /proc/self/cgroup
14:misc:/
13:rdma:/
12:pids:/
11:hugetlb:/
10:net_prio:/
9:perf_event:/
8:net_cls:/
7:freezer:/
6:devices:/
5:memory:/
4:blkio:/
3:cpuacct:/
2:cpu:/
1:cpuset:/
0::/

stgraber commented 5 months ago

Ah, so hybrid v1 and v2, that's getting pretty unusual these days... Try:

mkdir /sys/fs/cgroup/systemd
mount -t cgroup -o none,name=systemd systemd /sys/fs/cgroup/systemd 

acidvegas commented 5 months ago

> Ah, so hybrid v1 and v2, that's getting pretty unusual these days... Try:
>
> mkdir /sys/fs/cgroup/systemd
> mount -t cgroup -o none,name=systemd systemd /sys/fs/cgroup/systemd

CHEERS. Holy crap that was a nightmare lol.

@stgraber I will say it again, you are one of the most helpful & reactive developers I know of. Thank you bredda.

dkwo commented 5 months ago

In case it helps, Void-specific stuff is usually in README.voidlinux: https://raw.githubusercontent.com/void-linux/void-packages/master/srcpkgs/incus/files/README.voidlinux

In particular, see:

Some container configurations may require that the CGROUP_MODE variable in /etc/rc.conf be set to unified.
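A minimal sketch of that setting, as described in the README:

# /etc/rc.conf
CGROUP_MODE=unified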