Closed bazaah closed 7 months ago
Is your system running cgroup1? (You can show the output of `ls -lh /sys/fs/cgroup` if unsure.)
cgroup2; `/sys/fs/cgroup/cgroup.controllers` exists.
Edit: from `mount`:
$ mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
Cool, thanks. I'll try to look into this one tomorrow or Friday, looks pretty easy to sort out based on the runc change.
Got the issue reproduced, I'll try a quick fix now but this may get postponed for a week or so as I'm about to leave on a trip :)
Thanks for the fast turnaround, I appreciate it.
Required information
Archlinux
Linux <snip> 6.8.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 16 Mar 2024 17:15:35 +0000 x86_64 GNU/Linux
Incus Info
```
config:
  core.https_address:
```

Issue description
While attempting to get hugepages working for a PostgreSQL database in an unprivileged container, I encountered repeated segfaults during the `initdb` sequence.

This was somewhat confusing to me, because by default postgres/initdb will attempt to use hugepages but gracefully fall back to normal memory if they are unavailable. Clearly, postgres had been sufficiently induced to believe that hugepages did exist, but when it went to use them, the host kernel killed the process.

Some time later this evening, I think I have it figured out.
At the end of the repro, you'll be greeted with an error like:
Googling around this error brings up lots of related issues, particularly around Kubernetes deployments.
However, eventually you'll find https://github.com/opencontainers/runtime-spec/issues/1050 which explains the problem:
The fix was added to runc in https://github.com/opencontainers/runc/pull/4073.
I'm not sure exactly how this translates to incus's codebase, but from what little digging I've done around the hugetlb controller, I can find no mention of setting the `hugetlb.<pagesize>.rsvd` cgroups, only the older `hugetlb.<pagesize>.limit_in_bytes`.
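
For illustration, the general shape of such a fix on a cgroup2 host might look like the sketch below: write the same limit to the fault-time file and, where the kernel exposes it, to the reservation file as well. This is not Incus or runc code. `set_hugetlb_limit` and its arguments are hypothetical, and the file names `hugetlb.<pagesize>.max` and `hugetlb.<pagesize>.rsvd.max` are assumed to be the cgroup2 counterparts of the names mentioned above.

```shell
#!/bin/sh
# Sketch only: apply one hugetlb limit to both the fault-time and the
# reservation counters of a cgroup2 directory.
# set_hugetlb_limit is a hypothetical helper, not an Incus or runc command.
set_hugetlb_limit() {
    cgroup_dir=$1   # cgroup directory of the container (assumed path)
    pagesize=$2     # page size as the kernel names it, e.g. 2MB or 1GB
    limit=$3        # limit in bytes, or the literal string "max"

    # Fault-time limit: charged when a huge page is actually touched.
    printf '%s\n' "$limit" > "$cgroup_dir/hugetlb.$pagesize.max" || return 1

    # Reservation limit: charged when the mapping is created. Older
    # kernels do not expose this file, so skip it when it is absent.
    rsvd="$cgroup_dir/hugetlb.$pagesize.rsvd.max"
    if [ -e "$rsvd" ]; then
        printf '%s\n' "$limit" > "$rsvd" || return 1
    fi
}
```

It could be called as `set_hugetlb_limit "$CGROUP_DIR" 2MB 1073741824`, where `$CGROUP_DIR` is a placeholder for the container's cgroup directory. With the reservation limit in place, a mapping that exceeds it should fail when it is created, which is the point at which an application can still fall back to normal pages.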
Steps to reproduce
Information to attach
pg_createcluster log
```
root@hugepages-demo:~# pg_createcluster 15 main -- --debug
Creating new PostgreSQL cluster 15/main ...
/usr/lib/postgresql/15/bin/initdb -D /var/lib/postgresql/15/main --auth-local peer --auth-host scram-sha-256 --no-instructions --debug
Running in debug mode.
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
VERSION=15.6 (Debian 15.6-0+deb12u1)
PGDATA=/var/lib/postgresql/15/main
share_path=/usr/share/postgresql/15
PGPATH=/usr/lib/postgresql/15/bin
POSTGRES_SUPERUSERNAME=postgres
POSTGRES_BKI=/usr/share/postgresql/15/postgres.bki
POSTGRESQL_CONF_SAMPLE=/usr/share/postgresql/15/postgresql.conf.sample
PG_HBA_SAMPLE=/usr/share/postgresql/15/pg_hba.conf.sample
PG_IDENT_SAMPLE=/usr/share/postgresql/15/pg_ident.conf.sample
The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/15/main ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 20
selecting default shared_buffers ... 400kB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... 2024-04-18 00:41:23.828 UTC [4400] DEBUG: invoking IpcMemoryCreate(size=3891200)
Bus error (core dumped)
child process exited with exit code 135
initdb: removing contents of data directory "/var/lib/postgresql/15/main"
Error: initdb failed
```

Side note, congrats on the first stable release of incus. I was very happy to see the project back in the hands of linuxcontainers after the Canonical announcement.