openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

segmentation faults / memory corruption using zfs git 152ae5c9bc #16689

Open mtippmann opened 2 weeks ago

mtippmann commented 2 weeks ago

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Arch Linux |
| Distribution Version | rolling |
| Kernel Version | 6.11.5-zen1-1-zen |
| Architecture | amd64 |
| OpenZFS Version | 2.3.99.r34.g152ae5c9bc |
~ » cat /proc/cmdline
zfs=zroot/arch rw mitigations=off init_on_alloc=0 init_on_free=0 lsm=landlock,lockdown,yama,integrity,apparmor,bpf pcie_aspm=performance systemd.gpt_auto=0 spl.spl_hostid=0x00bab10c
~ » cat /etc/modprobe.d/zfs.conf | grep -v \#
options zfs zfs_vdev_max_active=1024
options zfs zfs_txg_timeout=5
options zfs zfs_vdev_scrub_min_active=1
options zfs zfs_vdev_scrub_max_active=2
options zfs zfs_vdev_sync_write_min_active=1
options zfs zfs_vdev_sync_write_max_active=128
options zfs zfs_vdev_sync_read_min_active=1
options zfs zfs_vdev_sync_read_max_active=128
options zfs zfs_vdev_async_read_min_active=1
options zfs zfs_vdev_async_read_max_active=128
options zfs zfs_vdev_async_write_min_active=1
options zfs zfs_vdev_async_write_max_active=128
options zfs zfs_vdev_scheduler=none
options zfs zio_taskq_batch_pct=25
options zfs zfs_sync_taskq_batch_pct=25
options zfs zfs_prefetch_disable=1
options zfs zfs_arc_sys_free=2000000000
options zfs zvol_use_blk_mq=1
options zfs zfs_abd_scatter_enabled=0
options zfs compressed_arc_enabled=0
options zfs zfs_arc_shrinker_limit=0
options zfs zfs_bclone_enabled=0
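
Spot-checking that the loaded module actually picked these options up can be done via sysfs; the values read back should match the config above (shown here only as a suggestion):

~ » grep -H . /sys/module/zfs/parameters/zfs_bclone_enabled \
              /sys/module/zfs/parameters/zfs_prefetch_disable \
              /sys/module/zfs/parameters/zfs_abd_scatter_enabled
# expected per the config above: zfs_bclone_enabled:0, zfs_prefetch_disable:1, zfs_abd_scatter_enabled:0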

Describe the problem you're observing

I'm seeing segmentation faults when using zfs built from git (zfs 2.2.6 is fine) with init_on_alloc=0 init_on_free=0 on the kernel command line. Nothing shows up in dmesg. I can trigger it with a docker compose up that starts a few containers (rails, mysql); after that the system starts crashing and most commands fail. Shortly after it first appears the whole system is crashing, including plasmashell and so on.

It's a system I need to work, so I went back to 2.2.6, where everything is fine and stable. Not using init_on_alloc=0 init_on_free=0 might help, but I'm not 100% sure about that. I'm not using zvols.

The system passes a BIOS memory test just fine. Dell Latitude E5470 / i7-6820HQ.

Describe how to reproduce the problem

Good question. It might reproduce with the kmod options and cmdline listed here. For me it's triggered by a docker compose up, so it could be related to overlayfs; at least that's when I first noticed it.
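
A rough stand-in for that compose setup, in case someone wants to try it (image names here are placeholders; my actual app image is a private Rails build):

docker network create repro
docker run -d --name db  --network repro -e MYSQL_ROOT_PASSWORD=example mysql:8
docker run -d --name web --network repro my-rails-app-image   # placeholder for the Rails container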

I assume it's a problem related to my kmod config or cmdline settings, otherwise it would have already been found. I noticed similar behavior a few weeks ago and tried to pin it down but failed, so I thought I'd put it up here.

Include any warning/errors/backtraces from the system logs

There is nothing in dmesg. Below are some journalctl log entries about the crashes (it all looks pretty random).

Okt 25 15:41:16  systemd[1]: incus.service: Main process exited, code=dumped, status=11/SEGV
Okt 25 15:41:16  systemd[1]: systemd-coredump@85-98502-0.service: Deactivated successfully.
Okt 25 15:41:16  systemd-coredump[98503]: [🡕] Process 98494 (incusd) of user 0 dumped core.

                                                           Stack trace of thread 98494:
                                                           #0  0x000060139c936214 n/a (incusd + 0x579214)
                                                           #1  0x000060139c90af45 n/a (incusd + 0x54df45)
                                                           #2  0x000060139c8f9aea n/a (incusd + 0x53caea)
                                                           #3  0x000060139c8fa214 n/a (incusd + 0x53d214)
                                                           #4  0x000060139c8f71b6 n/a (incusd + 0x53a1b6)
                                                           #5  0x000060139c931551 n/a (incusd + 0x574551)
                                                           #6  0x000060139c8cc158 n/a (incusd + 0x50f158)
                                                           #7  0x000060139c8e68f3 n/a (incusd + 0x5298f3)
                                                           #8  0x000060139c8e6130 n/a (incusd + 0x529130)
                                                           #9  0x000060139c8e5bdc n/a (incusd + 0x528bdc)
                                                           #10 0x000060139c8e5b3b n/a (incusd + 0x528b3b)
                                                           #11 0x000060139c8d2e12 n/a (incusd + 0x515e12)
                                                           #12 0x000060139c8d2c85 n/a (incusd + 0x515c85)
                                                           #13 0x000060139c8d22b3 n/a (incusd + 0x5152b3)
                                                           #14 0x000060139c8cc785 n/a (incusd + 0x50f785)
                                                           #15 0x000060139c92bf6d n/a (incusd + 0x56ef6d)
                                                           #16 0x000060139c8cca45 n/a (incusd + 0x50fa45)
                                                           #17 0x000060139c8bffbe n/a (incusd + 0x502fbe)
                                                           #18 0x000060139c8bfa1d n/a (incusd + 0x502a1d)
                                                           #19 0x000060139c8fba09 n/a (incusd + 0x53ea09)
                                                           #20 0x000060139c937fe0 n/a (incusd + 0x57afe0)
                                                           #21 0x00007ad45137fecc __libc_start_main_impl (libc.so.6 + 0x25ecc)
                                                           #22 0x000060139c8bbdf5 n/a (incusd + 0x4fedf5)
                                                           ELF object binary architecture: AMD x86-64
Okt 25 15:41:17  systemd[1]: Starting Incus Container Hypervisor...
Okt 25 15:41:17  incusd[98550]: fatal error: arena already initialized
Okt 25 15:41:17  incusd[98550]: runtime stack:
Okt 25 15:41:17  incusd[98550]: runtime.throw({0x5642ef51278f?, 0x0?})
Okt 25 15:41:17  incusd[98550]:         /usr/lib/go/src/runtime/panic.go:1067 +0x4a fp=0x7fff025e56f0 sp=0x7fff025e56c0 pc=0x5642edef356a
Okt 25 15:41:17  incusd[98550]: runtime.(*mheap).sysAlloc(0x5642f0c409e0, 0x0?, 0x5642f0c50be8, 0x1)
Okt 25 15:41:17  incusd[98550]:         /usr/lib/go/src/runtime/malloc.go:768 +0x398 fp=0x7fff025e5790 sp=0x7fff025e56f0 pc=0x5642ede8e158
Okt 25 15:41:17  incusd[98550]: runtime.(*mheap).grow(0x5642f0c409e0, 0x0?)
Okt 25 15:41:20  systemd-coredump[98599]: [🡕] Process 98582 (containerd) of user 0 dumped core.

                                                           Stack trace of thread 98582:
                                                           #0  0x0000000000da081d n/a (containerd + 0x9a081d)
                                                           #1  0x0000000000d72d25 runtime.args (containerd + 0x972d25)
                                                           #2  0x0000000000da9a85 runtime.args.abi0 (containerd + 0x9a9a85)
                                                           #3  0x0000000000da0f32 runtime.rt0_go.abi0 (containerd + 0x9a0f32)
                                                           #4  0x00007bb13ce23ecc __libc_start_main_impl (libc.so.6 + 0x25ecc)
                                                           #5  0x0000000000d20455 _start (containerd + 0x920455)
                                                           ELF object binary architecture: AMD x86-64
Okt 25 15:41:20  systemd[1]: containerd.service: Main process exited, code=dumped, status=11/SEGV
Okt 25 15:41:20  systemd[1]: containerd.service: Failed with result 'core-dump'.
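
In case it helps to correlate the above, the segfaulting processes can also be listed via systemd-coredump instead of scraping the journal (standard systemd tooling, noted here only as a suggestion):

coredumpctl list --since today      # one line per dumped process
coredumpctl info 98494              # details for the incusd crash shown above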
snajpa commented 23 hours ago

Can you try to run a debug build? (configure with --enable-debug) - I think this is the same problem we're seeing with @theubuntuguy here: https://github.com/vpsfreecz/zfs/pull/1 - it looks like some kind of race when memory is tight; probably the dbuf_evict thread steps into something it's not supposed to... honestly I have no idea, to me this is pretty difficult to debug. Lots of moving parts in dbufs vs ARC vs znode lifetime vs memory reclaim :(
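
If it helps, getting such a build from a git checkout is roughly the standard OpenZFS build sequence (adjust for however you normally install on Arch, e.g. DKMS packages):

sh autogen.sh
./configure --enable-debug
make -s -j$(nproc)
sudo make install     # or build your usual packages with --enable-debug
# reload the zfs modules (or reboot) so the debug build is actually the one in use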

It would be great if you could try that - if you see "Kernel panic - not syncing: buffer modified while frozen!", then it's probably the same problem.

FWIW it seems to be related to how ZFS interacts with kernels 6.10 and newer; older ones don't hit it, and the bug is also present in the OpenZFS 2.2 stable release.

snajpa commented 23 hours ago

When you say 2.2.6 is fine, are you sure that was also on a 6.11 series kernel?
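
The combination actually in use is easy to confirm, e.g.:

zfs version   # prints both the userland and kernel module versions
uname -r      # running kernel

since the earlier 2.2.6 runs may have happened on an older kernel.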

snajpa commented 23 hours ago

If it ends up being the same issue, it's also worth noting that we've tried disabling block cloning, disabling direct IO, and running with sync=disabled only; none of that made any difference.
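
For anyone wanting to replicate those attempts, this is roughly how they can be toggled (module parameter plus dataset properties; tank/data is a placeholder dataset, and the direct property is only available on releases with Direct IO support):

echo 0 > /sys/module/zfs/parameters/zfs_bclone_enabled   # disable block cloning
zfs set direct=disabled tank/data                        # disable direct IO
zfs set sync=disabled tank/data                          # run with sync disabled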