Open mtippmann opened 2 weeks ago
Can you try to run a debug build (configure with --enable-debug)? I think this is the same problem we're seeing with @theubuntuguy here: https://github.com/vpsfreecz/zfs/pull/1 - it seems like some kind of race when memory is tight, probably the dbuf_evict thread steps into something it's not supposed to... I have no idea honestly, to me this is pretty difficult to debug. Lots of moving parts in dbufs vs arc vs znode lifetime vs memory reclaim :(
would be great if you could try - if you see "Kernel panic - not syncing: buffer modified while frozen!" - then it's probably the same problem
FWIW it seems to be related to how ZFS works on 6.10 and newer kernels; older ones don't hit it. This bug is also already present in the OpenZFS 2.2 stable release.
When you say 2.2.6 is fine, are you sure that is also with the 6.11 kernel series?
If it ends up being the same issue, it's also worth noting that we've tried disabling block cloning and direct IO, and tried to run only with sync=disabled; none of that has made any difference.
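For checking a saved kernel log for the panic message quoted above, a minimal sketch in Python follows. The function name `find_panic_lines` and the idea of saving the log to a text file first are assumptions for illustration, not part of ZFS or of this report. A panic may also never reach the on-disk journal, so an empty result does not rule the problem out.

```python
# Hypothetical helper: scan captured kernel log text for the panic string
# mentioned in this thread. Nothing here is ZFS tooling.
PANIC_MARKER = "buffer modified while frozen"


def find_panic_lines(log_text, marker=PANIC_MARKER):
    """Return every log line containing the marker (case-insensitive)."""
    needle = marker.lower()
    return [line for line in log_text.splitlines() if needle in line.lower()]
```

One way to use it would be to save the output of dmesg or journalctl -k to a file, read that file into a string, and pass the string to `find_panic_lines`.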
System information
Describe the problem you're observing
I'm seeing segmentation faults when using zfs git (zfs 2.2.6 is fine) with init_on_alloc=0 init_on_free=0 in cmdline - nothing in dmesg - I can trigger that using a docker compose up with a few containers (rails, mysql) - after that the system crashes and most commands fail. Shortly after it first appears, the whole system is crashing, including plasmashell and so on.
It's a system I need to work, so I went back to 2.2.6, where everything is fine and stable. Not using init_on_alloc=0 init_on_free=0 might help but I'm not 100% sure here. I'm not using zvols.
System passes a BIOS memory test just fine. Dell Latitude E5470 / i7-6820HQ
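Since the report depends on whether init_on_alloc=0 init_on_free=0 is actually active on the running kernel, here is a small sketch for confirming that from the kernel command line. The helper names `parse_cmdline` and `init_hardening_disabled` are made up for illustration, and the parser is deliberately simple: it splits on whitespace and does not handle quoted values containing spaces.

```python
# Hypothetical helpers: check whether both init_on_alloc=0 and
# init_on_free=0 appear on a kernel command line string.
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def init_hardening_disabled(cmdline):
    """True only if both init_on_alloc and init_on_free are explicitly set to 0."""
    params = parse_cmdline(cmdline)
    return params.get("init_on_alloc") == "0" and params.get("init_on_free") == "0"
```

On a live system the input would be the contents of /proc/cmdline, for example `init_hardening_disabled(open("/proc/cmdline").read())`.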
Describe how to reproduce the problem
Good question. Maybe it reproduces using the kmod options listed here and the cmdline - for me it's triggered by a docker compose up, so it could be related to overlayfs. At least that's when I was noticing it.
I assume it's a problem related to my kmod config settings or the cmdline settings, otherwise it would have already been found. Noticed a similar behavior a few weeks ago and tried pinning it down but failed. So I thought I'd put that here.
Include any warning/errors/backtraces from the system logs
There is nothing in dmesg. Below are some random journalctl logfile entries about crashes (it all looks pretty random).