koverstreet / bcachefs

OOM during kernel fsck [82973f03] #559

Open RAOF opened 1 year ago

RAOF commented 1 year ago

When performing a kernel fsck on a reasonably recent kernel (82973f03), the kernel hits an OOM state: it first kills sh, then mount, and then panics because there are no userspace processes left to kill.

netconsole.log

This did not happen on 9ba49e2, the previous kernel I tested.

The system has 32GB RAM and 14TB of storage: 2 x 256GB NVMe SSDs, 2 x 4TB HDDs, and 2 x 3TB partitions.

RAOF commented 1 year ago

Hm. I see there have been a bunch of leak fixes in trunk since this commit; I'll give them a try!

RAOF commented 1 year ago

Hm, no dice. As of 3c29e114 it will sit at "bcachefs (a4b165a2-556b-4650-88b8-d90f8ee4b473): checking allocations" for an indefinite period (I left it >12 hrs) without problem (or apparent progress), but it seems that any non-bcachefs allocation during that time will OOM. I plugged in my mouse after that time and the kernel panicked on an order-0 allocation from the HID driver.

debaba commented 1 year ago

Had a similar problem when changing from kernel 6.4.0+ to 6.5.0+. I had a lot of trouble fscking the bcachefs filesystem; the OOM killer killed lots of processes, etc. Then I stopped autostarting bcachefs on startup and saw lots of messages coming from the Marvell chip extension card that my bcachefs disks sit on. Kernel 6.5.0 handles the so-called Marvell Processor Console port on this extension card as if it were a normal SATA port. I tried again with a vanilla 6.5.1 kernel: same result. After disabling the port (ata24:00) via a kernel parameter (libata.force=24.00:disable) and rebooting, I started the systemd bcachefs service; the bcachefs fsck did some repair and started the bcachefs filesystem as if nothing had happened. -> The filesystem that won't eat your data. Leaving me impressed.