koverstreet / bcachefs

Other
675 stars 69 forks source link

fsck exits, claiming out of memory. #663

Open MrDrMcCoy opened 6 months ago

MrDrMcCoy commented 6 months ago

I have a machine with a bcachefs volume comprised of 7 HDDs and 2 SSDs. I had configured it to store all the metadata on the SSDs, as well as use the SSDs for read and write cache (both SSDs were Optane drives and I had replicas set to 2, so should have been fine). After driving the system pretty hard and with lots of data, one of those SSDs somehow renamed itself from nvme1n1 to nvme1n2. While the data was technically still there, bcachefs saw the drive as errored out and missing. Out of an abundance of caution, I set all data targets to be the HDDs, evacuated the data from the SSDs, removed the SSDs from the pool, and attempted an fsck.

Last week, I was on kernel 6.7 and ran the fsck via bcachefs fsck -pr /dev/sda:/dev/sdb:/dev/sdd:/dev/sdc:/dev/sde:/dev/sdf:/dev/sdg. It would identify an astonishing number of issues, consume ~90% of my 128GB RAM, and then just stall out. No I/O, no CPU usage, no further errors output. It wasn't completely hung however, as a CTRL-C would cause it to exit gracefully, but it would get no further and did not appear to actually make any changes.

Today, I updated to kernel 6.8 and tried again. This time, there was no stall out, but it exited with the following messages:

bcachefs (a8486d4b-78ea-49a3-ae26-1d731e02ab27): bch2_journal_key_insert_take: error allocating new key array (size 134217728)
bcachefs (a8486d4b-78ea-49a3-ae26-1d731e02ab27): bch2_gc_alloc_done(): error ENOMEM_journal_key_insert
bcachefs (a8486d4b-78ea-49a3-ae26-1d731e02ab27): bch2_gc(): error ENOMEM_journal_key_insert
bcachefs (a8486d4b-78ea-49a3-ae26-1d731e02ab27): bch2_fs_recovery(): error ENOMEM_journal_key_insert
bcachefs (a8486d4b-78ea-49a3-ae26-1d731e02ab27): bch2_fs_start(): error starting filesystem ENOMEM_journal_key_insert
fsck binary is version 1.6: btree_subvolume_children but filesystem is 1.4: member_seq and kernel is 1.4: member_seq, using kernel fsck

I tried again with mount -t bcachefs -vo fsck,fix_errors /dev/sda:/dev/sdb:/dev/sdd:/dev/sdc:/dev/sde:/dev/sdf:/dev/sdg /bigstor and got the following error:

ERROR - bcachefs::commands::cmd_mount: Fatal error: Cannot allocate memory

dmesg shows similar errors for the second run:

[ 9394.532920] bcachefs (a8486d4b-78ea-49a3-ae26-1d731e02ab27): bch2_journal_key_insert_take: error allocating new key array (size 134217728)
[ 9394.532942] bcachefs (a8486d4b-78ea-49a3-ae26-1d731e02ab27): bch2_gc_alloc_done(): error ENOMEM_journal_key_insert
[ 9394.634999] bcachefs (a8486d4b-78ea-49a3-ae26-1d731e02ab27): bch2_gc(): error ENOMEM_journal_key_insert
[ 9394.635034] bcachefs (a8486d4b-78ea-49a3-ae26-1d731e02ab27): bch2_fs_recovery(): error ENOMEM_journal_key_insert
[ 9394.635045] bcachefs (a8486d4b-78ea-49a3-ae26-1d731e02ab27): bch2_fs_start(): error starting filesystem ENOMEM_journal_key_insert

I find it difficult to believe that it really ran out of memory, as I watched my memory usage during the scan and it was never full, nor was any swap consumed. Should I assume that this volume is untrustworthy and begin the process of recreating it and all its data from the ground up, or is there anything more I can do?

koverstreet commented 6 months ago

Can you give me a fuller log of the kinds of errors it was finding?

Would like to know where this all originated from.

It'll be a few days before I can really dig into this one, but if you can give me some time I ought to be able to get you going again.

koverstreet commented 6 months ago

Were you hit by the splitbrain bug?

MrDrMcCoy commented 6 months ago

Can you give me a fuller log of the kinds of errors it was finding?

Would like to know where this all originated from.

Here ya go: bcachefs-show-super.txt bcachefs-fsck-log.zip

You'll probably want to extract that with 7z, as I used it with the PPMd algorithm to compress the log due to it being the best compression for this sort of thing that Github will allow. Also be aware that it's ~25GB uncompressed.

It's worth noting that during this subsequent run to collect the logs, I am once again encountering the behavior where the fsck has consumed ~100GB RAM and is mostly sitting idle as opposed to exiting with an error. It's not completely stalled, as I see some occasional, brief spikes of CPU, context switching, and I/O, but it doesn't appear to be making any actual progress. I'll leave it running just in case it decides to move again, but the last time I left it overnight in this state, nothing appeared to happen.

It'll be a few days before I can really dig into this one, but if you can give me some time I ought to be able to get you going again.

Really appreciate your help :)

Were you hit by the splitbrain bug?

I'm not sure. Did some searching to try and determine what indicators would arise in that case, and saw someone mention that the logs might contain brain, but a grep of the log for that came up with nothing. I did see some posts that mentioned there were some bad states one could find themselves in with multi-device and kernel 6.7, which I was unaware of when I created this pool. Hopefully a finished fsck and upgrade will clear all that up.

koverstreet commented 6 months ago

Did you mount/fsck with reconstruct_alloc?

Given the errors you were hitting and what I'm seeing in the fsck log, it looks like that would explain what happened - all alloc info is just gone, and nothing else.

So that shouldn't be a problem to recover from if we just figure out the memory usage - reconstructing all alloc info is something I test pretty well.

In sysfs, we have a couple things for reporting on the btree node cache and btree key cache: /sys/fs/bcachefs//btree_cache_size /sys/fs/bcachefs//internal/btree_cache /sys/fs/bcachefs//internal/btree_key_cache

I'll be expanding the latter two soon; I recently had a report that something might be awry with btree node cache reclaim and there's a patchset that adds counters I need to pull in.

We can get more detailed info as well with the memory allocation profiling patchset - hasn't reached mainline just yet, but you can get it here: https://github.com/surenbaghdasaryan/linux.git memalloc_prof_v6

Give those a try, and let me know what you find

koverstreet commented 6 months ago

Actually, I know what this is - we're hitting the vmalloc 4G limit on the journal keys list.

I know how to fix this, but this is going to take some time...

MrDrMcCoy commented 6 months ago

No worries, thanks for looking into this :)

MrDrMcCoy commented 6 months ago

I know you're busy, but for completeness, I tried running bcachefs fsck -pRfyv to perform the reconstruct_alloc operation. The process ran for 3.5 minutes, then crashed and dumped core.

Mar 29 16:15:49 heartofgold systemd-coredump[4828]: [🡕] Process 4748 (bcachefs) of user 0 dumped core.

                                                    Stack trace of thread 4748:
                                                    #0  0x0000784c9d17f4b7 n/a (libc.so.6 + 0x1574b7)
                                                    #1  0x00005c4e86c05b9c n/a (bcachefs + 0x98b9c)
                                                    #2  0x00005c4e86bf7a69 n/a (bcachefs + 0x8aa69)
                                                    #3  0x00005c4e86bf87c1 n/a (bcachefs + 0x8b7c1)
                                                    #4  0x00005c4e86bf8bcf n/a (bcachefs + 0x8bbcf)
                                                    #5  0x00005c4e86bf8bcf n/a (bcachefs + 0x8bbcf)
                                                    #6  0x00005c4e86bf901f n/a (bcachefs + 0x8c01f)
                                                    #7  0x00005c4e86c76008 n/a (bcachefs + 0x109008)
                                                    #8  0x00005c4e86c7a3c4 n/a (bcachefs + 0x10d3c4)
                                                    #9  0x00005c4e86c96aa5 n/a (bcachefs + 0x129aa5)
                                                    #10 0x00005c4e86c9b99c n/a (bcachefs + 0x12e99c)
                                                    #11 0x00005c4e86bc7ee1 n/a (bcachefs + 0x5aee1)
                                                    #12 0x00005c4e86ba7e68 n/a (bcachefs + 0x3ae68)
                                                    #13 0x00005c4e86bba033 n/a (bcachefs + 0x4d033)
                                                    #14 0x00005c4e86bb6c29 n/a (bcachefs + 0x49c29)
                                                    #15 0x00005c4e86d59451 n/a (bcachefs + 0x1ec451)
                                                    #16 0x00005c4e86ba81c5 n/a (bcachefs + 0x3b1c5)
                                                    #17 0x0000784c9d04dcd0 n/a (libc.so.6 + 0x25cd0)
                                                    #18 0x0000784c9d04dd8a __libc_start_main (libc.so.6 + 0x25d8a)
                                                    #19 0x00005c4e86b91f35 n/a (bcachefs + 0x24f35)

                                                    Stack trace of thread 4755:
                                                    #0  0x0000784c9d0afebe n/a (libc.so.6 + 0x87ebe)
                                                    #1  0x0000784c9d0b2a65 pthread_cond_timedwait (libc.so.6 + 0x8aa65)
                                                    #2  0x00005c4e86ca0453 n/a (bcachefs + 0x133453)
                                                    #3  0x00005c4e86c9de83 n/a (bcachefs + 0x130e83)
                                                    #4  0x0000784c9d0b355a n/a (libc.so.6 + 0x8b55a)
                                                    #5  0x0000784c9d130a3c n/a (libc.so.6 + 0x108a3c)

                                                    Stack trace of thread 4753:
                                                    #0  0x0000784c9d12e88d syscall (libc.so.6 + 0x10688d)
                                                    #1  0x00005c4e86ca01df n/a (bcachefs + 0x1331df)
                                                    #2  0x00005c4e86c9de83 n/a (bcachefs + 0x130e83)
                                                    #3  0x0000784c9d0b355a n/a (libc.so.6 + 0x8b55a)
                                                    #4  0x0000784c9d130a3c n/a (libc.so.6 + 0x108a3c)

                                                    Stack trace of thread 4760:
                                                    #0  0x0000784c9d12e88d syscall (libc.so.6 + 0x10688d)
                                                    #1  0x00005c4e86ca01df n/a (bcachefs + 0x1331df)
                                                    #2  0x00005c4e86c9de83 n/a (bcachefs + 0x130e83)
                                                    #3  0x0000784c9d0b355a n/a (libc.so.6 + 0x8b55a)
                                                    #4  0x0000784c9d130a3c n/a (libc.so.6 + 0x108a3c)

                                                    Stack trace of thread 4759:
                                                    #0  0x0000784c9d12e88d syscall (libc.so.6 + 0x10688d)
                                                    #1  0x00005c4e86ca01df n/a (bcachefs + 0x1331df)
                                                    #2  0x00005c4e86c9de83 n/a (bcachefs + 0x130e83)
                                                    #3  0x0000784c9d0b355a n/a (libc.so.6 + 0x8b55a)
                                                    #4  0x0000784c9d130a3c n/a (libc.so.6 + 0x108a3c)

                                                    Stack trace of thread 4822:
                                                    #0  0x0000784c9d12e88d syscall (libc.so.6 + 0x10688d)
                                                    #1  0x0000784c9d46b596 n/a (liburcu.so.8 + 0x4596)
                                                    #2  0x0000784c9d0b355a n/a (libc.so.6 + 0x8b55a)
                                                    #3  0x0000784c9d130a3c n/a (libc.so.6 + 0x108a3c)

                                                    Stack trace of thread 4758:
                                                    #0  0x0000784c9d12e88d syscall (libc.so.6 + 0x10688d)
                                                    #1  0x00005c4e86ca01df n/a (bcachefs + 0x1331df)
                                                    #2  0x00005c4e86c9de83 n/a (bcachefs + 0x130e83)
                                                    #3  0x0000784c9d0b355a n/a (libc.so.6 + 0x8b55a)
                                                    #4  0x0000784c9d130a3c n/a (libc.so.6 + 0x108a3c)

                                                    Stack trace of thread 4757:
                                                    #0  0x0000784c9d12e88d syscall (libc.so.6 + 0x10688d)
                                                    #1  0x00005c4e86ca01df n/a (bcachefs + 0x1331df)
                                                    #2  0x00005c4e86c9de83 n/a (bcachefs + 0x130e83)
                                                    #3  0x0000784c9d0b355a n/a (libc.so.6 + 0x8b55a)
                                                    #4  0x0000784c9d130a3c n/a (libc.so.6 + 0x108a3c)

                                                    Stack trace of thread 4751:
                                                    #0  0x0000784c9d12e88d syscall (libc.so.6 + 0x10688d)
                                                    #1  0x00005c4e86ca01df n/a (bcachefs + 0x1331df)
                                                    #2  0x00005c4e86c9de83 n/a (bcachefs + 0x130e83)
                                                    #3  0x0000784c9d0b355a n/a (libc.so.6 + 0x8b55a)
                                                    #4  0x0000784c9d130a3c n/a (libc.so.6 + 0x108a3c)

                                                    Stack trace of thread 4761:
                                                    #0  0x0000784c9d12e88d syscall (libc.so.6 + 0x10688d)
                                                    #1  0x00005c4e86ca01df n/a (bcachefs + 0x1331df)
                                                    #2  0x00005c4e86c9de83 n/a (bcachefs + 0x130e83)
                                                    #3  0x0000784c9d0b355a n/a (libc.so.6 + 0x8b55a)
                                                    #4  0x0000784c9d130a3c n/a (libc.so.6 + 0x108a3c)

                                                    Stack trace of thread 4749:
                                                    #0  0x0000784c9d12e88d syscall (libc.so.6 + 0x10688d)
                                                    #1  0x00005c4e86ca2a03 n/a (bcachefs + 0x135a03)
                                                    #2  0x00005c4e86ca0169 n/a (bcachefs + 0x133169)
                                                    #3  0x00005c4e86c9de83 n/a (bcachefs + 0x130e83)
                                                    #4  0x0000784c9d0b355a n/a (libc.so.6 + 0x8b55a)
                                                    #5  0x0000784c9d130a3c n/a (libc.so.6 + 0x108a3c)

                                                    Stack trace of thread 4756:
                                                    #0  0x0000784c9d12e88d syscall (libc.so.6 + 0x10688d)
                                                    #1  0x00005c4e86ca1952 n/a (bcachefs + 0x134952)
                                                    #2  0x00005c4e86c9de83 n/a (bcachefs + 0x130e83)
                                                    #3  0x0000784c9d0b355a n/a (libc.so.6 + 0x8b55a)
                                                    #4  0x0000784c9d130a3c n/a (libc.so.6 + 0x108a3c)

                                                    Stack trace of thread 4821:
                                                    #0  0x0000784c9d12e88d syscall (libc.so.6 + 0x10688d)
                                                    #1  0x00005c4e86ca0f8a n/a (bcachefs + 0x133f8a)
                                                    #2  0x00005c4e86c5f627 n/a (bcachefs + 0xf2627)
                                                    #3  0x00005c4e86c9de83 n/a (bcachefs + 0x130e83)
                                                    #4  0x0000784c9d0b355a n/a (libc.so.6 + 0x8b55a)
                                                    #5  0x0000784c9d130a3c n/a (libc.so.6 + 0x108a3c)

                                                    Stack trace of thread 4754:
                                                    #0  0x0000784c9d12e88d syscall (libc.so.6 + 0x10688d)
                                                    #1  0x0000784c9d27220d n/a (libaio.so.1 + 0x120d)
                                                    #2  0x00005c4e86c940a5 n/a (bcachefs + 0x1270a5)
                                                    #3  0x00005c4e86c9de83 n/a (bcachefs + 0x130e83)
                                                    #4  0x0000784c9d0b355a n/a (libc.so.6 + 0x8b55a)
                                                    #5  0x0000784c9d130a3c n/a (libc.so.6 + 0x108a3c)

                                                    Stack trace of thread 4750:
                                                    #0  0x0000784c9d12e88d syscall (libc.so.6 + 0x10688d)
                                                    #1  0x00005c4e86ca01df n/a (bcachefs + 0x1331df)
                                                    #2  0x00005c4e86c9de83 n/a (bcachefs + 0x130e83)
                                                    #3  0x0000784c9d0b355a n/a (libc.so.6 + 0x8b55a)
                                                    #4  0x0000784c9d130a3c n/a (libc.so.6 + 0x108a3c)

                                                    Stack trace of thread 4752:
                                                    #0  0x0000784c9d12e88d syscall (libc.so.6 + 0x10688d)
                                                    #1  0x00005c4e86ca01df n/a (bcachefs + 0x1331df)
                                                    #2  0x00005c4e86c9de83 n/a (bcachefs + 0x130e83)
                                                    #3  0x0000784c9d0b355a n/a (libc.so.6 + 0x8b55a)
                                                    #4  0x0000784c9d130a3c n/a (libc.so.6 + 0x108a3c)

                                                    Stack trace of thread 4763:
                                                    #0  0x0000784c9d12e88d syscall (libc.so.6 + 0x10688d)
                                                    #1  0x00005c4e86ca01df n/a (bcachefs + 0x1331df)
                                                    #2  0x00005c4e86c9de83 n/a (bcachefs + 0x130e83)
                                                    #3  0x0000784c9d0b355a n/a (libc.so.6 + 0x8b55a)
                                                    #4  0x0000784c9d130a3c n/a (libc.so.6 + 0x108a3c)

                                                    Stack trace of thread 4762:
                                                    #0  0x0000784c9d12e88d syscall (libc.so.6 + 0x10688d)
                                                    #1  0x00005c4e86ca01df n/a (bcachefs + 0x1331df)
                                                    #2  0x00005c4e86c9de83 n/a (bcachefs + 0x130e83)
                                                    #3  0x0000784c9d0b355a n/a (libc.so.6 + 0x8b55a)
                                                    #4  0x0000784c9d130a3c n/a (libc.so.6 + 0x108a3c)
                                                    ELF object binary architecture: AMD x86-64
░░ Subject: Process 4748 (bcachefs) dumped core
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░ Documentation: man:core(5)
░░ 
░░ Process 4748 (bcachefs) crashed and dumped core.
░░ 
░░ This usually indicates a programming error in the crashing program and
░░ should be reported to its vendor as a bug.

The core dump and output of strace -fiknxyyY --const-print-style=verbose are here.

koverstreet commented 6 months ago

I communicated that badly - I was asking if reconstruct_alloc was what you ran previously, all the errors in your superblock were consistent with that being the cause.

Your bcachefs build doesn't have debug symbols - we'll need that to get useful backtraces.

MrDrMcCoy commented 6 months ago

I communicated that badly - I was asking if reconstruct_alloc was what you ran previously, all the errors in your superblock were consistent with that being the cause.

What are the implications of having done that and having the process interrupted? My memory is fuzzy, but that may have happened before, and certainly has happened now.

Your bcachefs build doesn't have debug symbols - we'll need that to get useful backtraces.

If that would be useful, I can probably make that happen Sunday evening or Monday. Do just the tools need to be compiled with debug symbols for that, or the kernel and modules as well?

koverstreet commented 6 months ago

Just the tools - and yeah, that crash is something I want to get fixed right away.

MrDrMcCoy commented 6 months ago

Thankfully aur/bcachefs-tools-git doesn't strip debug symbols, so that was very easy to redo. bcachefs-fsck-coredump.lzma.zip

MrDrMcCoy commented 5 months ago

I updated the kernel to 6.9.0-rc4-1-mainline along with bcachefs-tools-git to 1.7.0 and re-ran the fsck. It exits after a short period of time similarly to before:

bch2_journal_key_insert_take: error allocating new key array (size 134217728)

bcachefs-fsck.zip