
Kernel panic when trying to rm a file #278

Open · Lahvuun opened this issue 3 years ago

Lahvuun commented 3 years ago

panic log: netconsole.log
commit: 1a510b00b6d1498190197cf804585959449380cb
bcachefs-tools commit: 07ec713e056a73337bc0f53f5910f5179537b2c2

The file I'm trying to remove has a reflink copy created with cp -r --reflink=always. Trying to remove either file results in this kernel panic.
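
For reference, the same sequence can be reproduced without cp. This is a minimal sketch, not taken from the report: it assumes the panic needs nothing beyond a reflinked pair on bcachefs, and that cp -r --reflink=always boils down to the FICLONE ioctl. The /mnt/bcachefs paths are hypothetical.

/* Hypothetical reproducer sketch: reflink a file, then unlink one copy. */
#include <fcntl.h>
#include <linux/fs.h>   /* FICLONE */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
	int src = open("/mnt/bcachefs/file", O_RDONLY);
	int dst = open("/mnt/bcachefs/file.reflink", O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (src < 0 || dst < 0 || ioctl(dst, FICLONE, src) < 0) {
		perror("reflink copy");      /* share extents, no data copy */
		return 1;
	}
	close(src);
	close(dst);

	/* removing either file is what triggers the reported panic */
	return unlink("/mnt/bcachefs/file") ? 1 : 0;
}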

koverstreet commented 2 years ago

This should be fixed now, with the btree_path patches.

Lahvuun commented 2 years ago

I updated to c988c58e7f922977a4dabea1ebb8bf7bf2dde805 but I'm still getting a kernel panic. The call trace seems to be different: netconsole.log

koverstreet commented 2 years ago

Can you get me the full log? It should have dumped the btree paths in use before that.

Lahvuun commented 2 years ago

I ran dmesg -n 8 before loading netconsole; there are more messages, but I'm not sure if they're what you want: netconsole.log

koverstreet commented 2 years ago

So that's interesting: you've got a reflink pointer that spans many indirect extents.

Except that's not supposed to happen: reflink pointer merging is disabled precisely because it would trigger this very issue.

Interesting.
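
A rough sketch of the situation being described, in simplified C rather than the actual bcachefs on-disk structs (all names and fields here are illustrative only): a reflink pointer in a file's extent list covers a range of the shared, refcounted reflink btree, and merging two adjacent pointers produces one pointer that can span several indirect extents.

/* Illustrative model only, not bcachefs's real data structures. */

/* A file extent that points into the reflink btree instead of holding
 * data pointers directly. */
struct reflink_ptr {
	unsigned long long idx;     /* start position in the reflink btree */
	unsigned int       sectors; /* length of the referenced range */
};

/* An indirect extent in the reflink btree; its refcount tracks how many
 * reflink pointers reference it. */
struct indirect_extent {
	unsigned long long idx;
	unsigned int       sectors;
	unsigned int       refcount;
};

/* Merging two adjacent pointers keeps the extent list small, but the
 * merged range may then cover several indirect extents instead of one,
 * which is the case the delete path apparently mishandles; hence
 * merging was left disabled. */
static struct reflink_ptr reflink_ptr_merge(struct reflink_ptr a,
					    struct reflink_ptr b)
{
	struct reflink_ptr merged = {
		.idx     = a.idx,
		.sectors = a.sectors + b.sectors, /* assumes b directly follows a */
	};
	return merged;
}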

koverstreet commented 2 years ago

Did you by chance start out with data checksumming disabled, write some data, then enable it?

Lahvuun commented 2 years ago

I don't think I ever consciously enabled data checksumming. Reflink is the only "advanced" feature I use; I don't really need anything else like compression or checksumming. I'm pretty sure I just formatted the partition without any options and started using it. The filesystem is pretty old, though, and has been through a few on-disk format changes. bcachefs show-super /dev/sdb1 says it was created at the start of 2020:

External UUID:                  5fb3e4e2-6ceb-4304-b8c3-72aa250908e6
Internal UUID:                  b5466a23-868c-4a00-adf5-6049fc9073a5
Device index:                   0
Label:
Version:                        15
Oldest version on disk:         10
Created:                        Fri Jan 17 23:14:15 2020
Sequence number:                5695
Block_size:                     512
Btree node size:                256.0K
Error action:                   ro
Clean:                          0
Features:                       atomic_nlink,journal_seq_blacklist_v3,reflink,new_siphash,inline_data,new_extent_overwrite,incompressible,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,reflink_inline_data,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:                alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done
Metadata replicas:              1
Data replicas:                  1
Metadata checksum type:         crc32c (1)
Data checksum type:             crc32c (1)
Compression type:               none (0)
Foreground write target:        none
Background write target:        none
Promote target:                 none
Metadata target:                none
String hash type:               siphash (2)
32 bit inodes:                  0
GC reserve percentage:          8%
Root reserve percentage:        0%
Devices:                        1 live, 1 total
Sections:                       journal,members,replicas_v0,clean,journal_seq_blacklist
Superblock size:                11472

Members (size 64):
  Device 0:
    UUID:                       c56698e0-7bca-4a2c-a34c-757f15bb1b88
    Size:                       588.2G
    Bucket size:                512.0K
    First bucket:               0
    Buckets:                    1204634
    Last mount:                 Thu Oct 14 14:14:49 2021
    State:                      rw
    Group:                      (none)
    Data allowed:               journal,btree,user
    Has data:                   (none)
    Replacement policy:         lru
    Discard:                    0

I also forgot to mention that I've been getting a message on boot from (I assume) fsck that says "error invalidating buckets: 1"; I'm not sure how relevant this is.

If you're interested, I could try deleting everything that doesn't trigger this kernel panic, shrink the partition, dump it, and upload it to Google Drive or something for you to examine - provided shrinking doesn't make the issue disappear, of course.

Lastly, I have no need to salvage this filesystem. Since no one else has experienced this and it's not supposed to happen anyway, I can just reformat so we can move on, if you'd rather not deal with this.

koverstreet commented 2 years ago

I think the "error invalidating buckets: 1" message should be fixed now - that's just a normal shutdown; it shouldn't be printing that.

There is a dump tool in bcachefs-tools for dumping filesystem metadata; that'd be easier than shrinking the filesystem.

This is worth fixing - if I can, that would mean we'll be able to safely merge reflink pointers, and without a fix this is an issue other people could hit too, given enough fragmentation of existing reflinked data. It's a tricky one, though; I'll spend some time on it and see how far I get.

Lahvuun commented 2 years ago

OK. Do you need the metadata dump?

If so, is there any way I can strip sensitive data (file contents and filenames) from it? Is just deleting them before dumping enough? If it is, wouldn't the filenames still be in the journal? Sorry for all the questions; I don't really know anything about how filesystems work.

koverstreet commented 2 years ago

The dump contains metadata, not data - so filenames, but not contents (except for small inline files). If you delete things and then do a clean shutdown, the journal shouldn't contain them; but since btree nodes are log structured, the deleted names will still be present in the nodes until each node fills up and is rewritten - something to keep in mind if you're paranoid.
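
A toy illustration of that last point, assuming only what the comment states (log-structured nodes where a delete appends a "whiteout" rather than erasing the key; all names here are made up):

#include <stdio.h>
#include <string.h>

#define NODE_CAP 8

struct entry {
	char key[32];
	int  whiteout; /* 1 = this append deletes the key */
};

struct node {
	struct entry log[NODE_CAP];
	int n;
};

/* Updates only ever append; nothing is erased in place. */
static int node_append(struct node *node, const char *key, int whiteout)
{
	if (node->n == NODE_CAP)
		return -1; /* full: the node must be compacted/rewritten */
	snprintf(node->log[node->n].key, sizeof(node->log[node->n].key), "%s", key);
	node->log[node->n].whiteout = whiteout;
	node->n++;
	return 0;
}

/* A lookup scans newest to oldest, so a whiteout hides the key
 * logically, but the old entry is still physically in the node until
 * the node is rewritten. */
static int node_contains(const struct node *node, const char *key)
{
	for (int i = node->n - 1; i >= 0; i--)
		if (!strcmp(node->log[i].key, key))
			return !node->log[i].whiteout;
	return 0;
}

int main(void)
{
	struct node n = { .n = 0 };

	node_append(&n, "secret-filename", 0);
	node_append(&n, "secret-filename", 1); /* delete = append a whiteout */

	/* Logically gone, but the raw node still holds the name. */
	printf("visible: %d, raw entries: %d\n",
	       node_contains(&n, "secret-filename"), n.n);
	return 0;
}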

Lahvuun commented 2 years ago

I tried dumping with bcachefs dump -o metadata.qcow2 /dev/sdb1 and left it for two and a half hours, but the process seems to just get stuck:

strace -f -p $(pidof bcachefs)
strace: Process 1848 attached with 15 threads
[pid  1864] futex(0x56312d406d40, FUTEX_WAIT, 4294967295, NULL <unfinished ...>
[pid  1862] futex(0x56312d417f5c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  1861] futex(0x56312d406f8c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  1860] futex(0x56312d3ff29c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  1859] futex(0x56312d3f8ebc, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  1858] futex(0x56312d3f8c0c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  1857] futex(0x56312d3f895c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  1856] futex(0x56312bd052c8, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid  1855] io_getevents(0x7fd1e3ccb000, 1, 8,  <unfinished ...>
[pid  1854] futex(0x56312d3f6ebc, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  1852] futex(0x56312d3f6c0c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  1851] futex(0x56312d3f695c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  1850] futex(0x56312d3f66ac, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  1849] futex(0x56312d3f63fc, FUTEX_WAIT_PRIVATE, 1, NULL

While I was cleaning up the partition I found many more files triggering this issue. It looks like all the affected ones were created between January 2020 and April 2020. This affects both the reflink copies and the source files they were copied from. I could delete files from that period that had no reflinked copies just fine, and the same goes for files and reflink copies created after April 2020. So, if you remember any similar-sounding bugs from back then, this is probably it.

Lahvuun commented 2 years ago

Still happens with 40eaef7e8049b75ff7e5da42227295c754d9c906.