koverstreet / bcachefs

filesystem won't mount, `bcachefs fsck` segfaults #608

Closed ghost closed 8 months ago

ghost commented 8 months ago

Version

bcachefs-tools git commit cfa816bf3f823a3bedfedd8e214ea929c5c755fe

Generic info

Provide the output of `bcachefs show-super`:
# bcachefs show-super /dev/disk/by-partlabel/yuge1
External UUID:                              bf0cf51e-42f9-4bce-a7fa-4a6db2cc421e
Internal UUID:                              514b7190-3372-4b2a-83be-1de01302dee6
Device index:                               0
Label:
Version:                                    snapshot_trees
Oldest version on disk:                     snapshot_trees
Created:                                    Tue Oct 31 22:34:14 2023
Sequence number:                            94
Superblock size:                            5664
Clean:                                      0
Devices:                                    4
Sections:                                   members,crypt,clean,replicas,journal_seq_blacklist,journal_v2,counters
Features:                                   ec,journal_seq_blacklist_v3,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:                            alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done

Options:
  block_size:                               4.00 KiB
  btree_node_size:                          256 KiB
  errors:                                   continue [ro] panic
  metadata_replicas:                        2
  data_replicas:                            1
  metadata_replicas_required:               1
  data_replicas_required:                   1
  encoded_extent_max:                       64.0 KiB
  metadata_checksum:                        none [crc32c] crc64 xxhash
  data_checksum:                            none [crc32c] crc64 xxhash
  compression:                              [none] lz4 gzip zstd
  background_compression:                   [none] lz4 gzip zstd
  str_hash:                                 crc32c crc64 [siphash]
  metadata_target:                          none
  foreground_target:                        none
  background_target:                        none
  promote_target:                           none
  erasure_code:                             0
  inodes_32bit:                             1
  shard_inode_numbers:                      1
  inodes_use_key_cache:                     1
  gc_reserve_percent:                       8
  gc_reserve_bytes:                         0 B
  root_reserve_percent:                     0
  wide_macs:                                0
  acl:                                      1
  usrquota:                                 0
  grpquota:                                 0
  prjquota:                                 0
  journal_flush_delay:                      1000
  journal_flush_disabled:                   0
  journal_reclaim_delay:                    100
  journal_transaction_names:                1
  nocow:                                    0

members (size 232):
  Device:                                   0
    UUID:                                   5eca8b65-e452-45b0-bd54-1f9f97921074
    Size:                                   10.9 TiB
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                22888444
    Last mount:                             Sat Nov  4 16:15:07 2023
    State:                                  rw
    Label:                                  (none)
    Data allowed:                           journal,btree,user
    Has data:                               journal,btree,user,parity
    Discard:                                0
    Freespace initialized:                  1
  Device:                                   1
    UUID:                                   78ae940a-1bdd-43c9-981f-6192226d3490
    Size:                                   10.9 TiB
    Bucket size:                            1.00 MiB
    First bucket:                           0
    Buckets:                                11444222
    Last mount:                             Sat Nov  4 16:15:07 2023
    State:                                  rw
    Label:                                  (none)
    Data allowed:                           journal,btree,user
    Has data:                               journal,btree,user,parity
    Discard:                                0
    Freespace initialized:                  1
  Device:                                   2
    UUID:                                   acbbdac3-ac50-4bb8-8962-32a703372deb
    Size:                                   10.9 TiB
    Bucket size:                            1.00 MiB
    First bucket:                           0
    Buckets:                                11444222
    Last mount:                             Sat Nov  4 16:15:07 2023
    State:                                  rw
    Label:                                  (none)
    Data allowed:                           journal,btree,user
    Has data:                               journal,btree,user,parity
    Discard:                                0
    Freespace initialized:                  1
  Device:                                   3
    UUID:                                   136d8990-e710-43ab-ac85-484eb263b5c9
    Size:                                   10.9 TiB
    Bucket size:                            1.00 MiB
    First bucket:                           0
    Buckets:                                11444222
    Last mount:                             Sat Nov  4 16:15:07 2023
    State:                                  rw
    Label:                                  (none)
    Data allowed:                           journal,btree,user
    Has data:                               journal,btree,user,parity
    Discard:                                0
    Freespace initialized:                  1
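
As a quick consistency check on the members section above, the reported device sizes can be recomputed from the bucket size and bucket count. This is only a sketch: the numbers are copied from the `show-super` output, and the assumption that size equals bucket size times bucket count (in binary units) is mine, not something the tool states.

```python
KIB = 1024
MIB = 1024 * KIB
TIB = 1024 ** 4

# (bucket size in bytes, bucket count), copied from the members section
devices = {
    0: (512 * KIB, 22888444),
    1: (1 * MIB, 11444222),
    2: (1 * MIB, 11444222),
    3: (1 * MIB, 11444222),
}


def device_size_tib(bucket_size, buckets):
    """Assumed relation: device size = bucket size * number of buckets."""
    return bucket_size * buckets / TIB


sizes = {idx: device_size_tib(*spec) for idx, spec in devices.items()}
for idx, size in sizes.items():
    print(f"device {idx}: {size:.1f} TiB")
```

Under that assumption all four devices come out at the same 10.9 TiB that `show-super` reports, even though device 0 uses a 512 KiB bucket size and the others use 1.00 MiB.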

Tools bugs

Type `bt` into gdb and provide the output here:
GNU gdb (GDB) 13.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from bcachefs...
(No debugging symbols found in bcachefs)
(gdb) run fsck -pvy -o reconstruct_alloc /dev/disk/by-partlabel/yuge{1,2,3,4}
Starting program: /nix/store/s7yf4bpvmyf7jli9k7mir6j6c0nazq3c-system-path/bin/bcachefs fsck -pvy -o reconstruct_alloc /dev/disk/by-partlabel/yuge{1,2,3,4}
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/nix/store/gqghjch4p1s69sv4mcjksb2kb65rwqjy-glibc-2.38-23/lib/libthread_db.so.1".
[New Thread 0x7ffff7a68680 (LWP 130814)]
[New Thread 0x7ffff7a5f680 (LWP 130815)]
[New Thread 0x7ffff7a56680 (LWP 130816)]
[New Thread 0x7ffff7a4d680 (LWP 130817)]
[New Thread 0x7ffff7a44680 (LWP 130818)]
[New Thread 0x7ffff7a3b680 (LWP 130819)]
[New Thread 0x7ffff7a32680 (LWP 130820)]
[New Thread 0x7ffff7a29680 (LWP 130821)]
bch2_fs_open()
bch2_read_super()
bch2_read_super() ret 0
bch2_read_super()
bch2_read_super() ret 0
bch2_read_super()
bch2_read_super() ret 0
bch2_read_super()
bch2_read_super() ret 0
bch2_fs_alloc()
[New Thread 0x7ffff799e680 (LWP 130822)]
[New Thread 0x7ffff7995680 (LWP 130823)]
[New Thread 0x7ffff798c680 (LWP 130824)]
[New Thread 0x7ffff7983680 (LWP 130825)]
[New Thread 0x7ffff797a680 (LWP 130826)]
bch2_fs_journal_init()
bch2_fs_journal_init() ret 0
bch2_fs_btree_cache_init()
bch2_fs_btree_cache_init() ret 0
[New Thread 0x7ffff6ecd680 (LWP 130827)]
bch2_fs_encryption_init()
bch2_fs_encryption_init() ret 0
__bch2_fs_compress_init()
__bch2_fs_compress_init() ret 0
bch2_dev_alloc()
bch2_dev_alloc() ret 0
bch2_dev_alloc()
bch2_dev_alloc() ret 0
bch2_dev_alloc()
bch2_dev_alloc() ret 0
bch2_dev_alloc()
bch2_dev_alloc() ret 0
bch2_fs_alloc() ret 0
recovering from unclean shutdown
starting journal read
journal read done on device 0x910550g, ret 0
journal read done on device 0x910930g, ret 0
journal read done on device 0x915010g, ret 0
journal read done on device 0x9150d0g, ret 0
journal read done, replaying entries 457512-473895
Journal keys: 43126014 read, 25949095 after sorting and compacting
[New Thread 0x7ffff001e680 (LWP 130866)]
starting alloc read
alloc read done
starting stripes_read
[New Thread 0x7ffe4fffd680 (LWP 130867)]
error validating btree node on 0x910550g at btree stripes level 0/1
  u64s 12 type btree_ptr_v2 0:19176:0 len 0 ver 0: seq 91f8446f1a74e1d3 written 288 min_key 0:18624:1 durability: 2 ptr: 3:38208:512 gen 0 ptr: 0:15380708:512 gen 0
  node offset 0: bad magic: want ba582e4b29ef4ca4, got a303cd3292d2c8f0
bch2_stripes_read(): error Input/output error
bch2_fs_recovery(): error Input/output error
error starting filesystem: Input/output error
shutting down
[Thread 0x7ffff001e680 (LWP 130866) exited]
error validating btree node on 0x910550g at btree stripes level 0/1
  u64s 12 type btree_ptr_v2 0:22470:0 len 0 ver 0: seq e906e40599b19f90 written 264 min_key 0:21918:1 durability: 2 ptr: 3:41528:512 gen 1 ptr: 0:15384034:512 gen 0
  node offset 0: bad magic: want ba582e4b29ef4ca4, got 8f8234dc94db08
shutdown complete
[Thread 0x7ffff6ecd680 (LWP 130827) exited]

Thread 1 "bcachefs" received signal SIGSEGV, Segmentation fault.
0x00007ffff7bf4dc0 in __memmove_ssse3 () from /nix/store/gqghjch4p1s69sv4mcjksb2kb65rwqjy-glibc-2.38-23/lib/libc.so.6
(gdb) bt
#0  0x00007ffff7bf4dc0 in __memmove_ssse3 () from /nix/store/gqghjch4p1s69sv4mcjksb2kb65rwqjy-glibc-2.38-23/lib/libc.so.6
#1  0x00000000004c37ab in bch2_journal_keys_free ()
#2  0x00000000004d4630 in bch2_fs_release ()
#3  0x00000000004da591 in bch2_fs_open ()
#4  0x0000000000417ed9 in cmd_fsck ()
#5  0x0000000000410dab in main ()
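
To put the journal figures from the log above in perspective, here is a small arithmetic sketch. The numbers are copied from the `journal read done, replaying entries ...` and `Journal keys: ...` lines; treating the sequence range as inclusive is an assumption on my part.

```python
# Figures copied from the fsck log above
first_seq, last_seq = 457512, 473895
keys_read, keys_after_compaction = 43126014, 25949095

# Assumption: the "replaying entries A-B" range is inclusive on both ends
entries_replayed = last_seq - first_seq + 1

keys_dropped = keys_read - keys_after_compaction
kept_fraction = keys_after_compaction / keys_read

print(f"journal entries to replay: {entries_replayed}")
print(f"keys dropped by sorting and compacting: {keys_dropped}")
print(f"fraction of keys kept: {kept_fraction:.1%}")
```

So roughly 26 million journal keys are still held in memory after compaction, and this is the array that `bch2_journal_keys_free()` is tearing down when the segfault occurs in the backtrace above.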

Kernel bugs

dmesg
[128279.329627] bcachefs (bf0cf51e-42f9-4bce-a7fa-4a6db2cc421e): recovering from unclean shutdown
[128325.727803] bcachefs (bf0cf51e-42f9-4bce-a7fa-4a6db2cc421e): journal read done, replaying entries 457512-473895
[128417.894925] bcachefs (bf0cf51e-42f9-4bce-a7fa-4a6db2cc421e): error validating btree node on sdf1 at btree stripes level 0/1
[128417.894931]   u64s 12 type btree_ptr_v2 0:2706:0 len 0 ver 0: seq 5b6cba5c4d9f7481 written 264 min_key 0:2154:1 durability: 2 ptr: 3:21514:512 gen 0 ptr: 0:15364094:512 gen 0
[128417.894932]   node offset 0: bad magic: want ba582e4b29ef4ca4, got 75547e7a5669324e
[128417.934255] bcachefs (bf0cf51e-42f9-4bce-a7fa-4a6db2cc421e): bch2_stripes_read(): error EIO
[128417.942924] bcachefs (bf0cf51e-42f9-4bce-a7fa-4a6db2cc421e): bch2_fs_recovery(): error EIO
[128417.951283] bcachefs (bf0cf51e-42f9-4bce-a7fa-4a6db2cc421e): error starting filesystem: EIO
[128418.320117]  sdc: sdc1
[128418.327431]  sdb: sdb1 sdb2
[128418.352393]  sda: sda1
[128418.361070]  sdj: sdj1
[128418.374706]  sdi: sdi1
[128418.383399]  sdh: sdh1
[128418.391637]  sdg: sdg1
[128418.399788]  sdf: sdf1
[128418.412807]  sde: sde1
[128418.422677]  sdd: sdd1
[129595.337042] bcachefs: bch2_fs_open()
[129595.337050] bcachefs: bch2_read_super()
[129595.337303] bcachefs: bch2_read_super() ret 0
[129595.337306] bcachefs: bch2_read_super()
[129595.337556] bcachefs: bch2_read_super() ret 0
[129595.337560] bcachefs: bch2_read_super()
[129595.337774] bcachefs: bch2_read_super() ret 0
[129595.337776] bcachefs: bch2_read_super()
[129595.338005] bcachefs: bch2_read_super() ret 0
[129595.338008] bcachefs: bch2_fs_alloc()
[129595.348699] bcachefs: bch2_fs_journal_init()
[129595.348705] bcachefs: bch2_fs_journal_init() ret 0
[129595.348733] bcachefs: bch2_fs_btree_cache_init()
[129595.348755] bcachefs: bch2_fs_btree_cache_init() ret 0
[129595.348873] bcachefs: bch2_fs_encryption_init()
[129595.348906] bcachefs: bch2_fs_encryption_init() ret 0
[129595.348908] bcachefs: __bch2_fs_compress_init()
[129595.348913] bcachefs: __bch2_fs_compress_init() ret 0
[129595.348958] bcachefs: bch2_fs_fsio_init()
[129595.349160] bcachefs: bch2_fs_fsio_init() ret 0
[129595.349162] bcachefs: bch2_dev_alloc()
[129595.354188] bcachefs: bch2_dev_alloc() ret 0
[129595.354196] bcachefs: bch2_dev_alloc()
[129595.356824] bcachefs: bch2_dev_alloc() ret 0
[129595.356831] bcachefs: bch2_dev_alloc()
[129595.359497] bcachefs: bch2_dev_alloc() ret 0
[129595.359504] bcachefs: bch2_dev_alloc()
[129595.362009] bcachefs: bch2_dev_alloc() ret 0
[129595.362588] bcachefs: bch2_fs_alloc() ret 0

ghost commented 8 months ago

Updated to bcachefs-tools v1.3.1, still segfaults.

(gdb) bt
#0  0x00007ffff7c27584 in __memmove_ssse3 () from /nix/store/3zwkljka7z82rk992szcgh2x2qllyps6-glibc-2.38-27/lib/libc.so.6
#1  0x0000000000458a03 in memmove (__len=<optimized out>, __src=<optimized out>, __dest=<optimized out>) at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.38-27-dev/include/bits/string_fortified.h:36
#2  __move_gap (element_size=24, new_gap=7966633, old_gap=<optimized out>, size=<optimized out>, nr=<optimized out>, array=<optimized out>) at libbcachefs/util.h:741
#3  bch2_journal_key_insert_take (c=c@entry=0x7ffff79e3000, id=id@entry=BTREE_ID_alloc, level=level@entry=0, k=k@entry=0xae0840) at libbcachefs/btree_journal_iter.c:200
#4  0x0000000000458b2b in bch2_journal_key_insert (c=c@entry=0x7ffff79e3000, id=BTREE_ID_alloc, level=0, k=0x9d7b00) at libbcachefs/btree_journal_iter.c:227
#5  0x00000000004608d4 in do_bch2_trans_commit_to_journal_replay (trans=0xb40000) at libbcachefs/btree_trans_commit.c:992
#6  __bch2_trans_commit (trans=trans@entry=0xb40000, flags=flags@entry=32) at libbcachefs/btree_trans_commit.c:1038
#7  0x0000000000443817 in bch2_trans_commit (flags=32, journal_seq=0x0, disk_res=0x0, trans=0xb40000) at libbcachefs/btree_update.h:137
#8  bch2_gc_alloc_done (metadata_only=false, c=<optimized out>) at libbcachefs/btree_gc.c:1502
#9  bch2_gc (c=<optimized out>, initial=<optimized out>, metadata_only=<optimized out>) at libbcachefs/btree_gc.c:1885
#10 0x00000000004d07a3 in bch2_run_recovery_pass (pass=<optimized out>, c=0x7ffff79e3000) at libbcachefs/recovery.c:619
#11 bch2_run_recovery_passes (c=<optimized out>) at libbcachefs/recovery.c:636
#12 bch2_fs_recovery (c=0x7ffff79e3000) at libbcachefs/recovery.c:830
#13 0x00000000004eb00d in bch2_fs_start (c=c@entry=0x7ffff79e3000) at libbcachefs/super.c:965
#14 0x00000000004edec1 in bch2_fs_open (devices=devices@entry=0x7fffffffc088, nr_devices=nr_devices@entry=4, opts=...) at libbcachefs/super.c:1955
#15 0x0000000000418174 in cmd_fsck (argc=4, argv=0x7fffffffc088, argv@entry=0x7fffffffc068) at cmd_fsck.c:94
#16 0x0000000000410ff1 in main (argc=<optimized out>, argv=0x7fffffffc068) at bcachefs.c:229
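
For readers unfamiliar with frame #2, the name `__move_gap` together with the `memmove` call suggests a gap-buffer style array. The following is a generic illustration of moving the gap in such a buffer, not the bcachefs implementation: the function name, its signature, and the use of a Python list are all hypothetical.

```python
def move_gap(array, gap_start, gap_size, new_gap_start):
    """Move the unused gap of a gap buffer so that it starts at new_gap_start.

    Layout: array[:gap_start] is live, array[gap_start:gap_start + gap_size]
    is the gap, and everything after the gap is live again.
    Returns the new gap start.
    """
    live = len(array) - gap_size
    if not 0 <= new_gap_start <= live:
        # In C an out-of-range value here would turn into an
        # out-of-bounds memmove rather than a clean error.
        raise ValueError("new gap position out of range")

    gap_end = gap_start + gap_size
    if new_gap_start < gap_start:
        # Shift the elements between the new and old gap start to the right
        array[new_gap_start + gap_size:gap_end] = array[new_gap_start:gap_start]
    elif new_gap_start > gap_start:
        # Shift the elements just after the gap to the left
        array[gap_start:new_gap_start] = array[gap_end:new_gap_start + gap_size]
    return new_gap_start


def live_elements(array, gap_start, gap_size):
    """Return the live contents, skipping the gap."""
    return array[:gap_start] + array[gap_start + gap_size:]


buf = [1, 2, 3, None, None, 4, 5]   # gap of size 2 starting at index 3
gap = move_gap(buf, 3, 2, 1)        # move the gap towards the front
after_left = live_elements(buf, gap, 2)
gap = move_gap(buf, gap, 2, 4)      # move the gap towards the back
after_right = live_elements(buf, gap, 2)
print(after_left, after_right)
```

Moving the gap never changes the live contents, only where the free slots sit, so each move is one bulk copy whose length depends on how far the gap travels.
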

ghost commented 8 months ago

I was able to fix this by upgrading my kernel to linux-next-20231107.