koverstreet / bcachefs

bcachefs: libbcachefs/btree_iter.c:1418: bch2_trans_unlocked_error: Assertion `0' failed. #704

Open RAOF opened 1 week ago

RAOF commented 1 week ago

I have a reproducible crash when running an fsck on this system. I don't know whether it's important, but this system does not have enough RAM to store all the extents.

This also reproduces as a panic when using the kernel fsck, but userspace is easier to debug with gdb. With bcachefs-tools 755788e2d6065ac35e680a23c48125a5cd63f7b7 (which has kernel code 9404a01d3dc5), I get:

sudo gdb --args ./bcachefs fsck -v /dev/sda /dev/sdc /dev/sdd /dev/sde

<...>

mounting version 1.9: disk_accounting_v2 opts=ro,metadata_replicas=2,data_replicas=2,background_compression=zstd,foreground_target=ssd,background_target=rotational,promote_target=ssd,degraded,verbose,fsck,fix_errors=ask,read_only
recovering from unclean shutdown
Version upgrade from 1.7: mi_btree_bitmap to 1.9: disk_accounting_v2 incomplete
Doing compatible version upgrade from 1.7: mi_btree_bitmap to 1.9: disk_accounting_v2
  running recovery passes: check_allocations
starting journal read
journal read done on device /dev/sdd, ret 0
journal read done on device /dev/sda, ret 0
journal read done on device /dev/sdc, ret 0
journal read done on device /dev/sde, ret 0
journal read done, replaying entries 1201874-1201874
Journal keys: 0 read, 0 after sorting and compacting
[New Thread 0x7ffff78d9680 (LWP 8808)]
accounting_read... done
[New Thread 0x7ffff1a00680 (LWP 8809)]
alloc_read... done
stripes_read... done
snapshots_read... done
check_allocations... done
going read-write
journal_replay... done
check_alloc_info... done
check_lrus... done
check_btree_backpointers... done
check_backpointers_to_extents...bch2_check_backpointers_to_extents(): extents do not fit in ram, running in multiple passes with 11825 nodes per pass
check_backpointers_to_extents(): extents:POS_MIN-extents:402717470:12008:U32_MAX
WARNING at libbcachefs/btree_iter.c:2995: btree trans held srcu lock (delaying memory reclaim) for 22 seconds
trans should be locked, unlocked by 0x5555555e654dS
bcachefs: libbcachefs/btree_iter.c:1418: bch2_trans_unlocked_error: Assertion `0' failed.

Thread 1 "bcachefs" received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
warning: 44     ./nptl/pthread_kill.c: No such file or directory
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7a4526e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7a288ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7a2881b in __assert_fail_base (fmt=0x7ffff7bd01e8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x5555557ac692 "0", file=file@entry=0x5555557ad096 "libbcachefs/btree_iter.c",
    line=line@entry=1418,
    function=function@entry=0x5555557b7150 <__PRETTY_FUNCTION__.12> "bch2_trans_unlocked_error")
    at ./assert/assert.c:94
#6  0x00007ffff7a3b507 in __assert_fail (assertion=assertion@entry=0x5555557ac692 "0",
    file=file@entry=0x5555557ad096 "libbcachefs/btree_iter.c", line=line@entry=1418,
    function=function@entry=0x5555557b7150 <__PRETTY_FUNCTION__.12> "bch2_trans_unlocked_error")
    at ./assert/assert.c:103
#7  0x00005555555fad5f in bch2_trans_unlocked_error (trans=trans@entry=0x5555584a1000)
    at libbcachefs/btree_iter.c:1418
#8  0x00005555555fde38 in bch2_trans_verify_not_unlocked (trans=0x5555584a1000) at libbcachefs/btree_iter.h:322
#9  bch2_btree_path_traverse (flags=<optimized out>, path=<optimized out>, trans=0x5555584a1000)
    at libbcachefs/btree_iter.h:224
#10 bch2_btree_iter_peek_node (iter=iter@entry=0x7fffffffd6b0) at libbcachefs/btree_iter.c:1876
#11 0x0000555555601755 in bch2_btree_iter_peek_node_and_restart (iter=iter@entry=0x7fffffffd6b0)
    at libbcachefs/btree_iter.c:1908
#12 0x00005555555d5222 in bch2_get_btree_in_memory_pos (trans=trans@entry=0x5555584a1000,
    btree_leaf_mask=btree_leaf_mask@entry=129, btree_interior_mask=btree_interior_mask@entry=18446744073709551615,
    start=..., end=end@entry=0x7fffffffd830) at libbcachefs/backpointers.c:797
#13 0x00005555555db484 in bch2_check_backpointers_to_extents (c=0x7ffff7cfb000) at libbcachefs/backpointers.c:971
#14 0x0000555555696025 in bch2_run_recovery_pass (c=c@entry=0x7ffff7cfb000,
    pass=pass@entry=BCH_RECOVERY_PASS_check_backpointers_to_extents) at libbcachefs/recovery_passes.c:183
#15 0x000055555569649d in bch2_run_recovery_passes (c=c@entry=0x7ffff7cfb000) at libbcachefs/recovery_passes.c:230
#16 0x000055555569483a in bch2_fs_recovery (c=0x7ffff7cfb000) at libbcachefs/recovery.c:852
#17 0x00005555556b88dd in bch2_fs_start (c=c@entry=0x7ffff7cfb000) at libbcachefs/super.c:1036
#18 0x00005555556bc4ad in bch2_fs_open (devices=devices@entry=0x5555559508a0, nr_devices=nr_devices@entry=4,
    opts=...) at libbcachefs/super.c:2137
#19 0x00005555555b8722 in cmd_fsck (argc=<optimized out>, argv=<optimized out>) at c_src/cmd_fsck.c:271
#20 0x00005555555ae701 in bcachefs::main ()
#21 0x00005555555acee3 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#22 0x00005555555a3609 in std::rt::lang_start::{{closure}} ()
#23 0x000055555576db34 in core::ops::function::impls::{impl#2}::call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (self=..., args=<optimized out>)
    at library/core/src/ops/function.rs:284
#24 std::panicking::try::do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (data=<optimized out>) at library/std/src/panicking.rs:552
#25 std::panicking::try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (f=...) at library/std/src/panicking.rs:516
#26 std::panic::catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (f=...) at library/std/src/panic.rs:142
#27 std::rt::lang_start_internal::{closure#2} () at library/std/src/rt.rs:148
#28 std::panicking::try::do_call<std::rt::lang_start_internal::{closure_env#2}, isize> (data=<optimized out>)
    at library/std/src/panicking.rs:552
#29 std::panicking::try<isize, std::rt::lang_start_internal::{closure_env#2}> (f=...)
    at library/std/src/panicking.rs:516
#30 0x000055555575aeab in std::panic::catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> (f=...)
    at library/std/src/panic.rs:142
#31 std::rt::lang_start_internal (main=..., argc=<optimized out>, argv=<optimized out>, sigpipe=<optimized out>)
    at library/std/src/rt.rs:148
#32 0x00005555555a35fe in std::rt::lang_start ()
#33 0x00007ffff7a2a1ca in __libc_start_call_main (main=main@entry=0x5555555ae830 <main>, argc=argc@entry=7,
    argv=argv@entry=0x7fffffffe498) at ../sysdeps/nptl/libc_start_call_main.h:58
#34 0x00007ffff7a2a28b in __libc_start_main_impl (main=0x5555555ae830 <main>, argc=7, argv=0x7fffffffe498,
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe488)
    at ../csu/libc-start.c:360
#35 0x000055555557cdb5 in _start ()