Open Snogard opened 5 months ago
That looks like it's stuck trying to allocate - are your disks all full?
You can check dev-*/alloc_debug in sysfs. While it's trying to mount, bcachefs fs usage won't work, but sysfs is accessible.
Before starting the whole process I remember about 850 GB being free out of 13 TB (reported by Dolphin), so about half per disk ideally. The last disk I added (hdd3) was empty and never got fully used, since the evacuate hung and I never got the chance to launch a rereplicate job.
Anyway, here are the logs:
/sys/fs/bcachefs/d6020e9b-770a-4aa5-a4af-389f4003b650/dev-0/alloc_debug:
buckets sectors fragmented
free 1895418 0 0
sb 7 6152 1016
journal 8192 8388608 0
btree 186693 111556608 79617024
user 13145427 13321947122 138970126
cached 26033 13916696 0
parity 0 0 0
stripe 0 0 0
need_gc_gens 0 0 0
need_discard 0 0 0
ec 0
reserves:
stripe 476958
normal 238493
copygc 28
btree 14
btree_copygc 0
reclaim 0
freelist_wait empty
open buckets allocated 1
open buckets this dev 0
open buckets total 1024
open_buckets_wait empty
open_buckets_btree 0
open_buckets_user 0
buckets_to_invalidate 0
btree reserve cache 0
/sys/fs/bcachefs/d6020e9b-770a-4aa5-a4af-389f4003b650/dev-1/alloc_debug:
buckets sectors fragmented
free 1896865 0 0
sb 7 6152 1016
journal 8192 8388608 0
btree 185251 110264832 79432192
user 13145428 13321947546 138970726
cached 26027 13927360 0
parity 0 0 0
stripe 0 0 0
need_gc_gens 0 0 0
need_discard 0 0 0
ec 0
reserves:
stripe 476958
normal 238493
copygc 28
btree 14
btree_copygc 0
reclaim 0
freelist_wait empty
open buckets allocated 1
open buckets this dev 0
open buckets total 1024
open_buckets_wait empty
open_buckets_btree 0
open_buckets_user 0
buckets_to_invalidate 0
btree reserve cache 0
/sys/fs/bcachefs/d6020e9b-770a-4aa5-a4af-389f4003b650/dev-2/alloc_debug:
buckets sectors fragmented
free 7608453 0 0
sb 4 6152 2040
journal 8192 16777216 0
btree 1110 1783296 489984
user 13126 26880424 1624
cached 0 0 0
parity 0 0 0
stripe 0 0 0
need_gc_gens 0 0 0
need_discard 0 0 0
ec 0
reserves:
stripe 238478
normal 119246
copygc 14
btree 7
btree_copygc 0
reclaim 0
freelist_wait empty
open buckets allocated 1
open buckets this dev 0
open buckets total 1024
open_buckets_wait empty
open_buckets_btree 0
open_buckets_user 0
buckets_to_invalidate 0
btree reserve cache 0
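For anyone reading these dumps later: a rough way to answer the "are your disks all full?" question is to compare the free row against the total bucket count and against the normal reserve. The sketch below is not part of bcachefs-tools. parse_alloc_debug and free_fraction are hypothetical helper names, and the parser assumes the layout pasted above, which may differ between kernel versions. The sample text is the dev-0 dump from this comment, trimmed after the reserves list.

```python
def parse_alloc_debug(text):
    """Parse the bucket table and the reserves list from alloc_debug text.

    Assumes the layout pasted in this thread; other kernels may differ.
    """
    buckets, reserves = {}, {}
    in_reserves = False
    for line in text.splitlines():
        parts = line.split()
        if line.strip() == "reserves:":
            in_reserves = True
        elif len(parts) == 4 and all(p.isdigit() for p in parts[1:]):
            # "<type> <buckets> <sectors> <fragmented>"
            buckets[parts[0]] = int(parts[1])
        elif in_reserves and len(parts) == 2 and parts[1].isdigit():
            # "<reserve name> <bucket count>"
            reserves[parts[0]] = int(parts[1])
        elif in_reserves:
            # First line that is not "<name> <number>" ends the reserves list.
            in_reserves = False
    return buckets, reserves


def free_fraction(buckets):
    """Share of all accounted buckets that sit in the 'free' row."""
    total = sum(buckets.values())
    return buckets.get("free", 0) / total if total else 0.0


# dev-0 values copied from the dump above
SAMPLE = """\
buckets sectors fragmented
free 1895418 0 0
sb 7 6152 1016
journal 8192 8388608 0
btree 186693 111556608 79617024
user 13145427 13321947122 138970126
cached 26033 13916696 0
parity 0 0 0
stripe 0 0 0
need_gc_gens 0 0 0
need_discard 0 0 0
ec 0
reserves:
stripe 476958
normal 238493
copygc 28
btree 14
btree_copygc 0
reclaim 0
freelist_wait empty
open buckets allocated 1
open_buckets_btree 0
"""

buckets, reserves = parse_alloc_debug(SAMPLE)
print(f"free fraction: {free_fraction(buckets):.1%}")
print("free above normal reserve:", buckets["free"] > reserves["normal"])
```

On the dev-0 numbers this reports roughly 12% of buckets free, well above the normal reserve, so by this crude measure the device does not look full. That is only a reading of the counters, not a diagnosis of the hang.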
Can you faddr2line the dmesg output?
It's my first time doing something like this and after a lot of hours i think i need some guidance on this.
I recompiled the kernel without stripping: I used Arch Linux's PKGBUILD and commented out the "stripping build tools" and "stripping vm linux" lines.
After a reboot i tried to mount the pool to get a fresh dmesg from the new kernel and i used that for faddr2line:
/usr/lib/modules/6.7.3-zen1-2-zen/build/scripts/faddr2line /usr/lib/modules/6.7.3-zen1-2-zen/build/vmlinux ....
The only problem is that I can't seem to get a match for the bcachefs stuff. I tried running faddr2line on the uncompressed bcachefs module (from /usr/lib/modules/6.7.3-zen1-2-zen/kernel/fs/bcachefs/bcachefs.ko.zst),
but it still gives me this error: ERROR: CONFIG_DEBUG_INFO not enabled
I feel a bit lost.
file log for bcachefs.ko:
bcachefs.ko: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), BuildID[sha1]=bc7df9d851182d3b23b2da70a497331d07bf91c0, not stripped
After a night of sleep i recompiled the kernel without changing anything and... faddr2line worked on the bcachefs module.
here is the complete version of the addressed dmesg, i hope i did it right.
any news?
Hi @Snogard and @koverstreet , I am facing identical issues.
Setup:
world/linux-zen 6.7.5.zen1-1
extra/bcachefs-tools 3:1.6.3-1
bcachefs format --compression=zstd --encrypted --label=ssd.ssd1 /dev/nvme0n1
and analogously with the HDDs, then added the SSD after mounting the HDDs to /mnt
What happened?
How I tried to recover the data:
bcachefs mount /dev/sda:/dev/sdb:/dev/nvme1n1 /mnt -o fsck,fix_errors
Outcome running dmesg -k -H -L -w:
[Feb22 18:26] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): mounting version 1.6: (unknown version) opts=metadata_replicas=2,data_replicas=2,compression=zstd,metadata_target=/dev/sda,foreground_target=ssd,background_target=bg_group,promote_target=ssd,fsck,fix_errors=yes
[ +0.000006] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): recovering from unclean shutdown
[ +0.000002] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): superblock requires following recovery passes to be run:
check_subvols,check_dirents
[ +0.000001] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): Version downgrade required:
[ +0.000004] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): Version upgrade from 1.3: rebalance_work to 1.6: (unknown version) incomplete
Doing compatible version upgrade from 1.3: rebalance_work to 1.6: (unknown version)
[Feb22 18:27] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): journal read done, replaying entries 3449328-3450622
[ +0.983910] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): alloc_read... done
[ +0.455294] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): stripes_read... done
[ +0.000010] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): snapshots_read... done
[ +0.000087] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): check_allocations... done
[Feb22 18:59] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): journal_replay...
[ +0.157422] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): going read-write
[Feb22 19:04] INFO: task kworker/u4:3:68 blocked for more than 122 seconds.
[ +0.000032] Not tainted 6.7.5-zen1-1-zen #1
[ +0.000016] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.000021] task:kworker/u4:3 state:D stack:0 pid:68 tgid:68 ppid:2 flags:0x00004000
[ +0.000008] Workqueue: btree_update btree_interior_update_work [bcachefs]
[ +0.000131] Call Trace:
[ +0.000002] <TASK>
[ +0.000004] __schedule+0xcaa/0x1950
[ +0.000010] ? __bch2_time_stats_update+0x129/0x270 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000123] schedule+0x32/0xd0
[ +0.000006] __closure_sync+0x82/0x160
[ +0.000005] bch2_btree_update_start+0x917/0x940 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000118] ? __pfx_closure_sync_fn+0x10/0x10
[ +0.000008] ? __bch2_foreground_maybe_merge+0x555/0xe20 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000112] __bch2_foreground_maybe_merge+0x555/0xe20 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000135] ? run_btree_triggers+0x35d/0x3d0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000111] __bch2_trans_commit+0x1448/0x2050 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000109] btree_interior_update_work+0x98d/0xaf0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000109] process_one_work+0x178/0x340
[ +0.000006] worker_thread+0x301/0x490
[ +0.000005] ? __pfx_worker_thread+0x10/0x10
[ +0.000003] kthread+0xe5/0x120
[ +0.000006] ? __pfx_kthread+0x10/0x10
[ +0.000005] ret_from_fork+0x31/0x50
[ +0.000006] ? __pfx_kthread+0x10/0x10
[ +0.000004] ret_from_fork_asm+0x1b/0x30
[ +0.000009] </TASK>
[ +0.000012] INFO: task bcachefs:4852 blocked for more than 122 seconds.
[ +0.000021] Not tainted 6.7.5-zen1-1-zen #1
[ +0.000014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.000019] task:bcachefs state:D stack:0 pid:4852 tgid:4852 ppid:1763 flags:0x00004002
[ +0.000006] Call Trace:
[ +0.000001] <TASK>
[ +0.000003] __schedule+0xcaa/0x1950
[ +0.000006] ? __bch2_time_stats_update+0x129/0x270 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000115] schedule+0x32/0xd0
[ +0.000005] __closure_sync+0x82/0x160
[ +0.000005] bch2_btree_update_start+0x917/0x940 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000113] ? __pfx_closure_sync_fn+0x10/0x10
[ +0.000007] ? __bch2_foreground_maybe_merge+0x555/0xe20 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000106] __bch2_foreground_maybe_merge+0x555/0xe20 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000114] __bch2_trans_commit+0x6ab/0x2050 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000565] ? bch2_journal_replay+0x516/0x5d0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000246] bch2_journal_replay+0x2fc/0x5d0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000241] bch2_fs_recovery+0x18ab/0x1be0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000418] ? print_mount_opts+0x4b6/0x630 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000128] bch2_fs_start+0x32f/0x3b0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000122] bch2_fs_open+0x1158/0x18d0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000130] ? free_percpu+0x268/0x420
[ +0.000008] ? bch2_mount+0x4e5/0x720 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000129] bch2_mount+0x4e5/0x720 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000470] legacy_get_tree+0x28/0x50
[ +0.000066] vfs_get_tree+0x26/0xf0
[ +0.000004] path_mount+0x4c9/0xb80
[ +0.000007] __x64_sys_mount+0x11a/0x150
[ +0.000006] do_syscall_64+0x61/0xe0
[ +0.000008] ? __x64_sys_add_key+0x19c/0x240
[ +0.000005] ? syscall_exit_to_user_mode+0x2b/0x40
[ +0.000004] ? do_syscall_64+0x70/0xe0
[ +0.000005] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ +0.000006] RIP: 0033:0x77e1f7b72d2e
[ +0.000016] RSP: 002b:00007fffcd64dee8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[ +0.000004] RAX: ffffffffffffffda RBX: 0000000000000098 RCX: 000077e1f7b72d2e
[ +0.000004] RDX: 0000601af90af510 RSI: 0000601af90af970 RDI: 0000601af90b0110
[ +0.000002] RBP: 0000601af90b0110 R08: 0000601af90b1c60 R09: 000077e1f7c41ac0
[ +0.000002] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[ +0.000002] R13: 0000000000000099 R14: 0000601af90af510 R15: 0000000000000004
[ +0.000022] </TASK>
[ +0.000005] INFO: task bch-reclaim/b1d:5095 blocked for more than 122 seconds.
[ +0.000027] Not tainted 6.7.5-zen1-1-zen #1
[ +0.000014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.000020] task:bch-reclaim/b1d state:D stack:0 pid:5095 tgid:5095 ppid:2 flags:0x00004000
[ +0.000006] Call Trace:
[ +0.000017] <TASK>
[ +0.000003] __schedule+0xcaa/0x1950
[ +0.000007] ? __bch2_time_stats_update+0x129/0x270 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000107] schedule+0x32/0xd0
[ +0.000004] __closure_sync+0x82/0x160
[ +0.000005] bch2_btree_update_start+0x917/0x940 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000091] ? __pfx_closure_sync_fn+0x10/0x10
[ +0.000006] ? bch2_btree_split_leaf+0x46/0x310 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000087] bch2_btree_split_leaf+0x46/0x310 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000811] ? btree_key_cache_flush_pos+0x435/0x4c0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000056] bch2_trans_commit_error+0x6c/0x640 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000053] __bch2_trans_commit+0xd42/0x2050 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000053] btree_key_cache_flush_pos+0x435/0x4c0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000060] bch2_btree_key_cache_journal_flush+0x1a5/0x240 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000564] ? bch2_btree_key_cache_journal_flush+0x9d/0x240 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000052] journal_flush_pins.constprop.0+0x162/0x430 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000071] ? __pfx_bch2_btree_key_cache_journal_flush+0x10/0x10 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000052] bch2_journal_reclaim_thread+0x3b9/0x5e0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000066] ? __pfx_bch2_journal_reclaim_thread+0x10/0x10 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000064] kthread+0xe5/0x120
[ +0.000003] ? __pfx_kthread+0x10/0x10
[ +0.000002] ret_from_fork+0x31/0x50
[ +0.000004] ? __pfx_kthread+0x10/0x10
[ +0.000002] ret_from_fork_asm+0x1b/0x30
[ +0.000008] </TASK>
[Feb22 19:06] INFO: task kworker/u4:3:68 blocked for more than 245 seconds.
[ +0.000033] Not tainted 6.7.5-zen1-1-zen #1
[ +0.000015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.000021] task:kworker/u4:3 state:D stack:0 pid:68 tgid:68 ppid:2 flags:0x00004000
[ +0.000009] Workqueue: btree_update btree_interior_update_work [bcachefs]
[ +0.000134] Call Trace:
[ +0.000002] <TASK>
[ +0.000004] __schedule+0xcaa/0x1950
[ +0.000010] ? __bch2_time_stats_update+0x129/0x270 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000143] schedule+0x32/0xd0
[ +0.000006] __closure_sync+0x82/0x160
[ +0.000006] bch2_btree_update_start+0x917/0x940 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000119] ? __pfx_closure_sync_fn+0x10/0x10
[ +0.000009] ? __bch2_foreground_maybe_merge+0x555/0xe20 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000113] __bch2_foreground_maybe_merge+0x555/0xe20 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000111] ? run_btree_triggers+0x35d/0x3d0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000108] __bch2_trans_commit+0x1448/0x2050 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000108] btree_interior_update_work+0x98d/0xaf0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000666] process_one_work+0x178/0x340
[ +0.000009] worker_thread+0x301/0x490
[ +0.000005] ? __pfx_worker_thread+0x10/0x10
[ +0.000004] kthread+0xe5/0x120
[ +0.000005] ? __pfx_kthread+0x10/0x10
[ +0.000005] ret_from_fork+0x31/0x50
[ +0.000006] ? __pfx_kthread+0x10/0x10
[ +0.000005] ret_from_fork_asm+0x1b/0x30
[ +0.000008] </TASK>
[ +0.000013] INFO: task bcachefs:4852 blocked for more than 245 seconds.
[ +0.000024] Not tainted 6.7.5-zen1-1-zen #1
[ +0.000014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.000020] task:bcachefs state:D stack:0 pid:4852 tgid:4852 ppid:1763 flags:0x00004002
[ +0.000007] Call Trace:
[ +0.000002] <TASK>
[ +0.000002] __schedule+0xcaa/0x1950
[ +0.000008] ? __bch2_time_stats_update+0x129/0x270 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000121] schedule+0x32/0xd0
[ +0.000005] __closure_sync+0x82/0x160
[ +0.000005] bch2_btree_update_start+0x917/0x940 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000115] ? __pfx_closure_sync_fn+0x10/0x10
[ +0.000007] ? __bch2_foreground_maybe_merge+0x555/0xe20 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000107] __bch2_foreground_maybe_merge+0x555/0xe20 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000114] __bch2_trans_commit+0x6ab/0x2050 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000117] ? bch2_journal_replay+0x516/0x5d0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000135] bch2_journal_replay+0x2fc/0x5d0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000132] bch2_fs_recovery+0x18ab/0x1be0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000128] ? print_mount_opts+0x4b6/0x630 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000140] bch2_fs_start+0x32f/0x3b0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000122] bch2_fs_open+0x1158/0x18d0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000127] ? free_percpu+0x268/0x420
[ +0.000007] ? bch2_mount+0x4e5/0x720 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000461] bch2_mount+0x4e5/0x720 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000142] legacy_get_tree+0x28/0x50
[ +0.000007] vfs_get_tree+0x26/0xf0
[ +0.000005] path_mount+0x4c9/0xb80
[ +0.000007] __x64_sys_mount+0x11a/0x150
[ +0.000006] do_syscall_64+0x61/0xe0
[ +0.000007] ? __x64_sys_add_key+0x19c/0x240
[ +0.000006] ? syscall_exit_to_user_mode+0x2b/0x40
[ +0.000003] ? do_syscall_64+0x70/0xe0
[ +0.000005] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ +0.000006] RIP: 0033:0x77e1f7b72d2e
[ +0.000015] RSP: 002b:00007fffcd64dee8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[ +0.000005] RAX: ffffffffffffffda RBX: 0000000000000098 RCX: 000077e1f7b72d2e
[ +0.000003] RDX: 0000601af90af510 RSI: 0000601af90af970 RDI: 0000601af90b0110
[ +0.000002] RBP: 0000601af90b0110 R08: 0000601af90b1c60 R09: 000077e1f7c41ac0
[ +0.000003] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[ +0.000002] R13: 0000000000000099 R14: 0000601af90af510 R15: 0000000000000004
[ +0.000005] </TASK>
[ +0.000004] INFO: task bch-reclaim/b1d:5095 blocked for more than 245 seconds.
[ +0.000026] Not tainted 6.7.5-zen1-1-zen #1
[ +0.000015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.000019] task:bch-reclaim/b1d state:D stack:0 pid:5095 tgid:5095 ppid:2 flags:0x00004000
[ +0.000006] Call Trace:
[ +0.000002] <TASK>
[ +0.000003] __schedule+0xcaa/0x1950
[ +0.000007] ? __bch2_time_stats_update+0x129/0x270 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000115] schedule+0x32/0xd0
[ +0.000005] __closure_sync+0x82/0x160
[ +0.000006] bch2_btree_update_start+0x917/0x940 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000114] ? __pfx_closure_sync_fn+0x10/0x10
[ +0.000008] ? bch2_btree_split_leaf+0x46/0x310 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000114] bch2_btree_split_leaf+0x46/0x310 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000107] ? btree_key_cache_flush_pos+0x435/0x4c0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000105] bch2_trans_commit_error+0x6c/0x640 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000119] __bch2_trans_commit+0xd42/0x2050 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000053] btree_key_cache_flush_pos+0x435/0x4c0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000054] bch2_btree_key_cache_journal_flush+0x1a5/0x240 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000050] ? bch2_btree_key_cache_journal_flush+0x9d/0x240 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.001020] journal_flush_pins.constprop.0+0x162/0x430 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000069] ? __pfx_bch2_btree_key_cache_journal_flush+0x10/0x10 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000052] bch2_journal_reclaim_thread+0x3b9/0x5e0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000066] ? __pfx_bch2_journal_reclaim_thread+0x10/0x10 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000064] kthread+0xe5/0x120
[ +0.000003] ? __pfx_kthread+0x10/0x10
[ +0.000002] ret_from_fork+0x31/0x50
[ +0.000004] ? __pfx_kthread+0x10/0x10
[ +0.000002] ret_from_fork_asm+0x1b/0x30
[ +0.000004] </TASK>
[Feb22 19:08] INFO: task kworker/u4:3:68 blocked for more than 368 seconds.
[ +0.000029] Not tainted 6.7.5-zen1-1-zen #1
[ +0.000014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.000020] task:kworker/u4:3 state:D stack:0 pid:68 tgid:68 ppid:2 flags:0x00004000
[ +0.000008] Workqueue: btree_update btree_interior_update_work [bcachefs]
[ +0.000119] Call Trace:
[ +0.000002] <TASK>
[ +0.000003] __schedule+0xcaa/0x1950
[ +0.000009] ? __bch2_time_stats_update+0x129/0x270 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000106] schedule+0x32/0xd0
[ +0.000005] __closure_sync+0x82/0x160
[ +0.000006] bch2_btree_update_start+0x917/0x940 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000104] ? __pfx_closure_sync_fn+0x10/0x10
[ +0.000007] ? __bch2_foreground_maybe_merge+0x555/0xe20 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000104] __bch2_foreground_maybe_merge+0x555/0xe20 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000104] ? run_btree_triggers+0x35d/0x3d0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000115] __bch2_trans_commit+0x1448/0x2050 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000104] btree_interior_update_work+0x98d/0xaf0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000107] process_one_work+0x178/0x340
[ +0.000006] worker_thread+0x301/0x490
[ +0.000004] ? __pfx_worker_thread+0x10/0x10
[ +0.000004] kthread+0xe5/0x120
[ +0.000004] ? __pfx_kthread+0x10/0x10
[ +0.000005] ret_from_fork+0x31/0x50
[ +0.000005] ? __pfx_kthread+0x10/0x10
[ +0.000004] ret_from_fork_asm+0x1b/0x30
[ +0.000009] </TASK>
[ +0.000012] INFO: task bcachefs:4852 blocked for more than 368 seconds.
[ +0.000019] Not tainted 6.7.5-zen1-1-zen #1
[ +0.000013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.000017] task:bcachefs state:D stack:0 pid:4852 tgid:4852 ppid:1763 flags:0x00004002
[ +0.000005] Call Trace:
[ +0.000002] <TASK>
[ +0.000002] __schedule+0xcaa/0x1950
[ +0.000006] ? __bch2_time_stats_update+0x129/0x270 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000103] schedule+0x32/0xd0
[ +0.000005] __closure_sync+0x82/0x160
[ +0.000004] bch2_btree_update_start+0x917/0x940 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000100] ? __pfx_closure_sync_fn+0x10/0x10
[ +0.000006] ? __bch2_foreground_maybe_merge+0x555/0xe20 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000094] __bch2_foreground_maybe_merge+0x555/0xe20 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000101] __bch2_trans_commit+0x6ab/0x2050 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000095] ? bch2_journal_replay+0x516/0x5d0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.001951] bch2_journal_replay+0x2fc/0x5d0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000112] bch2_fs_recovery+0x18ab/0x1be0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000167] ? print_mount_opts+0x4b6/0x630 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000101] bch2_fs_start+0x32f/0x3b0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000097] bch2_fs_open+0x1158/0x18d0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000102] ? free_percpu+0x268/0x420
[ +0.000005] ? bch2_mount+0x4e5/0x720 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000102] bch2_mount+0x4e5/0x720 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000107] legacy_get_tree+0x28/0x50
[ +0.000005] vfs_get_tree+0x26/0xf0
[ +0.000004] path_mount+0x4c9/0xb80
[ +0.000005] __x64_sys_mount+0x11a/0x150
[ +0.000005] do_syscall_64+0x61/0xe0
[ +0.000006] ? __x64_sys_add_key+0x19c/0x240
[ +0.000005] ? syscall_exit_to_user_mode+0x2b/0x40
[ +0.000002] ? do_syscall_64+0x70/0xe0
[ +0.000004] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ +0.000005] RIP: 0033:0x77e1f7b72d2e
[ +0.000013] RSP: 002b:00007fffcd64dee8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[ +0.000004] RAX: ffffffffffffffda RBX: 0000000000000098 RCX: 000077e1f7b72d2e
[ +0.000003] RDX: 0000601af90af510 RSI: 0000601af90af970 RDI: 0000601af90b0110
[ +0.000001] RBP: 0000601af90b0110 R08: 0000601af90b1c60 R09: 000077e1f7c41ac0
[ +0.000002] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[ +0.000002] R13: 0000000000000099 R14: 0000601af90af510 R15: 0000000000000004
[ +0.000004] </TASK>
[ +0.000004] INFO: task bch-reclaim/b1d:5095 blocked for more than 368 seconds.
[ +0.000020] Not tainted 6.7.5-zen1-1-zen #1
[ +0.000011] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.000016] task:bch-reclaim/b1d state:D stack:0 pid:5095 tgid:5095 ppid:2 flags:0x00004000
[ +0.000005] Call Trace:
[ +0.000001] <TASK>
[ +0.000002] __schedule+0xcaa/0x1950
[ +0.000006] ? __bch2_time_stats_update+0x129/0x270 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000091] schedule+0x32/0xd0
[ +0.000004] __closure_sync+0x82/0x160
[ +0.000004] bch2_btree_update_start+0x917/0x940 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000093] ? __pfx_closure_sync_fn+0x10/0x10
[ +0.000005] ? bch2_btree_split_leaf+0x46/0x310 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.001327] bch2_btree_split_leaf+0x46/0x310 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000189] ? btree_key_cache_flush_pos+0x435/0x4c0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000107] bch2_trans_commit_error+0x6c/0x640 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000097] __bch2_trans_commit+0xd42/0x2050 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000108] btree_key_cache_flush_pos+0x435/0x4c0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000089] bch2_btree_key_cache_journal_flush+0x1a5/0x240 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000083] ? bch2_btree_key_cache_journal_flush+0x9d/0x240 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000091] journal_flush_pins.constprop.0+0x162/0x430 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000111] ? __pfx_bch2_btree_key_cache_journal_flush+0x10/0x10 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000093] bch2_journal_reclaim_thread+0x3b9/0x5e0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000113] ? __pfx_bch2_journal_reclaim_thread+0x10/0x10 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000113] kthread+0xe5/0x120
[ +0.000005] ? __pfx_kthread+0x10/0x10
[ +0.000004] ret_from_fork+0x31/0x50
[ +0.000005] ? __pfx_kthread+0x10/0x10
[ +0.000004] ret_from_fork_asm+0x1b/0x30
[ +0.000007] </TASK>
[Feb22 19:10] INFO: task kworker/u4:3:68 blocked for more than 491 seconds.
[ +0.000029] Not tainted 6.7.5-zen1-1-zen #1
[ +0.000016] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.000021] task:kworker/u4:3 state:D stack:0 pid:68 tgid:68 ppid:2 flags:0x00004000
[ +0.000009] Workqueue: btree_update btree_interior_update_work [bcachefs]
[ +0.000132] Call Trace:
[ +0.000001] <TASK>
[ +0.000004] __schedule+0xcaa/0x1950
[ +0.000010] ? __bch2_time_stats_update+0x129/0x270 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000118] schedule+0x32/0xd0
[ +0.000006] __closure_sync+0x82/0x160
[ +0.000007] bch2_btree_update_start+0x917/0x940 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000120] ? __pfx_closure_sync_fn+0x10/0x10
[ +0.000009] ? __bch2_foreground_maybe_merge+0x555/0xe20 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000137] __bch2_foreground_maybe_merge+0x555/0xe20 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.001357] ? run_btree_triggers+0x35d/0x3d0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000116] __bch2_trans_commit+0x1448/0x2050 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000111] btree_interior_update_work+0x98d/0xaf0 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000109] process_one_work+0x178/0x340
[ +0.000006] worker_thread+0x301/0x490
[ +0.000005] ? __pfx_worker_thread+0x10/0x10
[ +0.000004] kthread+0xe5/0x120
[ +0.000005] ? __pfx_kthread+0x10/0x10
[ +0.000005] ret_from_fork+0x31/0x50
[ +0.000005] ? __pfx_kthread+0x10/0x10
[ +0.000005] ret_from_fork_asm+0x1b/0x30
[ +0.000009] </TASK>
[ +0.000002] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
Mounting with very_degraded, as listed in https://bcachefs.org/bcachefs-principles-of-operation.pdf chapter 3.2, ends like this in the same dmesg command from above:
[Feb22 19:31] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): mounting version 1.6: (unknown version) opts=metadata_replicas=2,data_replicas=2,compression=zstd,metadata_target=/dev/sda,foreground_target=ssd,background_target=bg_group,promote_target=ssd,very_degraded,fsck
[ +0.000005] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): recovering from unclean shutdown
[ +0.000003] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): superblock requires following recovery passes to be run:
check_subvols,check_dirents
[ +0.000001] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): Version downgrade required:
[ +0.000003] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): Version upgrade from 1.3: rebalance_work to 1.6: (unknown version) incomplete
Doing compatible version upgrade from 1.3: rebalance_work to 1.6: (unknown version)
[Feb22 19:33] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): journal read done, replaying entries 3449330-3450899
[ +1.051386] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): alloc_read... done
[ +0.396307] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): stripes_read... done
[ +0.000009] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): snapshots_read... done
[ +0.000086] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): check_allocations... done
[Feb22 20:03] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): journal_replay...
[ +0.166803] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): going read-write
[Feb22 20:04] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): error validating btree node on sda at btree backpointers level 0/2
[ +0.000003] u64s 11 type btree_ptr_v2 0:265300484096:0 len 0 ver 0: seq 7e21ead542dd913b written 384 min_key 0:264819048448:1 durability: 1 ptr: 0:253345:0 gen 0 stale
[ +0.000001] node offset 0: got wrong btree node (seq 8e9736c0b4e87d22 want 7e21ead542dd913b)
[ +0.000074] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): running explicit recovery pass check_topology (4), currently at journal_replay (9)
[ +0.000042] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): btree_update_nodes_written(): error EIO
[ +0.000027] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): fatal error - emergency read only
[ +1.230558] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): journal replay: error while replaying key at btree backpointers level 0: EIO
[ +0.001166] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): bch2_journal_replay(): error EIO
[ +0.000026] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): bch2_fs_recovery(): error EIO
[ +0.000020] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): bch2_fs_start(): error starting filesystem EIO
ERROR - bcachefs::commands::cmd_mount: Fatal error: Input/output error
@lorenzpmeier you may have to faddr2line the dmesg output like i did.
Hi @Snogard, I get the same error you got, ERROR: CONFIG_DEBUG_INFO not enabled, when running
/usr/lib/modules/6.7.5-zen1-1-zen/build/scripts/faddr2line /usr/lib/modules/6.7.5-zen1-1-zen/build/vmlinux /usr/lib/modules/6.7.5-zen1-1-zen/kernel/fs/bcachefs/bcachefs.ko.zst
as root. Swapping the kernel to 6.7.4-artix1-1 doesn't change the error.
What am I missing?
You must recompile your kernel disabling the stripping from the pkgbuild (from line 187 to 203) and set INSTALL_MOD_STRIP=0.
Then for every line run something like this: /path/to/faddr2line /path/to/module.ko __schedule+0xcaa/0x1950
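Doing this by hand for every frame gets tedious, so here is a rough sketch of how the frames could be pulled out of a saved dmesg and grouped per object file. It is only an illustration: extract_frames and faddr2line_commands are hypothetical helper names, the /path/to/... values are placeholders, and the regex assumes the trace format shown in this thread. Frames prefixed with "?" are skipped by default because the kernel marks those as unreliable guesses. It relies on faddr2line accepting several func+offset arguments after the object file.

```python
import re
from collections import defaultdict

# Matches e.g. "bch2_btree_update_start+0x917/0x940 [bcachefs 826e...]"
FRAME_RE = re.compile(
    r"(?P<guess>\?\s+)?"
    r"(?P<sym>[A-Za-z_][\w.]*\+0x[0-9a-f]+/0x[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+))?"
)


def extract_frames(dmesg_text, skip_guesses=True):
    """Group unique func+offset/size frames by module ('vmlinux' if none)."""
    frames = defaultdict(list)
    for line in dmesg_text.splitlines():
        m = FRAME_RE.search(line)
        if m is None or (skip_guesses and m.group("guess")):
            continue
        target = m.group("module") or "vmlinux"
        if m.group("sym") not in frames[target]:
            frames[target].append(m.group("sym"))
    return dict(frames)


def faddr2line_commands(frames, objects, faddr2line="/path/to/faddr2line"):
    """Build one argv list per object file that we have a path for."""
    return [
        [faddr2line, objects[target], *symbols]
        for target, symbols in frames.items()
        if target in objects
    ]


# A few lines copied from the traces in this thread
SAMPLE = """\
[ +0.000004] __schedule+0xcaa/0x1950
[ +0.000010] ? __bch2_time_stats_update+0x129/0x270 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000005] bch2_btree_update_start+0x917/0x940 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000052] journal_flush_pins.constprop.0+0x162/0x430 [bcachefs 826ed42b57730d51a6d0f37dd259ac20d3fa83c0]
[ +0.000006] RIP: 0033:0x77e1f7b72d2e
"""

frames = extract_frames(SAMPLE)
objects = {
    "vmlinux": "/path/to/vmlinux",        # placeholder path
    "bcachefs": "/path/to/bcachefs.ko",   # placeholder path, decompressed module
}
for argv in faddr2line_commands(frames, objects):
    print(" ".join(argv))
```

Each printed line can be pasted into a shell, or the argv lists could be handed to subprocess.run. Kernel symbols go against vmlinux and bcachefs symbols against the unstripped, decompressed bcachefs.ko.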
Thanks. I checked via cat /proc/config.gz | gunzip | grep CONFIG_DEBUG_INFO and got CONFIG_DEBUG_INFO=y, which looked sufficient.
I will:
git clone https://gitea.artixlinux.org/packages/linux-zen/ --depth 1 (today's commit is 0d00bfa6112c02037c840e95b6ed0ecdf9e3380f),
set INSTALL_MOD_STRIP to 0,
and run makepkg --install --skippgpcheck in the directory.
I'll report back tomorrow.
Sorry for taking so long to look at this; looking at the backtraces, we're probably blocked on the allocator.
Can you check /sys/fs/bcachefs/*/dev-0/alloc_debug?
/sys/fs/bcachefs/*/dev-0/alloc_debug output:
free 0 0 0
sb 0 0 0
journal 0 0 0
btree 0 0 0
user 0 0 0
cached 0 0 0
parity 0 0 0
stripe 0 0 0
need_gc_gens 0 0 0
need_discard 0 0 0
ec 0
reserves:
stripe 1192156
normal 596092
copygc 28
btree 14
btree_copygc 0
reclaim 0
freelist_wait empty
open buckets allocated 1
open buckets this dev 0
open buckets total 1024
open_buckets_wait empty
open_buckets_btree 0
open_buckets_user 0
buckets_to_invalidate 0
btree reserve cache 0
Linux build failed, OOM, I'll swap server and rebuild on a bigger machine. I'll report back ASAP.
@koverstreet Thanks in advance!
Update 2024-02-23 11:17 UTC+0:
On building Linux as mentioned above, the following issue arises, @Snogard. I ran BUILDDIR=/home/lorenz/linux-zen/BUILDDIR makepkg --skippgpcheck
==> Entering fakeroot environment...
==> Starting package_linux-zen()...
Installing boot image...
Installing modules...
SYMLINK /home/lorenz/linux-zen/BUILDDIR/linux-zen/pkg/linux-zen/usr/lib/modules/6.7.5-zen1-1-zen/build
INSTALL /home/lorenz/linux-zen/BUILDDIR/linux-zen/pkg/linux-zen/usr/lib/modules/6.7.5-zen1-1-zen/modules.order
INSTALL /home/lorenz/linux-zen/BUILDDIR/linux-zen/pkg/linux-zen/usr/lib/modules/6.7.5-zen1-1-zen/modules.builtin
INSTALL /home/lorenz/linux-zen/BUILDDIR/linux-zen/pkg/linux-zen/usr/lib/modules/6.7.5-zen1-1-zen/modules.builtin.modinfo
INSTALL /home/lorenz/linux-zen/BUILDDIR/linux-zen/pkg/linux-zen/usr/lib/modules/6.7.5-zen1-1-zen/kernel/arch/x86/events/amd/power.ko
STRIP /home/lorenz/linux-zen/BUILDDIR/linux-zen/pkg/linux-zen/usr/lib/modules/6.7.5-zen1-1-zen/kernel/arch/x86/events/amd/power.ko
strip: '0': No such file
make[2]: *** [scripts/Makefile.modinst:120: /home/lorenz/linux-zen/BUILDDIR/linux-zen/pkg/linux-zen/usr/lib/modules/6.7.5-zen1-1-zen/kernel/arch/x86/events/amd/power.ko] Error 1
make[2]: *** Deleting file '/home/lorenz/linux-zen/BUILDDIR/linux-zen/pkg/linux-zen/usr/lib/modules/6.7.5-zen1-1-zen/kernel/arch/x86/events/amd/power.ko'
make[1]: *** [/home/lorenz/linux-zen/BUILDDIR/linux-zen/src/linux-6.7.5/Makefile:1817: modules_install] Error 2
make: *** [Makefile:234: __sub-make] Error 2
==> ERROR: A failure occurred in package_linux-zen().
Aborting...
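The `strip: '0': No such file` message matches how the kernel's kbuild documentation describes INSTALL_MOD_STRIP: a value of 1 selects `--strip-debug`, and any other value is handed to strip as its options. Setting it to 0 therefore makes strip treat `0` as a file name. The function below only models that selection for illustration; `strip_args` is a hypothetical name and this is not the kernel's actual Makefile code.

```shell
#!/bin/sh
# Model of the documented INSTALL_MOD_STRIP behaviour (illustrative only).
strip_args() {
    case "${1-}" in
        "") echo "(no strip step)" ;;   # unset or empty: modules are not stripped
        1)  echo "--strip-debug" ;;     # 1 selects the default option
        *)  echo "$1" ;;                # anything else is passed to strip verbatim
    esac
}
```

Under that reading, leaving INSTALL_MOD_STRIP unset, rather than setting it to 0, is what skips the strip step in modules_install. The separate stripping in the PKGBUILD still has to be disabled as described earlier.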
While I am banging my head against compiling linux to get richer pointers, I stumbled across some interesting new errors.
After unlocking my drives, I tried to run fsck
to reconstruct the alloc tree, i.e. bcachefs fsck -vnR /dev/sda:/dev/sdb:/dev/nvme0n1
which leads to:
bcachefs fsck -nrv /dev/sda:/dev/sdb:/dev/nvme0n1
bch2_dev_in_fs() Split brain detected between /dev/nvme0n1 and /dev/sda:
/dev/sda believes seq of /dev/nvme0n1 to be 65, but /dev/nvme0n1 has 94
Not using /dev/nvme0n1
bch2_dev_in_fs() Split brain detected between /dev/sdb and /dev/sda:
/dev/sda believes seq of /dev/sdb to be 65, but /dev/sdb has 97
Not using /dev/sdb
bch2_fs_open() bch_fs_open err opening /dev/sda: insufficient_devices_to_start
bch2_dev_in_fs() Split brain detected between /dev/nvme0n1 and /dev/sda:
/dev/sda believes seq of /dev/nvme0n1 to be 65, but /dev/nvme0n1 has 94
Not using /dev/nvme0n1
bch2_dev_in_fs() Split brain detected between /dev/sdb and /dev/sda:
/dev/sda believes seq of /dev/sdb to be 65, but /dev/sdb has 97
Not using /dev/sdb
insufficient devices online (0) for replicas entry user: 1/2 [1 2]
bch2_fs_open() bch_fs_open err opening /dev/sda: insufficient_devices_to_start
shutting down
shutdown complete
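The split brain messages are easier to compare once they are reduced to one line per device pair. A rough sketch, assuming the message wording shown above stays the same; `splitbrain_summary` is a made-up helper name, and it only tabulates the numbers without deciding which device is stale.

```shell
#!/bin/sh
# Condense "X believes seq of Y to be N, but Y has M" lines from stdin,
# printing each distinct pair once.
splitbrain_summary() {
    sed -nE 's|^ *(/dev/[^ ]+) believes seq of (/dev/[^ ]+) to be ([0-9]+), but [^ ]+ has ([0-9]+).*$|\1 recorded seq \3 for \2, which has seq \4|p' \
        | awk '!seen[$0]++'
}
```

Piping the fsck output through `splitbrain_summary` would show, for the log above, that /dev/sda recorded seq 65 for both other devices while they report 94 and 97.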
So I thought, hey, might as well use the -k switch ("Use the in-kernel fsck implementation"), which leads to
bcachefs fsck -kvnR /dev/sda:/dev/sdb:/dev/nvme0n1
and the output
BCH_IOCTL_FSCK_OFFLINE error: Operation not permitted
dmesg stays empty.
@koverstreet how FUBAR is my situation?
lorenzpmeier - if you've still got this fs there's an option in my master branch, no_splitbrain_check, which will let you mount
Sorry it took so long to get to this!
Hi Kent,
I ran bcachefs mount /dev/sdc:/dev/sdb:/dev/nvme0n1 /mnt -o ro,fsck,no_splitbrain_check
but it did not work. I got the following error:
INFO - bcachefs::key: Attempting to unlock master key for filesystem b1df1cb0-af7f-4ab0-8b11-18d22e514108, using unlock policy Ask
Enter passphrase:
INFO - bcachefs::commands::cmd_mount: mounting with params: device: /dev/sdc:/dev/sdb:/dev/nvme0n1, target: /mnt, verbose,ro,fsck,no_splitbrain_check
DEBUG - bcachefs::commands::cmd_mount: parsing mount options: verbose,ro,fsck,no_splitbrain_check
INFO - bcachefs::commands::cmd_mount: mounting bcachefs filesystem, /mnt
INFO - bcachefs::commands::cmd_mount: mounting filesystem
ERROR - bcachefs::commands::cmd_mount: Fatal error: Invalid argument
and got the following dmesg:
[Mar12 00:23] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): mounting version 1.6: (unknown version) opts=ro,metadata_replicas=2,data_replicas=2,compression=zstd,metadata_target=/dev/sdb,foreground_target=ssd,background_target=bg_group,promote_target=ssd,fsck
[ +0.000008] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): recovering from unclean shutdown
[ +0.000005] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): superblock requires following recovery passes to be run:
check_subvols,check_dirents
[ +0.000003] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): Version downgrade required:
[ +0.000006] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): Version upgrade from 1.3: rebalance_work to 1.6: (unknown version) incomplete
Doing compatible version upgrade from 1.3: rebalance_work to 1.6: (unknown version)
[Mar12 00:24] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): journal read done, replaying entries 3449362-3453196
[ +0.000007] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): dropped unflushed entries 3453197-3453197
[ +1.400036] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): alloc_read... done
[ +0.473012] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): stripes_read... done
[ +0.000010] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): snapshots_read... done
[ +0.000088] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): check_allocations...
[Mar12 00:52] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): error validating btree node on sdc at btree backpointers level 0/2
[ +0.000003] u64s 12 type btree_ptr_v2 2:3835760607232:0 len 0 ver 0: seq f63e0ab2ba2c7878 written 72 min_key 2:3673709256704:1 durability: 2 ptr: 0:199660:512 gen 4 ptr: 1:10708637:512 gen 9
[ +0.000002] node offset 0: got wrong btree node (seq ceda499762806338 want f63e0ab2ba2c7878)
[ +0.000098] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): retrying read
[ +0.009070] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): error validating btree node on sdb at btree backpointers level 0/2
[ +0.000004] u64s 12 type btree_ptr_v2 2:3835760607232:0 len 0 ver 0: seq f63e0ab2ba2c7878 written 72 min_key 2:3673709256704:1 durability: 2 ptr: 0:199660:512 gen 4 ptr: 1:10708637:512 gen 9
[ +0.000001] node offset 0: got wrong btree node (seq ceda499762806338 want f63e0ab2ba2c7878)
[ +0.003844] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): running explicit recovery pass check_topology (4), currently at check_allocations (5)
[ +0.000008] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): retry success
[ +0.000039] Unreadable btree node at btree backpointers level 0:
[ +0.000002] u64s 12 type btree_ptr_v2 2:3835760607232:0 len 0 ver 0: seq f63e0ab2ba2c7878 written 72 min_key 2:3673709256704:1 durability: 2 ptr: 0:199660:512 gen 4 ptr: 1:10708637:512 gen 9, exiting
[ +0.003412] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): Unable to continue, halting
[ +0.001094] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): bch2_gc_btree_init(): error fsck_errors_not_fixed
[ +0.001097] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): bch2_gc_btrees(): error fsck_errors_not_fixed
[ +0.047758] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): bch2_gc(): error fsck_errors_not_fixed
[ +0.001401] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): bch2_fs_recovery(): error fsck_errors_not_fixed
[ +0.000589] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): bch2_fs_start(): error starting filesystem fsck_errors_not_fixed
The Fatal error: Invalid argument might hint towards something missing in my version; bcachefs version shows 1.6.4.
You need to also include the fix_errors option
Thanks!
Thus I ran bcachefs mount /dev/sdc:/dev/sdb:/dev/nvme0n1 /mnt -o ro,fsck,no_splitbrain_check,fix_errors
, leading to
INFO - bcachefs::commands::cmd_mount: Successfully mounted
and a dmesg reporting
[Mar13 11:20] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): mounting version 1.6: (unknown version) opts=ro,metadata_replicas=2,data_replicas=2,compression=zstd,metadata_target=/dev/sdb,foreground_target=ssd,background_target=bg_group,promote_target=ssd,fsck,fix_errors=yes
[ +0.000005] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): recovering from unclean shutdown
[ +0.000003] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): superblock requires following recovery passes to be run:
check_subvols,check_dirents
[ +0.000001] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): Version downgrade required:
[ +0.000004] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): Version upgrade from 1.3: rebalance_work to 1.6: (unknown version) incomplete
Doing compatible version upgrade from 1.3: rebalance_work to 1.6: (unknown version)
[Mar13 11:21] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): journal read done, replaying entries 3449362-3453196
[ +0.000004] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): dropped unflushed entries 3453197-3453197
[ +1.162372] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): bch2_journal_reclaim_start(): error creating journal reclaim thread EINTR
[ +0.000325] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): bch2_fs_recovery(): error EINTR
[ +0.000023] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): bch2_fs_start(): error starting filesystem EINTR
[Mar13 11:22] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): mounting version 1.6: (unknown version) opts=ro,metadata_replicas=2,data_replicas=2,compression=zstd,metadata_target=/dev/sdb,foreground_target=ssd,background_target=bg_group,promote_target=ssd,fsck,fix_errors=yes
[ +0.000005] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): recovering from unclean shutdown
[ +0.000002] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): superblock requires following recovery passes to be run:
check_subvols,check_dirents
[ +0.000002] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): Version downgrade required:
[ +0.000003] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): Version upgrade from 1.3: rebalance_work to 1.6: (unknown version) incomplete
Doing compatible version upgrade from 1.3: rebalance_work to 1.6: (unknown version)
[Mar13 11:23] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): journal read done, replaying entries 3449362-3453196
[ +0.000007] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): dropped unflushed entries 3453197-3453197
[ +1.416716] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): alloc_read... done
[ +0.543449] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): stripes_read... done
[ +0.000010] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): snapshots_read... done
[ +0.000086] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): check_allocations...
[Mar13 11:53] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): error validating btree node on sdb at btree backpointers level 0/2
[ +0.000003] u64s 12 type btree_ptr_v2 2:3835760607232:0 len 0 ver 0: seq f63e0ab2ba2c7878 written 72 min_key 2:3673709256704:1 durability: 2 ptr: 0:199660:512 gen 4 ptr: 1:10708637:512 gen 9
[ +0.000002] node offset 0: got wrong btree node (seq ceda499762806338 want f63e0ab2ba2c7878)
[ +0.000097] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): retrying read
[ +0.009994] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): error validating btree node on sdc at btree backpointers level 0/2
[ +0.000003] u64s 12 type btree_ptr_v2 2:3835760607232:0 len 0 ver 0: seq f63e0ab2ba2c7878 written 72 min_key 2:3673709256704:1 durability: 2 ptr: 0:199660:512 gen 4 ptr: 1:10708637:512 gen 9
[ +0.000002] node offset 0: got wrong btree node (seq ceda499762806338 want f63e0ab2ba2c7878)
[ +0.000090] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): running explicit recovery pass check_topology (4), currently at check_allocations (5)
[ +0.000004] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): retry success
[ +0.000019] Unreadable btree node at btree backpointers level 0:
[ +0.000002] u64s 12 type btree_ptr_v2 2:3835760607232:0 len 0 ver 0: seq f63e0ab2ba2c7878 written 72 min_key 2:3673709256704:1 durability: 2 ptr: 0:199660:512 gen 4 ptr: 1:10708637:512 gen 9, fixing
[ +0.000052] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): Halting mark and sweep to start topology repair pass
[Mar13 11:54] done
[ +0.000003] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): check_allocations...
[Mar13 12:22] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): error validating btree node on sdc at btree backpointers level 0/2
[ +0.000003] u64s 12 type btree_ptr_v2 2:3835760607232:0 len 0 ver 0: seq f63e0ab2ba2c7878 written 72 min_key 2:3673709256704:1 durability: 2 ptr: 0:199660:512 gen 4 ptr: 1:10708637:512 gen 9
[ +0.000002] node offset 0: got wrong btree node (seq ceda499762806338 want f63e0ab2ba2c7878)
[ +0.000097] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): retrying read
[ +0.009523] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): error validating btree node on sdb at btree backpointers level 0/2
[ +0.000004] u64s 12 type btree_ptr_v2 2:3835760607232:0 len 0 ver 0: seq f63e0ab2ba2c7878 written 72 min_key 2:3673709256704:1 durability: 2 ptr: 0:199660:512 gen 4 ptr: 1:10708637:512 gen 9
[ +0.000001] node offset 0: got wrong btree node (seq ceda499762806338 want f63e0ab2ba2c7878)
[ +0.000096] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): retry success
[ +0.000885] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): Halting mark and sweep to start topology repair pass
[Mar13 12:23] done
[ +0.145624] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): journal_replay...
[ +0.211785] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): going read-write
[ +0.015047] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): bch2_gc_thread_start(): error EINTR
[ +0.000431] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): error starting gc thread
[ +0.000036] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): bch2_journal_replay(): error EINTR
[ +0.000017] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): bch2_fs_recovery(): error EINTR
[ +0.000011] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): bch2_fs_start(): error starting filesystem EINTR
[Mar13 14:59] bcachefs (b1df1cb0-af7f-4ab0-8b11-18d22e514108): error requesting encryption key: ENOKEY
The mount succeeded at 1755hrs, taking ~6h30m. The fsck,fix_errors,no_splitbrain_check options worked, and I am transferring files to a new drive as we speak.
If I get more dmesg output, I'll share it, but this seems to have solved at least the mounting and access issue.
Thanks @koverstreet ! Great work and thanks a bunch for your support.
Biggest learning for me: Don't put your backups only on old hard drives that die on replay.
@koverstreet do you think i should try too? or is my problem different?
@Snogard yours looks different.
When it hangs, grab /sys/fs/bcachefs/
I've got another filesystem to debug that's showing something similar, I'll work on that today as well
Ok, I've got a fix for deadlocks during journal replay in my master branch. Can whoever is still hitting that try and report back?
I think all bugs mentioned in this thread are fixed now; please reopen this or a new bug if needed.
Sorry @koverstreet for the late reply but i've been without internet for the past weeks. Anyway i compiled the kernel from your last commit and my filesystem still hangs.
Here is the alloc for all three devices:
github doesn't give me the option to reopen the issue, can you do it instead @koverstreet?
reedriley just confirmed - and I got a good look at what's going on, fix should be up in a day or so
Any progress on this?
Are you still having the issue with bcachefs and bcachefs-tools master? I was having this issue on my end until some update seemingly fixed the problem for now.
last time i checked was april 2, i was waiting for an update before testing again. anyway i'll try again as soon as possible and report back
Yep, still having problems... I don't have time to use faddr2line right now, but in the meantime here is the dmesg. I'll try to post the results of faddr2line tomorrow at least.
Sorry for the delay, here it is: addressed dmesg
Just one question: __entry_text_end returned ??:? for me. Is this normal? Do i have something wrong with my configs?
You can always search for the string in the Github repo:
https://github.com/search?q=repo%3Akoverstreet%2Fbcachefs+__entry_text_end&type=code
Basically, just looks like some assembly that the decoder doesn't know how to handle. Doesn't look relevant to anything, anyways.
thanks for the clarification!
Any news on this?
Should i try again with that commit or should i wait a bit more?
OS: Archlinux
Kernel: 6.7.3-zen1-1-zen
Context
I have two 8tb disks in the same pool with replicas=2. One of the disks needed to be replaced, so i tried doing these steps:
problem is, the process hung and i had to restart my pc. Next time, i tried mounting with fsck,fix_errors because some journal entries went missing; it seems to go fine until it goes read-write, and then it hangs again.
After another system restart, the errors are still the same and i can't seem to mount the filesystem in any way.
i know bcachefs is still in beta, but i would really like to recover my data if possible, thanks in advance for the help.
Originally i formatted the filesystem with this command:
Logs
bcachefs super:
kernel log: