inodentry opened this issue 5 months ago (status: Open)
I am now experiencing the exact same issue after a power outage.
Can you grab /sys/fs/bcachefs/
Same issue here, same backtrace, including fsck failing with error journal_reclaim_would_deadlock.
~ % cat /sys/fs/bcachefs/8e4c4cd5-bfeb-41ff-9560-42c68f6461de/dev-0/alloc_debug
buckets sectors fragmented
free 1029752 0 0
sb 7 6152 1016
journal 8192 8388608 0
btree 15243 15603200 5632
user 33279446 34078288973 0
cached 0 0 0
parity 0 0 0
stripe 0 0 0
need_gc_gens 0 0 0
need_discard 0 0 0
ec 0
reserves:
stripe 1072922
normal 536475
copygc 28
btree 14
btree_copygc 0
reclaim 0
freelist_wait empty
open buckets allocated 1024
open buckets this dev 27
open buckets total 1024
open_buckets_wait waiting
open_buckets_btree 1023
open_buckets_user 0
buckets_to_invalidate 0
btree reserve cache 0
% sudo bcachefs show-super /dev/dm-8
External UUID: 8e4c4cd5-bfeb-41ff-9560-42c68f6461de
Internal UUID: 15a62530-11c4-4ab8-900d-f257889e4a43
Magic number: c68573f6-66ce-90a9-d96a-60cf803df7ef
Device index: 6
Label:
Version: 1.6: btree_subvolume_children
Version upgrade complete: 1.3: rebalance_work
Oldest version on disk: 1.3: rebalance_work
Created: Sun Jan 28 17:59:06 2024
Sequence number: 288
Time of last write: Sat Mar 2 21:30:55 2024
Superblock size: 9.73 KiB/1.00 MiB
Clean: 0
Devices: 7
Sections: members_v1,replicas_v0,disk_groups,clean,journal_seq_blacklist,journal_v2,counters,members_v2,errors,ext,downgrade
Features: zstd,journal_seq_blacklist_v3,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features: alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done
Options:
block_size: 4.00 KiB
btree_node_size: 256 KiB
errors: continue [ro] panic
metadata_replicas: 3
data_replicas: 3
metadata_replicas_required: 2
data_replicas_required: 2
encoded_extent_max: 64.0 KiB
metadata_checksum: none [crc32c] crc64 xxhash
data_checksum: none [crc32c] crc64 xxhash
compression: zstd:1
background_compression: zstd:15
str_hash: crc32c crc64 [siphash]
metadata_target: none
foreground_target: ssd
background_target: hdd
promote_target: ssd
erasure_code: 0
inodes_32bit: 1
shard_inode_numbers: 1
inodes_use_key_cache: 1
gc_reserve_percent: 8
gc_reserve_bytes: 0 B
root_reserve_percent: 0
wide_macs: 0
acl: 1
usrquota: 0
grpquota: 0
prjquota: 0
journal_flush_delay: 1000
journal_flush_disabled: 0
journal_reclaim_delay: 100
journal_transaction_names: 1
version_upgrade: [compatible] incompatible none
nocow: 0
members_v2 (size 912):
Device: 0
Label: hdd0 (1)
UUID: e969b88a-d108-495f-8b09-72d0c47c23b7
Size: 16.4 TiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 512 KiB
First bucket: 0
Buckets: 34332640
Last mount: Sat Mar 2 21:35:26 2024
Last superblock write: 285
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user,cached
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 1
Label: ssd0 (3)
UUID: 1018e3e7-971f-483e-9b51-add348accb77
Size: 3.64 TiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 512 KiB
First bucket: 0
Buckets: 7630863
Last mount: Sat Mar 2 21:35:26 2024
Last superblock write: 285
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user,cached
Durability: 1
Discard: 1
Freespace initialized: 1
Device: 2
Label: ssd1 (4)
UUID: 24806e9d-3848-4a13-aedc-1f2f39f3f7d6
Size: 3.64 TiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 512 KiB
First bucket: 0
Buckets: 7630863
Last mount: Sat Mar 2 21:35:26 2024
Last superblock write: 285
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user,cached
Durability: 1
Discard: 1
Freespace initialized: 1
Device: 3
Label: hdd1 (5)
UUID: 9dcccfd4-c8f5-4d0c-9025-790b577c1fed
Size: 12.7 TiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 1.00 MiB
First bucket: 0
Buckets: 13351920
Last mount: Sat Mar 2 21:35:26 2024
Last superblock write: 285
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user,cached
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 4
Label: hdd2 (6)
UUID: d8dd0ea2-196b-4266-8813-69ccc1abd52a
Size: 12.7 TiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 1.00 MiB
First bucket: 0
Buckets: 13351920
Last mount: Sat Mar 2 21:35:26 2024
Last superblock write: 285
State: rw
Data allowed: journal,btree,user
Has data: journal,user,cached
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 5
Label: hdd3 (8)
UUID: fff75b25-9672-4635-b77d-fceaeaa4d2c2
Size: 12.7 TiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 1.00 MiB
First bucket: 0
Buckets: 13351920
Last mount: Sat Mar 2 21:35:26 2024
Last superblock write: 285
State: rw
Data allowed: journal,btree,user
Has data: journal,user,cached
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 6
Label: hdd4 (7)
UUID: c4de381b-6805-49a9-bca2-0ce33e5aecc4
Size: 12.7 TiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 1.00 MiB
First bucket: 0
Buckets: 13351920
Last mount: Sat Mar 2 21:35:26 2024
Last superblock write: 285
State: rw
Data allowed: journal,btree,user
Has data: journal,user,cached
Durability: 1
Discard: 0
Freespace initialized: 1
errors (size 8):
I'd really like to help debug this issue, but the system this happens on is used for storing backups. I can leave it offline for a few days, but not for week(s). @koverstreet Do you think you'll have the time to dig into this in the next few days? Otherwise I'd leave the issue for later and format the disks :) Thank you!
While I don't have any important data on my system either (everything is stuff I could recreate by reinstalling the OS and replicating files from other machines), I'd really like to avoid that if possible, because it would be a hassle. Is there some way to get the filesystem working again, perhaps with some data loss (such as clearing/dropping the latest activity from the journal)?
I can +1 this issue. Identical symptoms. Upstream kernel 6.7.6, encryption and erasure coding with 2 replicas, LZ4 compression. Interestingly, it only happened after a clean shutdown, yet the few unclean shutdowns the system has had didn't cause anything a fsck couldn't fix.
Same issues after an unclean shutdown (my computer got powered off accidentally). I've been using bcachefs for a week.
bcachefs-tools 1.6.3
I use LZ4, 2 replicas, 2 NVMe drives, and 2 HDDs. My mkfs command is copied here:
sudo bcachefs format --compression=lz4 --replicas=2 --label=ssd.ssd1 /dev/nvme1n1 --label=ssd.ssd2 /dev/nvme2n1 --label=hdd.hdd1 /dev/sda --label=hdd.hdd2 /dev/sdb --foreground_target=ssd --promote_target=ssd --background_target=hdd
I've actually managed to make progress by using kernel fsck as opposed to userspace fsck.
bcachefs mount -o fsck,fix_errors,verbose UUID=2353ad4f-f54a-4a6d-b838-596270f9eebc /mnt
It took a while, following along with the progress in dmesg in another tty, but in the end it actually mounted and just... works fine.
root@archiso ~ # cat /sys/fs/bcachefs/0cda7e0c-e826-4f4b-a499-d93bb20ed966/dev-0/alloc_debug
buckets sectors fragmented
free 700035 0 0
sb 13 6152 504
journal 8192 4194304 0
btree 14430 7388160 0
user 64296 31376200 1543352
cached 1044478 535068584 0
parity 0 0 0
stripe 0 0 0
need_gc_gens 0 0 0
need_discard 0 0 0
ec 0
reserves:
stripe 57288
normal 28672
copygc 56
btree 28
btree_copygc 0
reclaim 0
freelist_wait empty
open buckets allocated 1024
open buckets this dev 511
open buckets total 1024
open_buckets_wait waiting
open_buckets_btree 1023
open_buckets_user 0
buckets_to_invalidate 0
btree reserve cache 0
root@archiso ~ # cat /sys/fs/bcachefs/0cda7e0c-e826-4f4b-a499-d93bb20ed966/dev-1/alloc_debug
buckets sectors fragmented
free 699984 0 0
sb 13 6152 504
journal 8192 4194304 0
btree 14430 7388160 0
user 64231 31376200 1510072
cached 1044594 535067128 0
parity 0 0 0
stripe 0 0 0
need_gc_gens 0 0 0
need_discard 0 0 0
ec 0
reserves:
stripe 57288
normal 28672
copygc 56
btree 28
btree_copygc 0
reclaim 0
freelist_wait empty
open buckets allocated 1024
open buckets this dev 512
open buckets total 1024
open_buckets_wait waiting
open_buckets_btree 1023
open_buckets_user 0
buckets_to_invalidate 0
btree reserve cache 0
root@archiso ~ # cat /sys/fs/bcachefs/0cda7e0c-e826-4f4b-a499-d93bb20ed966/dev-2/alloc_debug
buckets sectors fragmented
free 2761683 0 0
sb 13 6152 504
journal 8192 4194304 0
btree 0 0 0
user 1045588 535068904 272152
cached 2 336 0
parity 0 0 0
stripe 0 0 0
need_gc_gens 0 0 0
need_discard 0 0 0
ec 0
reserves:
stripe 119288
normal 59672
copygc 56
btree 28
btree_copygc 0
reclaim 0
freelist_wait empty
open buckets allocated 1024
open buckets this dev 0
open buckets total 1024
open_buckets_wait waiting
open_buckets_btree 1023
open_buckets_user 0
buckets_to_invalidate 0
btree reserve cache 0
root@archiso ~ # cat /sys/fs/bcachefs/0cda7e0c-e826-4f4b-a499-d93bb20ed966/dev-3/alloc_debug
buckets sectors fragmented
free 2761659 0 0
sb 13 6152 504
journal 8192 4194304 0
btree 0 0 0
user 1045612 535066808 286536
cached 2 336 0
parity 0 0 0
stripe 0 0 0
need_gc_gens 0 0 0
need_discard 0 0 0
ec 0
reserves:
stripe 119288
normal 59672
copygc 56
btree 28
btree_copygc 0
reclaim 0
freelist_wait empty
open buckets allocated 1024
open buckets this dev 0
open buckets total 1024
open_buckets_wait waiting
open_buckets_btree 1023
open_buckets_user 0
buckets_to_invalidate 0
btree reserve cache 0
Oh, and now I also see a suspicious-looking message in dmesg that I didn't see before:
[ 487.089684] bcachefs (0cda7e0c-e826-4f4b-a499-d93bb20ed966): Version downgrade required:
[ 487.089693] bcachefs (0cda7e0c-e826-4f4b-a499-d93bb20ed966): Version upgrade from 1.3: rebalance_work to 1.4: (unknown version) incomplete
Doing compatible version upgrade from 1.3: rebalance_work to 1.4: (unknown version)
Sounds like there was some sort of upgrade that was interrupted/unfinished? Interesting, because I have never tried to mount this FS on a kernel newer than 6.7.6. Could the userspace fsck have done this?
I tried to run kernel fsck, but it did not help. The kernel task hangs just like in OP.
Ok, I tried to boot into a Fedora Rawhide ISO with kernel 6.8-rc7 to see if it would help. Looks like now I can't even try to do anything. Both dmesg and bcachefsck give this error:
bch2_dev_in_fs() Split brain detected between /dev/sdd1 and /dev/sda1:
/dev/sda1 believes seq of /dev/sdd1 to be 49, but /dev/sdd1 has 58
Not using /dev/sdd1
bch2_dev_in_fs() Split brain detected between /dev/sdc and /dev/sda1:
/dev/sda1 believes seq of /dev/sdc to be 49, but /dev/sdc has 58
Not using /dev/sdc
bch2_dev_in_fs() Split brain detected between /dev/sdb and /dev/sda1:
/dev/sda1 believes seq of /dev/sdb to be 49, but /dev/sdb has 58
Not using /dev/sdb
bch2_fs_open() bch_fs_open err opening /dev/sda1: insufficient_devices_to_start
I guess my previous attempt to run kernel fsck messed up the FS. I guess this FS is actually gone now ... oh well.
So much for "the filesystem that won't eat your data" ;)
I actually had this issue myself when trying a userspace fsck before I got it mounting. I'm not sure what caused it, but in my case the filesystem was still intact, and having checked everything since then I've suffered zero data loss or corruption. I'm not sure what happened or why kernel fsck fixed it for me in the end, but I wouldn't throw all hope away yet. These are still early days for bcachefs and the tools are still immature, but the on-disk format in my experience is rock solid. It just may take some effort to exploit that fully.
Yeah, I was half-joking. :) I know it's early days and the FS is very experimental and buggy. Don't get me wrong, I love bcachefs, I love its feature set and flexibility, and I agree the on-disk format is great (in theory at least). I genuinely think bcachefs is the best COW filesystem design in existence. But, on the other hand ... it did eat my data! I am making fun of the marketing slogan because it is cheeky and kinda toxic.
I am not giving up on bcachefs. I am very hopeful for the future and can't wait for it to improve and mature. That's why I am here filing bug reports with all the data I can collect.
But no, kernel fsck did not fix it for me. The same errors appear with both userspace and kernel fsck. As a last resort, I tried it with very_degraded and devices missing, and it just deadlocked on journal replay again.
Have you tried using -o nochanges? If the problem is journal replay, then what happens when it's not allowed to replay the journal?
Just tried -o ro,nochanges,very_degraded with one device at a time (to avoid the "split brain" error). Same "journal would deadlock" error. Another attempt completely froze the kernel ...
Yea, I don't think there is hope. I doubt there is any use in keeping this FS around either; there is probably no more useful debug info to be gathered, as it is now quite butchered by all my attempts to fsck and mount it. I will just reformat ...
Back online. Could any of you jump on IRC? irc.oftc.net #bcache - that will make it easier to walk through different debugging steps.
I joined the IRC. Should I reach out to you there? I'm using the username PrefersAwkward.
Just saw you and missed you
From alloc_debug: open_buckets_wait waiting
That's the line that jumps out at me; there are 1024 open buckets, and they're not meant to be held for longer than it takes to do an index update after a write, so that indicates some sort of leak or a deadlock.
Can someone who's seeing that check sysfs internal/open_buckets?
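(If it helps, a minimal sketch of grabbing that file, assuming the sysfs layout shown in the outputs earlier in this thread and a single bcachefs filesystem on the machine:)
cat /sys/fs/bcachefs/*/internal/open_buckets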
I can no longer even attempt to mount the fs unless I do -o very_degraded and try with only one device, because of the "split brain" errors. I think my attempt at running kernel fsck butchered the FS and now it refuses to use multiple devices.
Would it still be useful to try to debug it in that state?
I have not reformatted yet; I still have the broken FS. I could try to get on IRC tonight (I'm in the EU; by your posting times above I guess you are in a NA timezone) to help debug if it would be of any use. Otherwise, I will reformat tonight and reinstall my OS.
Were you hitting the null ptr deref? I want to get that debugged, but I may already have enough to do that.
I wonder if there's anything we can do to recover from split brain scenarios, that's a nasty one.
Hi Kent,
I wonder if there's anything we can do to recover from split brain scenarios, that's a nasty one.
Check https://github.com/koverstreet/bcachefs/issues/648#issuecomment-1962395561 - I seem to be facing a similar problem.
If I do an fsck, it always says the other drives have a different seq: it expects 65 but gets higher values from the other drives.
Ok, I'm trying to figure out how to reproduce this; I've tested kernel 6.7.9 and tools 1.6.4, going back and forth, and nothing. Anyone have the steps to repro?
Don't know if this is a reliable reproducer, but this is what I think many of us have in common (a rough shell sketch of these steps follows the list):
Create multi disk array
Put data on it (at least for me we're talking about 20-30 TiB)
Change something in the config (add foreground/background target and/or disk)
Put more data on it (at least for me we're talking about 20-30 TiB)
Fail to mount fs. For me it happened randomly once and the other time I pulled the plug (accidentally)
Fsck fails but upgrades disks to 1.6
WARNING at libbcachefs/btree_iter.c:2827: btree trans held srcu lock (delaying memory reclaim) for 51 seconds
Split brain
Hope this helps :)
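A very rough shell sketch of those steps, purely illustrative - the device names, labels and data paths are made up, and the unclean shutdown / power pull obviously can't be scripted:
sudo bcachefs format --replicas=2 --label=hdd.hdd0 /dev/sdb --label=hdd.hdd1 /dev/sdc --label=ssd.ssd0 /dev/nvme0n1   # create the multi-disk array
sudo mount -t bcachefs /dev/sdb:/dev/sdc:/dev/nvme0n1 /mnt
rsync -a /some/large/dataset/ /mnt/                    # put data on it
sudo bcachefs setattr --background_target=hdd /mnt     # change something in the config
rsync -a /another/large/dataset/ /mnt/                 # put more data on it
# ... then pull the plug (or otherwise shut down uncleanly) and try to mount / fsck again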
Figured it out: it's from writing a superblock without downgrading it.
I'll figure out the fix to backport.
Backports are now being tested, will be sending them off to Greg for 6.7 shortly
I think I might be hitting this on 6.8.2 with bcachefs-tools version 1.6.4? (At least, when trying to run a userspace fsck after an unclean shutdown, I get a stream of constant journal_reclaim_would_deadlock messages and see bcachefs consuming 90% of the memory on a 128GB system...)
It's possible I just haven't waited long enough for things to make progress, and I'll report back if that's wrong.
I'm going to try again now with a kernel fsck so I can at least report back with alloc_debug output if I get stuck at this point again. (And out of hope that my problem is resolved by the same steps @Wingar took above.)
@koverstreet - what's the fix you referenced in the thread above, so I can make sure I have it applied to my kernel / bcachefs-tools?
Tried again on 6.9-rc1, and that seems to work somewhat better? (No errors or panics during kernel fsck, at any rate.)
However, the kernel fsck seems to hang indefinitely after some period of time, with no CPU or disk usage, and the filesystem didn't end up being mounted.
(Apologies if this is unrelated; it just seems to match everyone else's reports here. Figured information from running bcachefs mount -o fsck,fix_errors on 6.9-rc1 might be useful.)
I rebuilt the kernel with a bunch more debugging enabled (the bcachefs-specific stuff and lockdep). Here's the new kernel output from trying to mount:
I'm also attaching the contents of /proc/lock_stat, because lockdep was disabled due to too deep of a chain of locks.
I just pushed a fix for this to the bcachefs-testing branch - a new watermark for btree interior updates. Can someone test it and see if it fixes the issue?
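For anyone who wants to test but hasn't built a kernel from that tree before, here's a minimal sketch; the GitHub URL is the mirror already linked in this thread, and the config steps are generic assumptions, so adjust for your distro:
git clone --depth=1 --branch bcachefs-testing https://github.com/koverstreet/bcachefs.git
cd bcachefs
cp /boot/config-"$(uname -r)" .config   # start from the running kernel's config
scripts/config --enable BCACHEFS_FS     # make sure bcachefs is enabled
make olddefconfig
make -j"$(nproc)"
sudo make modules_install install       # then reboot into the new kernel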
Hey, I built my kernel from bcachefs-testing, but fsck.bcachefs still does not seem to work:
[root@kexec:~]# fsck.bcachefs /dev/nvme1n1p1:/dev/nvme0n1p2
mounting version 1.6: (unknown version) opts=ro,metadata_replicas=2,data_replicas=2,degraded,fsck,fix_errors=ask,read_only
recovering from unclean shutdown
superblock requires following recovery passes to be run:
check_subvols,check_dirents
Version downgrade required:
journal read done, replaying entries 28078178-28080680
alloc_read... done
stripes_read... done
snapshots_read... done
check_allocations... done
going read-write
journal_replay...bch2_btree_update_start(): error journal_reclaim_would_deadlock
bch2_btree_update_start(): error journal_reclaim_would_deadlock
bch2_btree_update_start(): error journal_reclaim_would_deadlock
bch2_btree_update_start(): error journal_reclaim_would_deadlock
bch2_btree_update_start(): error journal_reclaim_would_deadlock
bch2_btree_update_start(): error journal_reclaim_would_deadlock
bch2_btree_update_start(): error journal_reclaim_would_deadlock
bch2_btree_update_start(): error journal_reclaim_would_deadlock
bch2_btree_update_start(): error journal_reclaim_would_deadlock
bch2_btree_update_start(): error journal_reclaim_would_deadlock
bch2_btree_update_start: 5593 callbacks suppressed
bch2_btree_update_start(): error journal_reclaim_would_deadlock
[...]
(these messages continue, without disk or CPU activity, but nothing about blocked tasks in dmesg)
Edit: trying to bcachefs mount -o fsck,fix_errors /dev/nvme1n1p1:/dev/nvme0n1p2 /mnt/, I am getting something in dmesg:
[ 1493.824285] bcachefs (f60df395-9d99-4067-a59b-0ac1f3cc38a9): mounting version 1.4: member_seq opts=metadata_replicas=2,data_replicas=2,fsck,fix_errors=yes
[ 1493.824292] bcachefs (f60df395-9d99-4067-a59b-0ac1f3cc38a9): recovering from unclean shutdown
[ 1493.824296] bcachefs (f60df395-9d99-4067-a59b-0ac1f3cc38a9): superblock requires following recovery passes to be run:
check_subvols,check_dirents
[ 1493.824301] bcachefs (f60df395-9d99-4067-a59b-0ac1f3cc38a9): Doing compatible version upgrade from 1.4: member_seq to 1.6: btree_subvolume_children
[ 1494.613438] bcachefs (f60df395-9d99-4067-a59b-0ac1f3cc38a9): journal read done, replaying entries 28078178-28081265
[ 1496.012522] bcachefs (f60df395-9d99-4067-a59b-0ac1f3cc38a9): alloc_read... done
[ 1496.027695] bcachefs (f60df395-9d99-4067-a59b-0ac1f3cc38a9): stripes_read... done
[ 1496.027701] bcachefs (f60df395-9d99-4067-a59b-0ac1f3cc38a9): snapshots_read... done
[ 1496.027714] bcachefs (f60df395-9d99-4067-a59b-0ac1f3cc38a9): check_allocations...
[ 1503.528973] Process accounting resumed
[ 1578.779521] done
[ 1578.810930] bcachefs (f60df395-9d99-4067-a59b-0ac1f3cc38a9): going read-write
[ 1578.818998] bcachefs (f60df395-9d99-4067-a59b-0ac1f3cc38a9): journal_replay...
[ 1681.831269] Process accounting resumed
[ 1722.063047] INFO: task kworker/u50:5:252 blocked for more than 122 seconds.
[ 1722.063335] Not tainted 6.9.0-rc2 #1-NixOS
[ 1722.063528] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1722.063847] task:kworker/u50:5 state:D stack:0 pid:252 tgid:252 ppid:2 flags:0x00004000
[ 1722.063855] Workqueue: btree_update btree_interior_update_work [bcachefs]
[ 1722.063949] Call Trace:
[ 1722.063952] <TASK>
[ 1722.063957] __schedule+0x3ec/0x1540
[ 1722.063973] schedule+0x27/0xf0
[ 1722.063978] __closure_sync+0x7e/0x150
[ 1722.063985] bch2_btree_update_start+0x7dc/0x7f0 [bcachefs]
[ 1722.064061] ? __pfx_closure_sync_fn+0x10/0x10
[ 1722.064068] ? srso_return_thunk+0x5/0x5f
[ 1722.064074] __bch2_foreground_maybe_merge+0x568/0xd40 [bcachefs]
[ 1722.064151] ? journal_validate_key+0x2b7/0x610 [bcachefs]
[ 1722.064251] __bch2_trans_commit+0x3ab/0x1830 [bcachefs]
[ 1722.064335] btree_interior_update_work+0x764/0x9e0 [bcachefs]
[ 1722.064413] ? process_one_work+0x18e/0x3b0
[ 1722.064420] process_one_work+0x18e/0x3b0
[ 1722.064427] worker_thread+0x245/0x350
[ 1722.064433] ? __pfx_worker_thread+0x10/0x10
[ 1722.064438] kthread+0xd0/0x100
[ 1722.064444] ? __pfx_kthread+0x10/0x10
[ 1722.064449] ret_from_fork+0x34/0x50
[ 1722.064454] ? __pfx_kthread+0x10/0x10
[ 1722.064459] ret_from_fork_asm+0x1a/0x30
[ 1722.064471] </TASK>
[ 1722.064479] INFO: task bcachefs:1638 blocked for more than 122 seconds.
[ 1722.064750] Not tainted 6.9.0-rc2 #1-NixOS
[ 1722.064951] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1722.065269] task:bcachefs state:D stack:0 pid:1638 tgid:1638 ppid:1204 flags:0x00004006
[ 1722.065275] Call Trace:
[ 1722.065277] <TASK>
[ 1722.065281] __schedule+0x3ec/0x1540
[ 1722.065293] schedule+0x27/0xf0
[ 1722.065298] __closure_sync+0x7e/0x150
[ 1722.065305] bch2_btree_update_start+0x7dc/0x7f0 [bcachefs]
[ 1722.065375] ? srso_return_thunk+0x5/0x5f
[ 1722.065382] ? __pfx_closure_sync_fn+0x10/0x10
[ 1722.065390] bch2_btree_split_leaf+0x64/0x2b0 [bcachefs]
[ 1722.065460] ? srso_return_thunk+0x5/0x5f
[ 1722.065467] ? bch2_journal_replay+0x172/0x650 [bcachefs]
[ 1722.065554] bch2_trans_commit_error+0x21c/0x4f0 [bcachefs]
[ 1722.065629] ? srso_return_thunk+0x5/0x5f
[ 1722.065633] ? six_trylock_ip+0x1f/0x50 [bcachefs]
[ 1722.065713] ? srso_return_thunk+0x5/0x5f
[ 1722.065719] __bch2_trans_commit+0xd87/0x1830 [bcachefs]
[ 1722.065794] ? bch2_trans_iter_exit+0x71/0x90 [bcachefs]
[ 1722.065871] bch2_journal_replay+0x172/0x650 [bcachefs]
[ 1722.065959] ? __bch2_print+0x87/0xe0 [bcachefs]
[ 1722.066040] bch2_run_recovery_pass+0x38/0xa0 [bcachefs]
[ 1722.066120] bch2_run_recovery_passes+0xb6/0x180 [bcachefs]
[ 1722.066198] bch2_fs_recovery+0x6d9/0x1390 [bcachefs]
[ 1722.066276] ? __bch2_print+0x87/0xe0 [bcachefs]
[ 1722.066355] ? bch2_printbuf_exit+0x20/0x30 [bcachefs]
[ 1722.066433] ? srso_return_thunk+0x5/0x5f
[ 1722.066437] ? print_mount_opts+0x131/0x180 [bcachefs]
[ 1722.066517] bch2_fs_start+0x2f5/0x470 [bcachefs]
[ 1722.066592] bch2_fs_open+0x6c4/0x6e0 [bcachefs]
[ 1722.066672] bch2_mount+0x5bd/0x790 [bcachefs]
[ 1722.066762] ? srso_return_thunk+0x5/0x5f
[ 1722.066778] legacy_get_tree+0x2b/0x50
[ 1722.066784] vfs_get_tree+0x29/0xe0
[ 1722.066790] ? srso_return_thunk+0x5/0x5f
[ 1722.066794] path_mount+0x4ca/0xb10
[ 1722.066803] __x64_sys_mount+0x11a/0x150
[ 1722.066810] do_syscall_64+0xbd/0x210
[ 1722.066817] entry_SYSCALL_64_after_hwframe+0x72/0x7a
[ 1722.066822] RIP: 0033:0x7fc030d8a0fe
[ 1722.066839] RSP: 002b:00007ffcbf98a588 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[ 1722.066844] RAX: ffffffffffffffda RBX: 000000002b7386f0 RCX: 00007fc030d8a0fe
[ 1722.066847] RDX: 000000002b738c50 RSI: 000000002b7362b0 RDI: 000000002b7386f0
[ 1722.066850] RBP: 0000000000000005 R08: 000000002b736030 R09: 0000000000000000
[ 1722.066853] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffcbf98a5f0
[ 1722.066856] R13: 000000000000001e R14: 0000000000000004 R15: 00000000008b6558
[ 1722.066866] </TASK>
Check sysfs dev-0/alloc_debug, internal/btree_updates and post them here
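(For reference, a sketch of how those can be read once the fs is at least partially up, assuming the same sysfs layout as the outputs earlier in this thread:)
cat /sys/fs/bcachefs/*/dev-0/alloc_debug
cat /sys/fs/bcachefs/*/internal/btree_updates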
Sorry, I was in a bit of a rush so I reformatted and restored backups. The rebuilt environment is exactly the same though, so there's a chance that the problem will reappear if we're lucky.
I seem to be seeing this as well. I haven't been able to build a new kernel with the testing branch though, so this is with kernel 6.8.2. If I have more free time I'll attempt building it. Here are the logs I do have in case they help:
dmesg
[ 15.359897] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): mounting version 1.4: member_seq opts=metadata_replicas=3,data_replicas=3,metadata_replicas_required=3,background_compression=zstd,foreground_target=hdd,background_target=hdd,promote_target=ssd,erasure_code,verbose
[ 15.359906] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): recovering from unclean shutdown
[ 15.359910] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): starting journal read
[ 23.117277] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): journal read done on device sdc, ret 0
[ 23.325003] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): journal read done on device sdf, ret 0
[ 23.408892] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): journal read done on device sda, ret 0
[ 23.428550] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): journal read done on device sdd, ret 0
[ 23.429761] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): journal read done on device sdg, ret 0
[ 23.855037] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): journal read done on device sde, ret 0
[ 26.429228] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): journal read done on device sdb, ret 0
[ 26.429259] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): journal read done, replaying entries 4494202-4495743
[ 31.438759] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): Journal keys: 2700653 read, 1411002 after sorting and compacting
[ 31.864633] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): alloc_read... done
[ 33.471531] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): stripes_read... done
[ 33.471538] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): snapshots_read... done
[ 34.304783] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): going read-write
[ 34.312443] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): journal_replay...
[ 41.048106] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): bch2_btree_update_start(): error journal_reclaim_would_deadlock
[ 41.048446] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): bch2_btree_update_start(): error journal_reclaim_would_deadlock
[ 41.048777] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): bch2_btree_update_start(): error journal_reclaim_would_deadlock
[ 41.049105] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): bch2_btree_update_start(): error journal_reclaim_would_deadlock
[ 41.049434] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): bch2_btree_update_start(): error journal_reclaim_would_deadlock
[ 41.049769] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): bch2_btree_update_start(): error journal_reclaim_would_deadlock
[ 41.050099] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): bch2_btree_update_start(): error journal_reclaim_would_deadlock
[ 41.050420] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): bch2_btree_update_start(): error journal_reclaim_would_deadlock
[ 41.050741] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): bch2_btree_update_start(): error journal_reclaim_would_deadlock
[ 41.051065] bcachefs (64eecf01-40f7-4220-b811-8bd6f0521f12): bch2_btree_update_start(): error journal_reclaim_would_deadlock
[ 43.961808] kernel BUG at fs/bcachefs/btree_trans_commit.c:935!
[ 43.961825] CPU: 8 PID: 414 Comm: mount.bcachefs Not tainted 6.8.2-arch2-1 #1 a430fb92f7ba43092b62bbe6bac995458d3d442d
[ 43.961837] RIP: 0010:bch2_trans_commit_error+0x381/0x500 [bcachefs]
[ 43.961946] ? bch2_trans_commit_error+0x381/0x500 [bcachefs 46c358f386912ecd5a5656ed8d373795ac1e5800]
[ 43.961978] ? bch2_trans_commit_error+0x381/0x500 [bcachefs 46c358f386912ecd5a5656ed8d373795ac1e5800]
[ 43.962011] ? bch2_trans_commit_error+0x381/0x500 [bcachefs 46c358f386912ecd5a5656ed8d373795ac1e5800]
[ 43.962042] ? bch2_journal_replay+0x42c/0x550 [bcachefs 46c358f386912ecd5a5656ed8d373795ac1e5800]
[ 43.962078] ? bch2_journal_replay+0x42c/0x550 [bcachefs 46c358f386912ecd5a5656ed8d373795ac1e5800]
[ 43.962106] ? bch2_trans_commit_error+0x381/0x500 [bcachefs 46c358f386912ecd5a5656ed8d373795ac1e5800]
[ 43.962135] ? bch2_trans_commit_error+0x1ab/0x500 [bcachefs 46c358f386912ecd5a5656ed8d373795ac1e5800]
[ 43.962163] __bch2_trans_commit+0xbdf/0x1730 [bcachefs 46c358f386912ecd5a5656ed8d373795ac1e5800]
[ 43.962190] ? bch2_trans_iter_exit+0x71/0x90 [bcachefs 46c358f386912ecd5a5656ed8d373795ac1e5800]
[ 43.962221] bch2_journal_replay+0x42c/0x550 [bcachefs 46c358f386912ecd5a5656ed8d373795ac1e5800]
[ 43.962251] bch2_run_recovery_pass+0x38/0xa0 [bcachefs 46c358f386912ecd5a5656ed8d373795ac1e5800]
[ 43.962279] bch2_fs_recovery+0x175a/0x1aa0 [bcachefs 46c358f386912ecd5a5656ed8d373795ac1e5800]
[ 43.962312] ? bch2_printbuf_exit+0x20/0x30 [bcachefs 46c358f386912ecd5a5656ed8d373795ac1e5800]
[ 43.962345] ? print_mount_opts+0x2b5/0x3e0 [bcachefs 46c358f386912ecd5a5656ed8d373795ac1e5800]
[ 43.962378] bch2_fs_start+0x2cd/0x410 [bcachefs 46c358f386912ecd5a5656ed8d373795ac1e5800]
[ 43.962407] bch2_fs_open+0x10fc/0x1640 [bcachefs 46c358f386912ecd5a5656ed8d373795ac1e5800]
[ 43.962449] ? bch2_mount+0x57f/0x730 [bcachefs 46c358f386912ecd5a5656ed8d373795ac1e5800]
[ 43.962482] bch2_mount+0x57f/0x730 [bcachefs 46c358f386912ecd5a5656ed8d373795ac1e5800]
[ 43.962696] Modules linked in: ccm algif_aead crypto_null des3_ede_x86_64 cbc des_generic libdes algif_skcipher cmac md4 algif_hash af_alg ip6t_REJECT nf_reject_ipv6 intel_rapl_msr intel_rapl_common xt_hl ip6t_rt snd_hda_codec_realtek kvm_amd snd_hda_codec_generic mt7921e kvm mt7921_common snd_hda_codec_hdmi mt792x_lib irqbypass crct10dif_pclmul mt76_connac_lib snd_hda_intel ipt_REJECT crc32_pclmul polyval_clmulni snd_intel_dspcfg nf_reject_ipv4 mt76 polyval_generic snd_intel_sdw_acpi gf128mul snd_hda_codec btusb ghash_clmulni_intel snd_hda_core ext4 mac80211 sha512_ssse3 btrtl sha1_ssse3 snd_hwdep btintel aesni_intel snd_pcm btbcm r8169 crypto_simd mbcache btmtk vfat xt_LOG snd_timer libarc4 fat nf_log_syslog realtek jbd2 cryptd bluetooth sp5100_tco xt_comment snd mdio_devres rapl wmi_bmof pcspkr cfg80211 soundcore ccp xt_multiport libphy ecdh_generic i2c_piix4 joydev k10temp crc16 gpio_amdpt xt_recent bcachefs mac_hid gpio_generic rfkill nft_limit lz4hc_compress lz4_compress xt_limit xt_addrtype xt_tcpudp
[ 43.962848] RIP: 0010:bch2_trans_commit_error+0x381/0x500 [bcachefs]
[ 176.851509] bcachefs (/dev/sda): error reading superblock: (null)
dev-0/alloc_debug:
buckets sectors fragmented
free 39838753 0 0
sb 13 6152 504
journal 8192 4194304 0
btree 118357 60598784 0
user 6705591 3428202231 5063294
cached 6736834 1859999418 0
parity 0 0 0
stripe 0 0 0
need_gc_gens 0 0 0
need_discard 4 0 0
reserves:
stripe 1669048
normal 834552
copygc 56
btree 28
btree_copygc 0
reclaim 0
freelist_wait empty
open buckets allocated 988
open buckets this dev 158
open buckets total 1024
open_buckets_wait empty
open_buckets_btree 987
open_buckets_user 0
buckets_to_invalidate 0
btree reserve cache 0
internal/btree_updates:
00000000194a073f m 1 w 1 r 1 j 0
0000000023eeef38 m 1 w 1 r 1 j 0
0000000033a603f2 m 1 w 1 r 1 j 4494984
00000000b710ddb1 m 1 w 1 r 1 j 4495752
00000000ef329170 m 1 w 1 r 1 j 4494735
000000007c935588 m 1 w 1 r 1 j 0
000000008efd8759 m 1 w 1 r 1 j 0
00000000bf65462a m 1 w 1 r 1 j 4494202
00000000326b5ee7 m 1 w 1 r 1 j 0
000000001ac70bde m 1 w 1 r 1 j 4494453
000000000f51acf2 m 1 w 1 r 1 j 0
000000003bf6e6e4 m 1 w 1 r 1 j 4494202
00000000630f5c75 m 1 w 1 r 1 j 0
00000000e10784c3 m 1 w 1 r 1 j 0
000000005b78a96b m 1 w 1 r 1 j 0
00000000c74084e9 m 1 w 1 r 1 j 0
000000009eda92aa m 1 w 1 r 1 j 0
00000000a876575d m 1 w 1 r 1 j 0
000000006df89aef m 1 w 1 r 1 j 0
00000000962f539c m 1 w 0 r 1 j 0
000000004f7664e4 m 1 w 0 r 1 j 0
00000000b55e3dd3 m 1 w 0 r 1 j 4495756
00000000ef9a2888 m 1 w 0 r 1 j 4495755
000000003982e633 m 1 w 0 r 1 j 4495757
000000009f850f3c m 1 w 0 r 1 j 4495757
00000000f5efcf71 m 1 w 0 r 1 j 0
00000000286be7aa m 1 w 0 r 1 j 0
000000006f9255cd m 1 w 0 r 1 j 0
000000000c8b2b47 m 1 w 0 r 1 j 0
000000005ab4ec85 m 1 w 0 r 1 j 4495759
00000000380c3c1a m 1 w 0 r 1 j 4495759
000000002e4a250a m 1 w 0 r 1 j 0
00000000035e98ae m 1 w 0 r 1 j 0
00000000b7697e68 m 1 w 0 r 1 j 0
0000000021894e39 m 1 w 0 r 1 j 0
000000002977a250 m 1 w 0 r 1 j 4495759
0000000081d6e0b5 m 1 w 0 r 1 j 4495759
00000000fad38885 m 1 w 0 r 1 j 0
000000009f58380f m 1 w 0 r 1 j 0
000000005892f9f9 m 1 w 0 r 1 j 4495759
00000000a0559217 m 1 w 0 r 1 j 4495759
00000000997f1173 m 1 w 0 r 1 j 0
0000000003d25906 m 1 w 0 r 1 j 0
00000000aeab97cd m 1 w 0 r 1 j 4495759
0000000008a8e6a7 m 1 w 0 r 1 j 0
00000000f496193c m 1 w 0 r 1 j 0
000000004326ae0a m 1 w 0 r 1 j 0
00000000e2bffef0 m 1 w 0 r 1 j 0
000000001ae00149 m 1 w 0 r 1 j 4495759
00000000371ef815 m 1 w 0 r 1 j 0
000000001e808411 m 1 w 0 r 1 j 0
00000000fcd39749 m 1 w 0 r 1 j 4495759
00000000a15fdcd9 m 1 w 0 r 1 j 4495759
00000000d70e2a78 m 1 w 0 r 1 j 0
00000000930a5238 m 1 w 0 r 1 j 0
00000000869d6139 m 1 w 0 r 1 j 0
00000000bf318bd2 m 1 w 0 r 1 j 4495760
00000000b46a8731 m 1 w 0 r 1 j 0
00000000edccba69 m 1 w 0 r 1 j 4495760
0000000004f045a4 m 1 w 0 r 1 j 4495760
00000000591c700e m 1 w 0 r 1 j 0
00000000bc0020d4 m 1 w 0 r 1 j 4495760
000000000656e003 m 1 w 0 r 1 j 0
000000009eedfdf3 m 1 w 0 r 1 j 4495761
000000008d5a3887 m 1 w 0 r 1 j 4495761
000000001ebd3619 m 1 w 0 r 1 j 0
00000000cc24fdae m 1 w 0 r 1 j 4495761
0000000086f094a2 m 1 w 0 r 1 j 4495761
000000006cc78af8 m 1 w 0 r 1 j 4495761
00000000abbc371c m 1 w 0 r 1 j 4495761
0000000006f6cb05 m 1 w 0 r 1 j 4495761
0000000038235bcb m 1 w 0 r 1 j 4495761
00000000e8185be9 m 1 w 0 r 1 j 4495761
00000000a38ef1c1 m 1 w 0 r 1 j 0
00000000e820b447 m 1 w 0 r 1 j 0
000000003591af03 m 1 w 0 r 1 j 0
000000001f0ae9ae m 1 w 0 r 1 j 0
0000000087166620 m 1 w 0 r 1 j 0
00000000a17dbfc0 m 1 w 0 r 1 j 4495761
000000007376ffe6 m 1 w 0 r 1 j 4495761
00000000fd661e33 m 1 w 0 r 1 j 4495761
00000000bccca2ab m 1 w 0 r 1 j 4495761
0000000004801f49 m 1 w 0 r 1 j 0
00000000b2bdf2f1 m 1 w 0 r 1 j 0
000000007114c49f m 1 w 0 r 1 j 0
000000007faee1a2 m 1 w 0 r 1 j 0
000000009e1bb653 m 1 w 0 r 1 j 0
00000000621055d6 m 1 w 0 r 1 j 0
0000000051e733d3 m 1 w 0 r 1 j 0
000000003883df70 m 1 w 0 r 1 j 0
00000000d4229549 m 1 w 0 r 1 j 0
00000000b00ab471 m 1 w 0 r 1 j 0
0000000079531cdf m 1 w 0 r 1 j 4495761
0000000050fb659e m 1 w 0 r 1 j 4495761
000000007878143f m 1 w 0 r 1 j 4495761
00000000fc539c34 m 1 w 0 r 1 j 4495761
00000000de5162ae m 1 w 0 r 1 j 4495761
00000000e74472c1 m 1 w 0 r 1 j 4495761
000000000d7c5997 m 1 w 0 r 1 j 0
00000000e4d25975 m 1 w 0 r 1 j 0
0000000044e085b6 m 1 w 0 r 1 j 0
0000000006a0f5df m 1 w 0 r 1 j 0
0000000030ea4406 m 1 w 0 r 1 j 0
0000000012568f18 m 1 w 0 r 1 j 4495761
000000001d77544e m 1 w 0 r 1 j 4495762
00000000b12eb8af m 1 w 0 r 1 j 0
0000000030fcc564 m 1 w 0 r 1 j 0
00000000497e1351 m 1 w 0 r 1 j 0
00000000e0200d7f m 1 w 0 r 1 j 0
000000008b03ebf2 m 1 w 0 r 1 j 0
000000008a602d85 m 1 w 0 r 1 j 0
0000000069b1915d m 1 w 0 r 1 j 0
0000000067db49c0 m 1 w 0 r 1 j 0
00000000f9bba9b3 m 1 w 0 r 1 j 0
0000000036c1d5ec m 1 w 0 r 1 j 0
00000000fe1dbf07 m 1 w 0 r 1 j%
Is anyone still able to hit this with my master branch?
Kent, I just now got your master branch kernel running, and I have tools 1.6.4. I also have a split-brained FS handy for debugging.
IIRC, the initial issues started after my computer lost power back on 6.7. After trying to fix it a while back, I began to see split brains. Not sure if I made user errors or made things worse, or if I'm experiencing what you're aiming to fix, but I'll give it a try.
Hopefully, my stuff is useful. If not, I'll try and reformat a new FS and see if I can reproduce.
I ran this:
sudo fsck.bcachefs /dev/sda:/dev/sdb:/dev/nvme1n1:/dev/nvme2n1
Got this message:
bch2_dev_in_fs() Split brain detected between /dev/nvme2n1 and /dev/sda:
/dev/sda believes seq of /dev/nvme2n1 to be 36, but /dev/nvme2n1 has 39
Not using /dev/nvme2n1
bch2_dev_in_fs() Split brain detected between /dev/nvme1n1 and /dev/sda:
/dev/sda believes seq of /dev/nvme1n1 to be 36, but /dev/nvme1n1 has 39
Not using /dev/nvme1n1
bch2_dev_in_fs() Split brain detected between /dev/sdb and /dev/sda:
/dev/sda believes seq of /dev/sdb to be 36, but /dev/sdb has 39
Not using /dev/sdb
insufficient devices online (0) for replicas entry btree: 1/2 [0 1]
bch2_fs_open() bch_fs_open err opening /dev/sda: insufficient_devices_to_start
You need -o no_splitbrain_check - there was a bug that caused us to not downgrade correctly, so the sequence number for splitbrain detection didn't get updated; that option is the workaround.
I'm probably screwing this up.
I ran:
sudo fsck.bcachefs -o no_splitbrain_check /dev/sda:/dev/sdb:/dev/nvme1n1:/dev/nvme2n1
Got this message:
bch2_dev_in_fs() Split brain detected between /dev/nvme2n1 and /dev/sda:
/dev/sda believes seq of /dev/nvme2n1 to be 36, but /dev/nvme2n1 has 39
Not using /dev/nvme2n1
bch2_dev_in_fs() Split brain detected between /dev/nvme1n1 and /dev/sda:
/dev/sda believes seq of /dev/nvme1n1 to be 36, but /dev/nvme1n1 has 39
Not using /dev/nvme1n1
bch2_dev_in_fs() Split brain detected between /dev/sdb and /dev/sda:
/dev/sda believes seq of /dev/sdb to be 36, but /dev/sdb has 39
Not using /dev/sdb
insufficient devices online (0) for replicas entry btree: 1/2 [0 1]
bch2_fs_open() bch_fs_open err opening /dev/sda: insufficient_devices_to_start
What version are you using?
I'm running the master branch. It says 6.9-rc2. bcachefs-tools 1.6.4.
Let me know if you want me to run a print or debug command to confirm
Looks like tools 1.6.4 doesn't have no_splitbrain_check - my bad, I need to tag a new release.
Try mounting with -o no_splitbrain_check,fsck,fix_errors
I ran this:
sudo mount -t bcachefs -o fsck,no_splitbrain_check,fix_errors /dev/sda:/dev/sdb:/dev/nvme1n1:/dev/nvme2n1 /mnt/BigBoi
It spiked the CPUs for a minute or two. It then calmed down and hasn't yet mounted. It looks like the CPU and disks are not doing much according to iostat. I'll check on it now and then and see if it got mounted.
I also ran this:
dmesg:
[ 7486.730707] bcachefs (0d21127d-8c69-4c51-b44a-6ef94b8a1216): alloc_read... done
[ 7486.741119] bcachefs (0d21127d-8c69-4c51-b44a-6ef94b8a1216): stripes_read... done
[ 7486.741124] bcachefs (0d21127d-8c69-4c51-b44a-6ef94b8a1216): snapshots_read... done
[ 7486.741131] bcachefs (0d21127d-8c69-4c51-b44a-6ef94b8a1216): check_allocations...
[ 7500.813090] iwlwifi 0000:0e:00.0: WRT: Invalid buffer destination
[ 7500.965455] iwlwifi 0000:0e:00.0: WFPM_UMAC_PD_NOTIFICATION: 0x20
[ 7500.965469] iwlwifi 0000:0e:00.0: WFPM_LMAC2_PD_NOTIFICATION: 0x1f
[ 7500.965485] iwlwifi 0000:0e:00.0: WFPM_AUTH_KEY_0: 0x90
[ 7500.965497] iwlwifi 0000:0e:00.0: CNVI_SCU_SEQ_DATA_DW9: 0x0
[ 7549.366308] bucket 0:13751 gen 4 has wrong data_type: got btree, should be need_discard, fixing
[ 7549.366313] bucket 0:13751 gen 4 data type need_discard has wrong dirty_sectors: got 1024, should be 0, fixing
[ 7550.246114] bucket 1:30811 gen 2 has wrong data_type: got btree, should be need_discard, fixing
[ 7550.246119] bucket 1:30811 gen 2 data type need_discard has wrong dirty_sectors: got 1024, should be 0, fixing
[ 7562.470944] dev 0 has wrong btree buckets: got 91890, should be 91889, fixing
[ 7562.470948] dev 0 has wrong btree sectors: got 53235200, should be 53234176, fixing
[ 7562.470950] dev 0 has wrong need_discard buckets: got 10, should be 11, fixing
[ 7562.470953] dev 1 has wrong btree buckets: got 91890, should be 91889, fixing
[ 7562.470954] dev 1 has wrong btree sectors: got 53235200, should be 53234176, fixing
[ 7562.470955] dev 1 has wrong need_discard buckets: got 11, should be 12, fixing
[ 7562.470961] fs has wrong btree: got 106470400, should be 106468352, fixing
[ 7562.470964] fs has wrong btree: 1/2 [0 1]: got 106470400, should be 106468352, fixing
[ 7562.497474] done
[ 7562.544354] bcachefs (0d21127d-8c69-4c51-b44a-6ef94b8a1216): going read-write
echo w > /proc/sysrq-trigger; dmesg
top
perf top
perf top -e bcachefs:*
I think I understand some of these things. I ran them all, but I don't know how to read or share some of them. I'm not familiar with "perf" or "echo w > /proc/sysrq-trigger", for example.
Here are some things I saw when I ran that:
When I ran this:
perf top
5.55% [kernel] [k] perf_adjust_freq_unthr_context
2.75% [kernel] [k] srso_alias_return_thunk
2.20% libc.so.6 [.] memmove_avx512_unaligned_erms
1.86% [kernel] [k] srso_alias_safe_ret
1.31% [kernel] [k] perf_pmu_nop_void
1.03% [kernel] [k] read_tsc
0.93% [amdgpu] [k] amdgpu_cgs_read_register
0.82% [kernel] [k] io_idle
0.80% perf [.] rb_next
0.74% [kernel] [k] asm_sysvec_apic_timer_interrupt
0.61% [kernel] [k] menu_select
0.58% libQt6Core.so.6.6.2 [.] QtPrivate::compareStrings(QStringView, QStringView, Qt::CaseSensitivity)
0.52% [kernel] [k] perf_poll
0.51% [amdgpu] [k] gfx_v11_0_get_gpu_clock_counter
0.51% [kernel] [k] native_write_msr
0.51% [kernel] [k] _copy_to_user
0.49% [kernel] [k] native_sched_clock
0.45% [amdgpu] [k] amdgpu_device_rreg
0.44% libQuickCharts.so.6.0.0 [.] MapProxySource::item(int) const
0.41% [kernel] [k] native_read_msr
0.39% [kernel] [k] ktime_get
0.39% perf [.] evsel__parse_sample
0.35% [kernel] [k] do_idle
0.35% perf [.] perf_hppis_dynamic_entry
0.31% [kernel] [k] cpuidle_enter_state
0.31% [kernel] [k] ct_idle_exit
0.31% libQt6Core.so.6.6.2 [.] QVariant::~QVariant()
0.30% [kernel] [k] update_load_avg
0.29% [kernel] [k] perf_mux_hrtimer_handler
0.29% [kernel] [k] hrtimer_run_queues
0.28% [kernel] [k] update_blocked_averages
0.28% [kernel] [k] merge_sched_in
0.27% libc.so.6 [.] _int_malloc
0.25% [kernel] [k] get_next_timer_interrupt
0.25% libKSysGuardSensors.so.6.0.3 [.] KSysGuard::SensorDataModel::data(QModelIndex const&, int) const
0.25% [kernel] [k] perf_event_update_userpage
0.23% [kernel] [k] arch_perf_update_userpage
0.23% [kernel] [k] update_sd_lb_stats.constprop.0
0.23% [kernel] [k] psi_group_change
0.22% [kernel] [k] scheduler_tick
0.22% [kernel] [k] do_sys_poll
0.22% [kernel] [k] update_rq_clock
0.21% perf [.] dso__find_symbol
0.21% [kernel] [k] sched_clock_cpu
0.20% firefox [.] free
When I ran this:
Dmesg
[ 7549.366308] bucket 0:13751 gen 4 has wrong data_type: got btree, should be need_discard, fixing
[ 7549.366313] bucket 0:13751 gen 4 data type need_discard has wrong dirty_sectors: got 1024, should be 0, fixing
[ 7550.246114] bucket 1:30811 gen 2 has wrong data_type: got btree, should be need_discard, fixing
[ 7550.246119] bucket 1:30811 gen 2 data type need_discard has wrong dirty_sectors: got 1024, should be 0, fixing
[ 7562.470944] dev 0 has wrong btree buckets: got 91890, should be 91889, fixing
[ 7562.470948] dev 0 has wrong btree sectors: got 53235200, should be 53234176, fixing
[ 7562.470950] dev 0 has wrong need_discard buckets: got 10, should be 11, fixing
[ 7562.470953] dev 1 has wrong btree buckets: got 91890, should be 91889, fixing
[ 7562.470954] dev 1 has wrong btree sectors: got 53235200, should be 53234176, fixing
[ 7562.470955] dev 1 has wrong need_discard buckets: got 11, should be 12, fixing
[ 7562.470961] fs has wrong btree: got 106470400, should be 106468352, fixing
[ 7562.470964] fs has wrong btree: 1/2 [0 1]: got 106470400, should be 106468352, fixing
[ 7562.497474] done
[ 7562.544354] bcachefs (0d21127d-8c69-4c51-b44a-6ef94b8a1216): going read-write
[ 7562.552413] bcachefs (0d21127d-8c69-4c51-b44a-6ef94b8a1216): journal_replay...
And when I ran this:
perf top -e bcachefs:*
Available samples
0 bcachefs:btree_path_upgrade_fail
0 bcachefs:btree_path_relock_fail
0 bcachefs:btree_node_set_root
0 bcachefs:btree_node_rewrite
0 bcachefs:btree_node_split
0 bcachefs:btree_node_merge
0 bcachefs:btree_node_compact
0 bcachefs:btree_reserve_get_fail
0 bcachefs:btree_node_free
0 bcachefs:btree_node_alloc
0 bcachefs:btree_node_write
0 bcachefs:btree_node_read
725 bcachefs:btree_cache_cannibalize_unlock
0 bcachefs:btree_cache_cannibalize
725 bcachefs:btree_cache_cannibalize_lock
0 bcachefs:btree_cache_cannibalize_lock_fail
0 bcachefs:btree_cache_reap
0 bcachefs:btree_cache_scan
0 bcachefs:bkey_pack_pos_fail
725 bcachefs:journal_reclaim_finish
725 bcachefs:journal_reclaim_start
0 bcachefs:journal_write
0 bcachefs:journal_entry_close
0 bcachefs:journal_entry_full
0 bcachefs:journal_full
0 bcachefs:read_reuse_race
0 bcachefs:read_retry
0 bcachefs:read_split
0 bcachefs:read_bounce
0 bcachefs:read_nopromote
0 bcachefs:read_promote
0 bcachefs:write_super
Okay, after poking a bit, it looks like dmesg continues to print bcachefs stuff that I think you might find useful. If I understand it correctly, it is still working on something, so I'll let it run overnight and perhaps through tomorrow, and check on it now and then:
Here is a pastebin of dmesg: https://pastebin.com/gvWkATkd
I left it running since yesterday. The command is still in a running state and the filesystem remains unmounted. I left iostat on to log the I/O operations while I was away. It looks like the disks involved in the FS haven't done anything.
I'll go ahead and reboot to clear things up. Let me know if you'd like me to try any other commands or any new branches on this filesystem. I can also rejoin IRC and pick things up there if you'd like.
Similar issues here.
Arch Linux with the latest updates: kernel 6.8.5-arch1-1 and extra/bcachefs-tools 3:1.6.4-2. Two SSDs with two 128G partitions, ~10 6TB HDDs, 128G DDR4, 24-core Xeon.
Either
btree trans held srcu lock (delaying memory reclaim) for 38 seconds
or
kernel BUG at fs/bcachefs/journal_io.c:1809!
followed by a lot of
bch2_btree_update_start(): error journal_reclaim_would_deadlock
And then zero activity in iotop and several processes (bcachefs, bch kernel threads) in D state.
One time, after a couple of hours, part of the system became unresponsive (several systemd-journald processes and other unrelated daemons were in D state). So, each time this error appeared, I ended up forcefully power-cycling my server.
At the beginning there were about 6 HDDs and 2 SSDs. The other HDDs were added afterwards.
The initial setting was metadata_target=ssd and metadata_replicas=3 (it turns out this is the maximum; even 4 is too big). Then I started copying data in, and everything was fine. Shortly after, I saw that there was not enough space on the two 64G partitions for metadata. I resized them online to 128G. Everything worked fine. After adding a couple of HDDs, the error mentioned above appeared. I was able to fully recover by running mount with -o fsck,fix_errors. This error appeared again when I was trying to evacuate one of the HDDs. Again, fsck,fix_errors seemed to fix everything. I successfully removed one HDD.
After another batch of HDDs was added and more data copied in, I noticed that both SSDs had ~55G of user data on them. This was undesirable. data rebalance was silently doing nothing. I changed foreground_target from none to hdd. data rebalance was still silently doing nothing.
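(One way this kind of option change can be made at runtime, assuming the per-filesystem options directory that the kernel exposes under sysfs, is a sketch like the following:)
echo hdd | sudo tee /sys/fs/bcachefs/*/options/foreground_target   # set the option on the mounted fs
cat /sys/fs/bcachefs/*/options/foreground_target                   # confirm the new value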
I decided to evacuate one of the SSDs (the plan was to add it back and rebalance again later). During this operation similar errors appeared, and everything broke down completely.
Neither mounting with fsck,fix_errors nor fsck.bcachefs works. The first one hangs with bch2_btree_update_start(): error journal_reclaim_would_deadlock; the second tries to fix something and then aborts with fsck.bcachefs: libbcachefs/journal_io.c:1849: bch2_journal_write_prep: Assertion '!(u64s > j->entry_u64s_reserved)' failed. (Also, fsck.bcachefs runs about 3-4 times faster.)
I've tried to mount and to run fsck.bcachefs (-o for fsck is not documented in -h) with one SSD or none (it's 3 metadata replicas, right?), but had no luck.
I have a core dump of fsck.bcachefs, but it's 11G and "truncated" (whatever coredumpctl means by that).
fsck.bcachefs -R produces a more manageable core of around 70M.
What options do I have left to try to recover my data?
Try my master branch - we just got reedriley's filesystem working again last night
If it doesn't mount, jump on IRC and start sending me logs and sysfs/debugfs debug output
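A rough sketch of collecting that output, assuming debugfs at the usual /sys/kernel/debug mount point and a per-UUID bcachefs directory there:
sudo dmesg > dmesg.txt
sudo mount -t debugfs none /sys/kernel/debug 2>/dev/null || true   # usually already mounted
sudo ls /sys/kernel/debug/bcachefs/                                # per-filesystem debug dirs live here
for f in /sys/fs/bcachefs/*/dev-*/alloc_debug; do echo "== $f"; cat "$f"; done > sysfs-debug.txt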
I had success mounting a filesystem after an unclean shutdown, and it looks to be working well for me.
I built the latest master branch (commit hash d82346f).
Kernel: 6.9.0-rc2+, bcachefs-tools: 1.6.4
Thanks for your work!
I have an Arch Linux system that is installed to bcachefs. It boots using EFISTUB with the kernel on a small EFI partition; everything else is bcachefs. It is a "home media server" running some Docker containers, Kodi, and a few other things...
Just to give some relevant info about things I have done in the configuration:
It has two SSDs (/dev/sda1, /dev/sdd1) for foreground+promote and two HDDs (/dev/sdb, /dev/sdc) for background. Replicas=2 at the FS level. Replicas=1 set selectively on some directories. xxhash checksums, no compression.
At one point, I decided that I don't want the rootfs to go on the background devices anymore, so I did
bcachefs setattr --background_target=ssd /
(set the background target to ssd on the root of the filesystem), and then set it back on specific directories like
bcachefs setattr --background_target=hdd /srv
I am not sure if this is a good idea, but oh well, it seemed to work.
Kernel: 6.7.6-arch1-2
The issue is that I can no longer mount my filesystem, because it hangs during journal replay at mount time. I am now trying to mount it from a live ISO to at least collect dmesg output.
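(A simple way to capture that from the live ISO is to follow the kernel log in one terminal while attempting the mount in another; a sketch, using the device list from above:)
sudo dmesg --follow | tee /tmp/bcachefs-mount.log                    # terminal 1: stream kernel messages
sudo mount -t bcachefs /dev/sda1:/dev/sdb:/dev/sdc:/dev/sdd1 /mnt    # terminal 2: attempt the mount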
Relevant dmesg output:
bcachefs show-super output:
Running bcachefs fsck -v /dev/sda1:/dev/sdb:/dev/sdc:/dev/sdd1:
Please let me know what I could do, if anything, to try to safely recover my filesystem? Or if you'd like anything else from me to help you diagnose and fix this bug?