Lykos153 opened this issue 3 months ago
fsck is quite safe on bcachefs
sorry I'm slow getting to these, and I'm about to be offline for a week - but I'll try to get to these soon :)
it does look like you trimmed the important part of the log though - i.e. why the btree node was corrupt
Dammit :/ Apparently `dmesg | grep bcachefs` is not a good idea (see the log-capture sketch below). Well, the log is gone now, with the fs read-only and all. But after a reboot it mounted successfully read-write again (without `-o fsck,fix_errors`). Not sure if it's interesting, but this is what I got during boot:
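For next time, a minimal sketch of grabbing the whole kernel log instead of a grepped subset, so the lines around a bcachefs error aren't lost. It assumes systemd's journalctl is available; the output paths are just placeholders:

```sh
# Save the complete kernel ring buffer, not only the bcachefs lines,
# so the context around an error survives:
dmesg > /root/dmesg-full.txt

# With systemd, the kernel log of the previous boot can still be
# recovered even after a reboot:
journalctl -k -b -1 > /root/dmesg-prev-boot.txt
```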
Ok, maybe it came back. But it looks different, so maybe it's unrelated... It happened again during mass-copying stuff onto the fs. I hope this time I didn't accidentally crop the log.
EDIT: Someone opened #717 with the same error, so I guess this one is separate from the `corrupt btree node before write at btree extents level` error.
I'm beginning to suspect a faulty disk, though I don't know how I would find out which one it is. S.M.A.R.T. looks fine (except for the one I labeled `ro` as described in #715, but with it being read-only I guess it won't cause issues when writing).
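In case it helps, a rough sketch of how a suspect member could be narrowed down with smartmontools; the device names below are just the members of this array used as examples, and self-test results are a hint rather than proof:

```sh
# Dump full SMART data for each member and look for reallocated or
# pending sectors, CRC errors, media errors, etc.:
for dev in /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdh /dev/nvme0n1; do
    echo "=== $dev ==="
    smartctl -a "$dev"
done

# A long self-test reads the whole surface and can surface problems
# that the attribute table alone doesn't show:
smartctl -t long /dev/sdd
```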
This time I mounted with `fsck,fix_errors`. FWIW, here's the log:
EDIT: As `fsck`ing after `about to insert invalid key in data update path: fatal error` recurred didn't find anything (see https://github.com/koverstreet/bcachefs/issues/717#issuecomment-2246058819), I suspect that the errors found here are related to the original `corrupt btree node before write at btree extents level` error.
Hey, are you on IRC? You've been turning up a bunch of good bugs, we could work through them quicker if you want to hop on there
There should've been more in that error message, did something get dropped?
:/ I always try to catch the whole message, but apparently something got lost. Let's see if it comes back.
I just joined #bcache as lykos153 so we can discuss issues there
I'm getting the feeling that I'm spamming issues right now. Sorry about that. If there's a more appropriate place to ask about this, I'd be happy to be pointed to it, but as searching the Internet for the string "corrupt btree node before write at btree extents level" yielded zero results, I decided to open another issue.
I just got `corrupt btree node before write at btree extents level` while copying lots of stuff onto the file system.
`bcachefs fs usage -h`:

```
Filesystem: 677cf0a7-1abe-4ce3-876c-2ca63301229d
Size:             8.80 TiB
Used:             5.98 TiB
Online reserved:  1.84 MiB

Data type       Required/total  Durability  Devices
reserved:       1/0                         []                      91.7 MiB
btree:          1/3             3           [sde1 nvme0n1p2 sdf1]   30.0 MiB
btree:          1/3             3           [sdc1 sde1 sdh1]        1.65 GiB
btree:          1/2             2           [sdc1 sdf1]             8.22 GiB
btree:          1/3             3           [sdc1 nvme0n1p2 sdf1]   30.0 MiB
btree:          1/2             2           [sdc1 nvme0n1p2]        8.42 GiB
btree:          1/3             3           [sdc1 sdd1 nvme0n1p2]   2.19 GiB
btree:          1/3             3           [sde1 sdd1 nvme0n1p2]   2.14 GiB
btree:          1/3             3           [sdd1 nvme0n1p2 sdf1]   48.9 GiB
btree:          1/2             2           [sdc1 sde1]             5.07 GiB
btree:          1/2             2           [sde1 nvme0n1p2]        7.92 GiB
btree:          1/2             2           [sde1 sdf1]             7.75 GiB
user:           1/1             1           [sdf1]                  3.29 GiB
user:           1/1             1           [sdh1]                  2.60 TiB
user:           1/1             1           [sdc1]                  91.4 GiB
user:           1/1             1           [nvme0n1p2]             413 GiB
user:           1/1             1           [sde1]                  85.4 GiB
user:           1/1             1           [sdd1]                  43.8 GiB
user:           1/1             1           [sdb1]                  2.64 TiB
cached:         1/1             1           [sdd1]                  179 GiB
cached:         1/1             1           [sde1]                  2.76 GiB
cached:         1/1             1           [sdc1]                  2.81 GiB
cached:         1/1             1           [sdh1]                  1.68 GiB
cached:         1/1             1           [nvme0n1p2]             229 GiB
cached:         1/1             1           [sdf1]                  196 GiB

hdd.hdd1 (device 2):        sdh1        ro
                    data            buckets     fragmented
  free:         125 GiB              511131
  sb:          3.00 MiB                  13        252 KiB
  journal:     2.00 GiB                8192
  btree:        562 MiB                2247
  user:        2.60 TiB            10913266
  cached:      1.68 GiB               11500
  parity:           0 B                   0
  stripe:           0 B                   0
  need_gc_gens:     0 B                   0
  need_discard:     0 B                   0
  capacity:    2.73 TiB            11446349

hdd.hdd2 (device 5):        sdb1        rw
                    data            buckets     fragmented
  free:        86.5 GiB               88527
  sb:          3.00 MiB                   4       1020 KiB
  journal:     8.00 GiB                8192
  btree:            0 B                   0
  user:        2.64 TiB             2764864       3.11 MiB
  cached:           0 B                   0
  parity:           0 B                   0
  stripe:           0 B                   0
  need_gc_gens:     0 B                   0
  need_discard:     0 B                   0
  capacity:    2.73 TiB             2861587

hdd.hdd3 (device 3):        sdd1        rw
                    data            buckets     fragmented
  free:        2.48 TiB             2604193
  sb:          3.00 MiB                   4       1020 KiB
  journal:     8.00 GiB                8192
  btree:       17.8 GiB               18193       11.5 MiB
  user:        43.8 GiB               44913       57.3 MiB
  cached:       179 GiB              186081
  parity:           0 B                   0
  stripe:           0 B                   0
  need_gc_gens:     0 B                   0
  need_discard:11.0 MiB                  11
  capacity:    2.73 TiB             2861587

ssd.ssd1 (device 0):        sdc1        rw
                    data            buckets     fragmented
  free:        12.0 GiB               49165
  sb:          3.00 MiB                  13        252 KiB
  journal:      954 MiB                3815
  btree:       12.1 GiB               49740
  user:        91.4 GiB              374544       76.0 KiB
  cached:      2.63 GiB               11136
  parity:           0 B                   0
  stripe:           0 B                   0
  need_gc_gens:     0 B                   0
  need_discard:     0 B                   0
  capacity:     119 GiB              488413

ssd.ssd2 (device 1):        sde1        rw
                    data            buckets     fragmented
  free:        11.2 GiB               46041
  sb:          3.00 MiB                  13        252 KiB
  journal:      894 MiB                3577
  btree:       11.6 GiB               47684
  user:        85.4 GiB              349705        176 KiB
  cached:      2.59 GiB               10869
  parity:           0 B                   0
  stripe:           0 B                   0
  need_gc_gens:     0 B                   0
  need_discard:     0 B                   0
  capacity:     112 GiB              457889

ssd.ssd4 (device 4):        nvme0n1p2   rw
                    data            buckets     fragmented
  free:         279 GiB              572371
  sb:          3.00 MiB                   7        508 KiB
  journal:     3.91 GiB                8000
  btree:       25.9 GiB               53152       9.25 MiB
  user:         413 GiB              846468       5.83 MiB
  cached:       229 GiB              472406
  parity:           0 B                   0
  stripe:           0 B                   0
  need_gc_gens:     0 B                   0
  need_discard:9.00 MiB                  18
  capacity:     953 GiB             1952422

ssd.ssd5 (device 6):        sdf1        rw
                    data            buckets     fragmented
  free:        9.61 GiB               19688
  sb:          3.00 MiB                   7        508 KiB
  journal:     1.82 GiB                3726
  btree:       24.3 GiB               49815       8.75 MiB
  user:        3.29 GiB                6741       2.29 MiB
  cached:       193 GiB              396952
  parity:           0 B                   0
  stripe:           0 B                   0
  need_gc_gens:     0 B                   0
  need_discard:9.50 MiB                  19
  capacity:     233 GiB              476948
```

I also ran `bcachefs fsck -ny`:
```
bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_alloc_info... done
bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_lrus...
incorrect lru entry: lru read time 9034497496 u64s 5 type set 844433964629464:844424930554135:0 len 0 ver 0 for u64s 5 type deleted 3:422167:0 len 0 ver 0, not fixing
incorrect lru entry: lru read time 10267875312 u64s 5 type set 844435198007280:844424930553922:0 len 0 ver 0 for u64s 5 type deleted 3:421954:0 len 0 ver 0, not fixing
 done
bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_btree_backpointers... done
bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_backpointers_to_extents... done
bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_extents_to_backpointers...
missing backpointer for btree=extents l=1 u64s 13 type btree_ptr_v2 5700043:665048:4294967294 len 0 ver 0: seq 39c8f0667d77f7c written 344 min_key 5700043:275480:U32_MAX durability: 3 ptr: 3:257318:512 gen 0 ptr: 4:1382542:512 gen 0 ptr: 6:241449:0 gen 1
got:  u64s 5 type deleted 4:1449700884480:0 len 0 ver 0
want: u64s 9 type backpointer 4:1449700884480:0 len 0 ver 0: bucket=4:1382542:0 btree=extents l=1 offset=512:0 len=512 pos=5700043:665048:4294967294:
fix? (y,n, or Y,N for all errors of this type)
 done
bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_alloc_to_lru_refs... done
bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_snapshot_trees... done
bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_snapshots... done
bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_subvols... done
bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_subvol_children... done
bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): delete_dead_snapshots... done
bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_root... done
bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_subvolume_structure... done
bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_directory_structure...
unreachable inode u64s 16 type inode_v3 0:2903267:4294967293 len 0 ver 0: mode=100444 flags= (7300000) journal_seq=2666230 bi_size=260 bi_sectors=8 bi_version=0 bi_atime=1912211233956985 bi_ctime=2425523466637133 bi_mtime=16727570274002083825 bi_otime=1912189288589882 bi_uid=0 bi_gid=0 bi_nlink=1 bi_generation=0 bi_dev=0 bi_data_checksum=0 bi_compression=0 bi_project=0 bi_background_compression=0 bi_data_replicas=0 bi_promote_target=0 bi_foreground_target=0 bi_background_target=0 bi_erasure_code=0 bi_fields_set=0 bi_dir=0 bi_dir_offset=0 bi_subvol=0 bi_parent_subvol=0 bi_nocow=0 :
fix? (y,n, or Y,N for all errors of this type)
 done
```

I'm unsure about what it tells me, though, and - coming from btrfs - I'm very hesitant to let fsck touch anything. Should I just do `fix_errors`?
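For concreteness, a sketch of what either route would look like, reusing only the options already mentioned in this thread; the device paths and mountpoint are placeholders, not the exact layout of this filesystem:

```sh
# At mount time, as done earlier in the thread: run fsck and repair
# while bringing the filesystem up (multi-device members are joined
# with colons; paths are placeholders):
mount -t bcachefs -o fsck,fix_errors \
    /dev/sdb1:/dev/sdc1:/dev/sdd1:/dev/sde1:/dev/sdf1:/dev/sdh1:/dev/nvme0n1p2 /mnt

# Or offline, answering "yes" to every fix prompt instead of the
# -n/-y dry run shown above:
bcachefs fsck -y /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdh1 /dev/nvme0n1p2
```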