raldone01 opened 1 month ago
I'm seeing what seems to be the same issue, even though I'm not using nocow. There's a lot of both reads and writes happening to the drive, but interestingly minio is barely doing anything. I'm happy to provide more information as needed, but not quite sure which information is useful here.
[ 9462.931559] INFO: task minio:14020 blocked for more than 1228 seconds.
[ 9462.931587] Tainted: G W 6.9.0 #1-NixOS
[ 9462.931595] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9462.931606] task:minio state:D stack:0 pid:14020 tgid:14014 ppid:1 flags:0x00000000
[ 9462.931609] Call Trace:
[ 9462.931611] <TASK>
[ 9462.931613] __schedule+0x3ec/0x1540
[ 9462.931619] ? bch2_btree_path_traverse_one+0x47d/0xb60 [bcachefs]
[ 9462.931674] ? bch2_btree_key_cache_find+0x181/0x1b0 [bcachefs]
[ 9462.931716] schedule+0x27/0xf0
[ 9462.931719] io_schedule+0x46/0x70
[ 9462.931720] folio_wait_bit_common+0x13f/0x340
[ 9462.931724] ? __pfx_wake_page_function+0x10/0x10
[ 9462.931728] folio_wait_writeback+0x2b/0x80
[ 9462.931730] truncate_inode_partial_folio+0x5b/0x190
[ 9462.931733] truncate_inode_pages_range+0x1de/0x400
[ 9462.931740] truncate_pagecache+0x47/0x60
[ 9462.931742] bchfs_truncate+0x159/0x3c0 [bcachefs]
[ 9462.931793] notify_change+0x1f2/0x4c0
[ 9462.931796] ? do_truncate+0x98/0xf0
[ 9462.931799] do_truncate+0x98/0xf0
[ 9462.931802] path_openat+0xf96/0x1150
[ 9462.931806] do_filp_open+0xc4/0x170
[ 9462.931810] do_sys_openat2+0xab/0xe0
[ 9462.931812] ? __x64_sys_epoll_pwait+0x95/0x140
[ 9462.931817] __x64_sys_openat+0x57/0xa0
[ 9462.931819] do_syscall_64+0xb8/0x200
[ 9462.931821] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 9462.931825] RIP: 0033:0x40708e
[ 9462.931841] RSP: 002b:000000c0020ceae8 EFLAGS: 00000206 ORIG_RAX: 0000000000000101
[ 9462.931843] RAX: ffffffffffffffda RBX: ffffffffffffff9c RCX: 000000000040708e
[ 9462.931844] RDX: 0000000000081241 RSI: 000000c08d3bd4f0 RDI: ffffffffffffff9c
[ 9462.931845] RBP: 000000c0020ceb28 R08: 0000000000000000 R09: 0000000000000000
[ 9462.931846] R10: 00000000000001b6 R11: 0000000000000206 R12: 000000c08d3bd4f0
[ 9462.931847] R13: 0000000000000000 R14: 000000c002169340 R15: 0fffffffffffffff
[ 9462.931850] </TASK>
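In case it is useful for gathering more information when this happens: backtraces for all blocked (D-state) tasks can be dumped to dmesg via sysrq (a minimal sketch, assuming sysrq is enabled on the machine):

# dump backtraces of all blocked tasks into the kernel log, then read them back
echo w > /proc/sysrq-trigger
dmesg | tail -n 200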
EDIT (2024-05-19): I tried a couple of things - ran mount -t bcachefs -o fsck,fix_errors ... on kernel 6.8.10, and downgraded the machine to that kernel - the hangs are still present. The other thing I've done recently was adding a 3rd HDD, so I think this might be a balancing issue (I have data_replicas=2,metadata_replicas=2, so I assume some rebalancing needs to happen on writes)?
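If it helps narrow this down, one way to check whether rebalance is actually making progress is to poll its status in sysfs. A rough sketch - the internal/rebalance_status path is my assumption and may differ between kernel versions:

# list mounted bcachefs filesystems by external UUID
ls /sys/fs/bcachefs/
# rebalance_status under internal/ is assumed; adjust to whatever your kernel exposes
cat /sys/fs/bcachefs/8f552709-24e3-4387-8183-23878c94d00b/internal/rebalance_status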
bcachefs fs usage /mnt/nas3:
Filesystem: 8f552709-24e3-4387-8183-23878c94d00b
Size: 46521872338432
Used: 37175092178432
Online reserved: 3448832
Data type Required/total Durability Devices
btree: 1/2 2 [nvme1n1 nvme0n1] 257802371072
user: 1/2 2 [nvme0n1 sdb] 9871360
user: 1/2 2 [sdh sda] 1936855717376
user: 1/2 2 [sdg nvme0n1] 689984512
user: 1/2 2 [nvme1n1 sdb] 13164544
user: 1/2 2 [sdf sdi] 797582336
user: 1/2 2 [sdg sdi] 30356595159552
user: 1/2 2 [nvme1n1 sdf] 7528448
user: 1/2 2 [nvme0n1 sdf] 3940352
user: 1/2 2 [sdf sda] 1139384320
user: 1/2 2 [sda sdi] 12542515712
user: 1/2 2 [sdg sdh] 1108064192512
user: 1/2 2 [sdg sda] 9071634432
user: 1/2 2 [sdh nvme0n1] 37839261184
user: 1/2 2 [sdh sdi] 1100391740928
user: 1/2 2 [nvme1n1 sda] 7716864
user: 1/2 2 [nvme1n1 sdi] 331915776
user: 1/2 2 [nvme0n1 sda] 5980160
user: 1/2 2 [nvme0n1 sdi] 333169664
user: 1/2 2 [sdf sdb] 6447104
user: 1/2 2 [sda sdb] 903544832
user: 1/2 2 [sdb sdi] 13918672896
user: 1/2 2 [sdg nvme1n1] 685217792
user: 1/2 2 [sdg sdf] 764903936
user: 1/2 2 [sdg sdb] 9017686016
user: 1/2 2 [sdh nvme1n1] 37822008320
user: 1/2 2 [sdh sdf] 122288275456
user: 1/2 2 [sdh sdb] 1936656927744
user: 1/2 2 [nvme1n1 nvme0n1] 191724323840
cached: 1/1 1 [sdh] 1716577348096
hdd.hdd1 (device 0): sdg rw
data buckets fragmented
free: 250023510016 238441
sb: 3149824 4 1044480
journal: 8589934592 8192
btree: 0 0
user: 15742445061632 15013011
cached: 0 0
parity: 0 0
stripe: 0 0
need_gc_gens: 0 0
need_discard: 0 0
capacity: 16000900661248 15259648
hdd.hdd2 (device 1): sdh rw
data buckets fragmented
free: 10990322188288 10481188
sb: 3149824 4 1044480
journal: 8589934592 8192
btree: 0 0
user: 3139965637120 2994506 1855488
cached: 1716577348096 1775750
parity: 0 0
stripe: 0 0
need_gc_gens: 0 0
need_discard: 8388608 8
capacity: 16000900661248 15259648
hdd.hdd4 (device 8): sdi rw
data buckets fragmented
free: 250021412864 238439
sb: 3149824 4 1044480
journal: 8589934592 8192
btree: 0 0
user: 15742447779328 15013013 290816
cached: 0 0
parity: 0 0
stripe: 0 0
need_gc_gens: 0 0
need_discard: 0 0
capacity: 16000900661248 15259648
nvme.nvme1 (device 3): nvme1n1 rw
data buckets fragmented
free: 3899654144 7438
sb: 3149824 7 520192
journal: 1953497088 3726
btree: 128901185536 245870 5505024
user: 115295932928 219909
cached: 0 0
parity: 0 0
stripe: 0 0
need_gc_gens: 0 0
need_discard: 0 0
capacity: 250059161600 476950
nvme.nvme2 (device 4): nvme0n1 rw
data buckets fragmented
free: 3892314112 7424
sb: 3149824 7 520192
journal: 1953497088 3726
btree: 128901185536 245870 5505024
user: 115303263744 219923
cached: 0 0
parity: 0 0
stripe: 0 0
need_gc_gens: 0 0
need_discard: 0 0
capacity: 250059161600 476950
ssd.ssd1 (device 7): sdb rw
data buckets fragmented
free: 15642656768 29836
sb: 3149824 7 520192
journal: 4294967296 8192
btree: 0 0
user: 980263374848 1869704
cached: 0 0
parity: 0 0
stripe: 0 0
need_gc_gens: 0 0
need_discard: 0 0
capacity: 1000204664832 1907739
ssd.ssd2 (device 6): sda rw
data buckets fragmented
free: 15642656768 29836
sb: 3149824 7 520192
journal: 4294967296 8192
btree: 0 0
user: 980263375872 1869704
cached: 0 0
parity: 0 0
stripe: 0 0
need_gc_gens: 0 0
need_discard: 0 0
capacity: 1000204664832 1907739
ssd.ssd3 (device 5): sdf rw
data buckets fragmented
free: 1015021568 1936
sb: 3149824 7 520192
journal: 500170752 954
btree: 0 0
user: 62504042496 119217
cached: 0 0
parity: 0 0
stripe: 0 0
need_gc_gens: 0 0
need_discard: 0 0
capacity: 64022904832 122114
sudo bcachefs show-super /dev/sda:
Device: Samsung SSD 860
External UUID: 8f552709-24e3-4387-8183-23878c94d00b
Internal UUID: 51b7fa13-7ca1-44dc-9203-27fa8a2dc39f
Magic number: c68573f6-66ce-90a9-d96a-60cf803df7ef
Device index: 6
Label:
Version: 1.4: member_seq
Version upgrade complete: 1.4: member_seq
Oldest version on disk: 0.29: snapshot_trees
Created: Wed Nov 8 08:23:40 2023
Sequence number: 457
Time of last write: Sun May 19 15:42:14 2024
Superblock size: 9.70 KiB/1.00 MiB
Clean: 0
Devices: 8
Sections: members_v1,replicas_v0,disk_groups,clean,journal_seq_blacklist,journal_v2,counters,members_v2,errors,ext,downgrade
Features: zstd,journal_seq_blacklist_v3,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,reflink_inline_data,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features: alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done
Options:
block_size: 4.00 KiB
btree_node_size: 256 KiB
errors: continue [ro] panic
metadata_replicas: 2
data_replicas: 2
metadata_replicas_required: 1
data_replicas_required: 1
encoded_extent_max: 64.0 KiB
metadata_checksum: none [crc32c] crc64 xxhash
data_checksum: none [crc32c] crc64 xxhash
compression: zstd
background_compression: none
str_hash: crc32c crc64 [siphash]
metadata_target: nvme
foreground_target: nvme
background_target: hdd
promote_target: ssd
erasure_code: 0
inodes_32bit: 1
shard_inode_numbers: 1
inodes_use_key_cache: 1
gc_reserve_percent: 8
gc_reserve_bytes: 0 B
root_reserve_percent: 0
wide_macs: 0
acl: 1
usrquota: 0
grpquota: 0
prjquota: 0
journal_flush_delay: 1000
journal_flush_disabled: 0
journal_reclaim_delay: 100
journal_transaction_names: 1
version_upgrade: [compatible] incompatible none
nocow: 0
members_v2 (size 1240):
Device: 0
Label: hdd1 (1)
UUID: 521ccb40-ec62-4884-a0d9-1794b4e147f9
Size: 14.6 TiB
read errors: 96369047
write errors: 122672560
checksum errors: 259
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 1.00 MiB
First bucket: 0
Buckets: 15259648
Last mount: Sun May 19 15:41:41 2024
Last superblock write: 457
State: rw
Data allowed: journal,btree,user
Has data: journal,user,cached
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 1
Label: hdd2 (2)
UUID: 0329232d-9462-4a3b-b8c9-3f5f53cb55b0
Size: 14.6 TiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 1.00 MiB
First bucket: 0
Buckets: 15259648
Last mount: Sun May 19 15:41:41 2024
Last superblock write: 457
State: rw
Data allowed: journal,btree,user
Has data: journal,user,cached
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 3
Label: nvme1 (5)
UUID: 1649225b-4920-48c1-90e6-494a7c6136f1
Size: 233 GiB
read errors: 0
write errors: 0
checksum errors: 108
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 512 KiB
First bucket: 0
Buckets: 476950
Last mount: Sun May 19 15:41:41 2024
Last superblock write: 457
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user,cached
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 4
Label: nvme2 (6)
UUID: 3d764a09-f6a3-49b4-b649-e4b05102d6e3
Size: 233 GiB
read errors: 0
write errors: 0
checksum errors: 84
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 512 KiB
First bucket: 0
Buckets: 476950
Last mount: Sun May 19 15:41:41 2024
Last superblock write: 457
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user,cached
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 5
Label: ssd3 (8)
UUID: 992ca2ac-c9f7-4843-b4ae-10b90494486e
Size: 59.6 GiB
read errors: 0
write errors: 0
checksum errors: 502
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 512 KiB
First bucket: 0
Buckets: 122114
Last mount: Sun May 19 15:41:41 2024
Last superblock write: 457
State: rw
Data allowed: journal,btree,user
Has data: user,cached
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 6
Label: ssd2 (9)
UUID: 2d7e9ae0-ded9-4dab-9d72-91e7bf0b23a9
Size: 932 GiB
read errors: 0
write errors: 0
checksum errors: 860
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 512 KiB
First bucket: 0
Buckets: 1907739
Last mount: Sun May 19 15:41:41 2024
Last superblock write: 457
State: rw
Data allowed: journal,btree,user
Has data: user,cached
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 7
Label: ssd1 (10)
UUID: b83f5c9c-867f-49e3-a98b-c54efb1546d6
Size: 932 GiB
read errors: 0
write errors: 0
checksum errors: 1738
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 512 KiB
First bucket: 0
Buckets: 1907739
Last mount: Sun May 19 15:41:41 2024
Last superblock write: 457
State: rw
Data allowed: journal,btree,user
Has data: user,cached
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 8
Label: hdd4 (11)
UUID: e6dcdef9-a550-45c9-9b15-bc70c966749d
Size: 14.6 TiB
read errors: 0
write errors: 0
checksum errors: 17076
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 1.00 MiB
First bucket: 0
Buckets: 15259648
Last mount: Sun May 19 15:41:41 2024
Last superblock write: 457
State: rw
Data allowed: journal,btree,user
Has data: journal,user,cached
Durability: 1
Discard: 0
Freespace initialized: 1
errors (size 40):
deleted_inode_missing 3 Sat May 18 13:58:05 2024
unlinked_inode_not_on_deleted_list 5 Sun May 19 13:02:06 2024
EDIT (2024-05-19): I did an experiment and deleted a big file - the processes that were stuck got unstuck. I suspect they will end up stuck again, but hopefully this provides some fuel for investigation.
It just happened again.
This time it failed to mount.
I just added fsck,fix_errors to hopefully recover the data and mount it again.
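For reference, the repair mount looks roughly like this (a sketch only - /dev/sdX, /dev/sdY and /mnt/bcachefs are placeholders, and the real device list is longer):

# colon-separated member devices, mounted with fsck and fix_errors added
mount -t bcachefs -o fsck,fix_errors /dev/sdX:/dev/sdY /mnt/bcachefs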
I am running:
❯ uname -a
Linux argon 6.8.9-arch1-2 #1 SMP PREEMPT_DYNAMIC Tue, 07 May 2024 21:35:54 +0000 x86_64 GNU/Linux
❯ cat /proc/cmdline
root=UUID=43650f9d-2143-4de8-b163-e0df92c6ebf0 rootflags=rw,noatime,compress=zstd:3,ssd,discard=async,space_cache=v2,autodefrag,subvolid=258,subvol=/@ rw loglevel=3 bgrt_disable quiet nvidia_drm.modeset=1 lsm=landlock,lockdown,yama,integrity,apparmor,bpf intel_iommu=on iommu=pt
Thankfully my rootfs is btrfs, so the machine booted fine. I will have to pause my heavy writes until this is fixed. :(
@ramonacat which kernel version did you downgrade to? Did you encounter any more hangs? I will try to remove the nocow attribute and see if that helps a bit.
I downgraded to 6.8.10 (from 6.9.0), but it does not seem to have changed the situation. The issue in my case seems to be that the filesystem gets stuck instead of moving buckets around to use the free space.
This seems to be very similar to, or the same as, what I have in #677.
So far I've only posted about that on IRC.
Regardless of whether the filesystem / a folder is cow or nocow, and regardless of whether I'm using compression or not: as soon as I write something to the bcachefs drives, it gets slower and slower and eventually stalls.
In my case, whenever this happens, all writes to other filesystems are slow as well (around 70 - 300kB/s), so whatever bcachefs does, it's affecting the whole system.
It happened both on kernel 6.8 and 6.9. And I've now let it run for 5 days without writing to it, which works fine, so it definitely has something to do with writing.
As can be seen in #677, it also happened to me after adding a new drive, and rebalancing hits the same hangs, so this could be a pointer in the right direction.
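For what it's worth, this is how I watch whether the stalls line up with background I/O on the member devices (assuming the sysstat package is installed; without device arguments iostat reports all block devices):

# extended per-device stats in MB/s, refreshed every 2 seconds
iostat -xm 2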
I managed to temporarily work around this by adding a couple of drives I had lying around to the array.
Could you share some more info? What drives? SSD? HDD? How many? Did you remove them again afterwards?
I can add some temporary external drives, but would like to remove them again once the issue is gone - that's why I'm asking :)
One SSD and one HDD, though I don't think it matters. I don't think I can remove them; I think there's just some problem with allocations.
Quick update: I added a second 2TB SSD to my setup and it also resolved the issue. Weird :)
I don't have any nocow data, but I did try enabling the compression attribute on a folder with replicas=1 and then upping replicas to 2; I don't know if that plays into it.
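In case it matters, that kind of per-folder change can be done with bcachefs setattr. A sketch under assumptions - /mnt/pool/folder is a placeholder path and the exact flag spellings may differ between bcachefs-tools versions:

# per-folder options; newly created files inherit them
bcachefs setattr --compression=zstd --data_replicas=1 /mnt/pool/folder
# later raised to two replicas; existing data may still need a rereplicate/rebalance pass
bcachefs setattr --data_replicas=2 /mnt/pool/folder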
With bcachefs-for-upstream v6.10-rc2-4-ga9cf489be39f I get this frequently.
[ 1330.210841] INFO: task bch-rebalance/3:1481 blocked for more than 1208 seconds.
[ 1330.211609] Tainted: G OE 6.10.0-rc2+ #2
[ 1330.212373] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1330.213153] task:bch-rebalance/3 state:D stack:0 pid:1481 tgid:1481 ppid:2 flags:0x00004000
[ 1330.213956] Call Trace:
[ 1330.214745] <TASK>
I had a borg cache directory on the bcachefs and a backup consistently triggered kernel thread timeouts (hung threads). Nocow did not seem to make a difference. I moved the cache to a different drive for now.
PS: Is there a way to remove nocow with bcachefs setattr?
I noticed the whole fs was stuck and checked dmesg:
Bcachefs show-super:
Bcachefs fs usage:
It recovered fine after a forced reboot. Sadly the kernel is tainted due to nvidia.