nicman23 opened this issue 3 years ago
I have the same observation: bch-rebalance only reads and doesn't do any writes.
How much is it reading? More than just metadata?
I'll add a stat for "read but didn't write data"
On my array it caps at the read throughput of a single HDD.
Also, yes, it only reads for me too.
Also, not sure if it's relevant to the issue, but my system has 9G allocated out of 16G total RAM with basically nothing running, and no hugepages.
I see a lot of cache usage in GNOME System Monitor, perhaps that's what you're seeing as well? I know it only happens with bcachefs, but AFAIK everything else takes precedence. It does some weird stuff with my graphics drivers, so I would suspect a possible memory leak or just overuse of kernel memory space.
I was not running a DE, only a getty, at that time. It was not cache; it was memory used by kernel processes.
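For reference, a quick way to distinguish page cache from kernel memory, in case anyone wants to reproduce that check (plain procfs/slabtop, nothing bcachefs-specific; a sketch, not necessarily the exact commands used above):

```shell
# Cached/SReclaimable are reclaimable; SUnreclaim is kernel memory that
# cannot be dropped and is where a kernel-side leak would show up
grep -E 'MemTotal|MemFree|Cached|Slab|SReclaimable|SUnreclaim' /proc/meminfo

# top kernel slab consumers sorted by cache size (needs root)
sudo slabtop -o -s c | head -n 20
```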
Something I did notice, @nicman23, is that we're both using a multi-device setup in a RAID 0-esque mode with metadata replicas at 2, and that we're both using NVMes and HDDs in their own groups (I'm using erasure_coding though). I don't have the problem you do with boot rebalances, and aside from some data loss/corruption from bad resets and fsck (just pressing y to everything), everything else has been stable. Note that my setup is only a few weeks old; I keep going back and forth between maining bcachefs and not because of some bug. But I think the kernel memory usage/leak issue may be related.
Also, I had checksums disabled for data and no replicas for data, so effectively RAID 0 for data and RAID 1 for metadata.
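For context, that layout corresponds roughly to a format invocation like the one below. This is only a sketch from memory: the device paths are placeholders and the exact option names may differ between bcachefs-tools versions.

```shell
# data: single copy, no checksums; metadata: two copies
# (placeholder devices; check `bcachefs format --help` for current option names)
bcachefs format \
    --data_replicas=1 \
    --metadata_replicas=2 \
    --data_checksum=none \
    /dev/nvme0n1 /dev/sda
```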
> How much is it reading? More than just metadata?
> I'll add a stat for "read but didn't write data"
Size:                 1209119193600
Used:                 1108305899520
Online reserved:      1048576

Data type       Required/total  Devices
btree:          1/2             [dm-6 dm-1]     12942049280
user:           1/1             [dm-6]          555509760
user:           1/1             [dm-1]          1094001182720
cached:         1/1             [dm-6]          85869757952

hdd (device 1):  dm-1  rw
                 data             buckets   fragmented
sb:              135168           1         389120
journal:         536870912        1024      0
btree:           6471024640       19717     3866361856
user:            1094001182720    2095147   4459247616
cached:          0                0         0
parity:          0                0         0
erasure coded:   0                0         0
available:       97550598144      186063
capacity:        1206885810176    2301952

ssd (device 0):  dm-6  rw
                 data             buckets   fragmented
sb:              135168           1         126976
journal:         268435456        1024      0
btree:           6471024640       24685     0
user:            555509760        2122      759808
cached:          85869757952      331466    1022065152
parity:          0                0         0
erasure coded:   0                0         0
available:       100078190592     381768
capacity:        107374182400     409600
# iostat -h /dev/dm-6 /dev/dm-1
Linux 5.10.0-00651-gc37f6ec112e9 (x)    16.04.2021      _x86_64_        (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8,8%    0,0%   13,7%    7,3%    0,0%   70,2%

      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd  Device
  1463,94        25,6M       209,5k         0,0k      47,2G     385,6M       0,0k  dm-1
   205,33         5,0M         5,7k       988,8k       9,2G      10,5M       1,8G   dm-6
Also noticed that "Online reserved" constantly changes, going up to e.g. 3018752 and down to e.g. 8192.
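One simple way to watch that value fluctuate, in case others want to compare (the mount point is just an example):

```shell
# re-run `bcachefs fs usage` every second and show only the reserved line
watch -n1 "bcachefs fs usage /mnt/bcachefs | grep -i 'online reserved'"
```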
Also observing this on two fresh filesystems: one fs only on an SSD (64GiB), the other on the same SSD (192GiB) plus an HDD (1.5TB).
Two kernel threads are ongoing, `bch-rebalance/d` and `bch-rebalance/f`, with ~250MB/s and ~70MB/s of reads, respectively. I have not observed any writes. In one instance, the slower rebalance thread finished earlier than the other.
Contents of `/sys/fs/bcachefs/*/internal/rebalance_*`:

```shell
# for file in /sys/fs/bcachefs/*/internal/rebalance_*; do echo $file; cat $file; echo; done
/sys/fs/bcachefs/eeee380c-a262-4d6d-89fa-4040a38172a2/internal/rebalance_enabled
1

/sys/fs/bcachefs/eeee380c-a262-4d6d-89fa-4040a38172a2/internal/rebalance_rate
200k

/sys/fs/bcachefs/eeee380c-a262-4d6d-89fa-4040a38172a2/internal/rebalance_rate_bytes
204800

/sys/fs/bcachefs/eeee380c-a262-4d6d-89fa-4040a38172a2/internal/rebalance_rate_debug
rate:           200k/sec
target:         0
actual:         0
proportional:   0
derivative:     0
change:         0/sec
next io:        -14655776ms

/sys/fs/bcachefs/eeee380c-a262-4d6d-89fa-4040a38172a2/internal/rebalance_rate_d_term
30

/sys/fs/bcachefs/eeee380c-a262-4d6d-89fa-4040a38172a2/internal/rebalance_rate_p_term_inverse
6000

/sys/fs/bcachefs/eeee380c-a262-4d6d-89fa-4040a38172a2/internal/rebalance_work
fullest_dev (-1):  0/58.8G
total work:        160k/58.8G
rate:              204800
running
pos 6917529027641096294:5347272:U32_MAX

/sys/fs/bcachefs/fed3ec1f-9bdc-44fd-8569-02ab01841e2a/internal/rebalance_enabled
1

/sys/fs/bcachefs/fed3ec1f-9bdc-44fd-8569-02ab01841e2a/internal/rebalance_rate
22.2k

/sys/fs/bcachefs/fed3ec1f-9bdc-44fd-8569-02ab01841e2a/internal/rebalance_rate_bytes
22755

/sys/fs/bcachefs/fed3ec1f-9bdc-44fd-8569-02ab01841e2a/internal/rebalance_rate_debug
rate:           22.2k/sec
target:         0
actual:         0
proportional:   0
derivative:     0
change:         0/sec
next io:        -14655788ms

/sys/fs/bcachefs/fed3ec1f-9bdc-44fd-8569-02ab01841e2a/internal/rebalance_rate_d_term
30

/sys/fs/bcachefs/fed3ec1f-9bdc-44fd-8569-02ab01841e2a/internal/rebalance_rate_p_term_inverse
6000

/sys/fs/bcachefs/fed3ec1f-9bdc-44fd-8569-02ab01841e2a/internal/rebalance_work
fullest_dev (-1):  0/1.4T
total work:        5.0G/1.4T
rate:              22755
running
pos 4611686018427417644:0:U32_MAX
```

Right now, the larger fs has some slow writing going on, but the smaller fs is pretty much at rest.
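If per-thread numbers are useful, one way to watch the rebalance threads' I/O directly is with standard tools (pidstat comes from the sysstat package; this is just a sketch, nothing bcachefs-specific):

```shell
# list the rebalance kernel threads
pgrep -l 'bch-rebalance'

# per-task disk I/O, refreshed every 2 seconds
pidstat -d -p "$(pgrep -d, 'bch-rebalance')" 2

# or read the raw byte counters from procfs
for pid in $(pgrep 'bch-rebalance'); do
    printf '%s (%s)\n' "$pid" "$(cat /proc/$pid/comm)"
    grep -E 'read_bytes|write_bytes' "/proc/$pid/io"
done
```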
There might also be a problem with fsck: when I ran it on the larger fs, I think there was no output after the line about the fs being clean, so I cancelled it after a minute or so. (I did not retry with the current kernel and tools build.)
Kernel is ac3ab6a511717db1644ded49a6f417304abba048, 5.11.21-1-bcachefs-git-323068-gac3ab6a51171, on Arch Linux (`linux-bcachefs-git` from the AUR).
Edit: I've found that the rebalance thread can come back without a reboot, though there was a suspend in between.
Still occurring on 8ea0358a147c09a608a83bff2beba9729525ccb5 and 05ed308a527fd595910dcfe5a67d9cac04490e37.
Would it be helpful to monitor some values over time, maybe even some of the values already coming out of sysfs? I am already collecting some stats from bcachefs into a Prometheus + Grafana setup, so that might be helpful in troubleshooting.
Edit: I'm now blanket-collecting most numbers from sysfs.
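In case it helps anyone reproduce the idea, this is roughly the shape of it, sketched as a node_exporter textfile collector (the metric name, output path, and choice of files are placeholders, not the exact setup):

```shell
#!/bin/sh
# Export a bcachefs rebalance counter for node_exporter's textfile collector.
# Only a single plain numeric sysfs file is handled in this sketch.
out=/var/lib/node_exporter/textfile/bcachefs_rebalance.prom
tmp=$(mktemp)

for f in /sys/fs/bcachefs/*/internal/rebalance_rate_bytes; do
    uuid=$(echo "$f" | cut -d/ -f5)
    printf 'bcachefs_rebalance_rate_bytes{uuid="%s"} %s\n' \
        "$uuid" "$(cat "$f")" >> "$tmp"
done

mv "$tmp" "$out"
```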
On each reboot the system spends a few hours in rebalance before the constant IO stops. It is a newly formatted fs, but I did have some power failures (yay, dead UPS). Performance is OK, but it's a bit concerning to have 3-4 hours of HDD IO thrashing per day.
Not really sure what more you need to debug this, so please tell me.