koverstreet / bcachefs

very bad write performance + kworker thread flush-bcachefs eats 100% CPU all the time [786558a] #600

Closed Mjasnik closed 10 months ago

Mjasnik commented 10 months ago

Overview

At first I planned to use bcachefs on two whole 3 TB partitions in RAID0, but I ran into severe write performance issues. The kworker thread flush-bcachefs uses 100% CPU almost all the time, and the write speed slowly declines from ~300 MB/s to 15 MB/s by the end. The machine is a Ryzen 5800X3D with 48 GB of RAM, and the source disk is NVMe, so there is plenty of read speed.

Notes

I have read that bcachefs is used with large databases and the like, and no one seems to have filed bug reports about performance yet, so there may be a configuration issue on my part. Either way, the speed is bad.

Tests

To test the issue, I split each disk into 4 partitions so I could try various filesystems in RAID0 and rule out disk issues. Since I ran the tests anyway, here are the results. The folder I synced is filled with games, which means a lot of small files plus a lot of big ones.

I had set up raid0 (this is the order of partitions on disks):

I used the following commands to create the RAID0 setups:

sudo mdadm --create --verbose /dev/md0 --level=0 --raid-devices=2 /dev/disk/by-partlabel/SEA1.MDA /dev/disk/by-partlabel/SEA2.MDA
sudo mkfs.ext4 -L BIGDATA2.MDA /dev/md0

sudo bcachefs format -L BIGDATA2.BCH --label=sea1.bch /dev/disk/by-partlabel/SEA1.BCH --label=sea2.bch /dev/disk/by-partlabel/SEA2.BCH

sudo mkfs.btrfs -L BIGDATA2.BTR -d raid0 -m raid0 -f /dev/disk/by-partlabel/SEA1.BTR /dev/disk/by-partlabel/SEA2.BTR

sudo zpool create -f BIGDATA2.ZFS -m /mnt/BIGDATA2.ZFS /dev/disk/by-partlabel/SEA1.ZFS /dev/disk/by-partlabel/SEA2.ZFS

I performed the same write test on every FS.

MDADM

rsync --info=progress2 -a /mnt/DATA/WinePrefixes/* /mnt/BIGDATA2.MDA/ && time sync
243.381.299.244  99%  353,98MB/s    0:10:55 (xfr#119150, to-chk=0/136841)   

real    0m43,105s
user    0m0,000s
sys     0m0,005s

BCACHEFS

rsync --info=progress2 -a /mnt/DATA/WinePrefixes/* /mnt/BIGDATA2.BCH/ && time sync
243.381.299.244  99%   21,61MB/s    2:58:58 (xfr#119150, to-chk=0/136841)   

real    10m14,169s
user    0m0,001s
sys     0m0,004s

BTRFS

rsync --info=progress2 -a /mnt/DATA/WinePrefixes/* /mnt/BIGDATA2.BTR/ && time sync
243.381.299.244  99%  320,88MB/s    0:12:03 (xfr#119150, to-chk=0/136841)   

real    0m48,636s
user    0m0,000s
sys     0m0,004s

ZFS

rsync --info=progress2 -a /mnt/DATA/WinePrefixes/* /mnt/BIGDATA2.ZFS/ && time sync
243.381.299.244  99%  262,62MB/s    0:14:43 (xfr#119150, to-chk=0/136841)   

real    0m7,556s
user    0m0,000s
sys     0m0,014s
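
The rsync rate alone hides the time spent in the final `sync`, which is where bcachefs loses another ten minutes. A minimal sketch of the combined figure, using only the numbers printed above: it assumes rsync's "MB/s" is really MiB/s (the reported rates are consistent with 1024-based units) and that the elapsed time shown by `--info=progress2` is the whole copy time. The helper names are made up for illustration.

```python
# Figures copied from the rsync and `time sync` output above.
TOTAL_BYTES = 243_381_299_244  # bytes transferred in every run


def hms_to_seconds(text):
    """Convert rsync's h:mm:ss elapsed time to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# name -> (rsync elapsed time, wall-clock seconds of the final `sync`)
RUNS = {
    "MDADM": ("0:10:55", 43.105),
    "BCACHEFS": ("2:58:58", 10 * 60 + 14.169),
    "BTRFS": ("0:12:03", 48.636),
    "ZFS": ("0:14:43", 7.556),
}


def effective_mib_per_s(rsync_elapsed, sync_seconds):
    """Throughput over copy time plus flush time, in MiB/s (assumed unit)."""
    total_seconds = hms_to_seconds(rsync_elapsed) + sync_seconds
    return TOTAL_BYTES / total_seconds / 2**20


for name, (elapsed, sync_seconds) in RUNS.items():
    rate = effective_mib_per_s(elapsed, sync_seconds)
    print(f"{name:9s} {rate:7.1f} MiB/s including sync")
```

Counting the flush lowers every result a little, but the ranking of the four setups for this workload stays the same.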

I performed the same read test on every FS.

dd if="/mnt/BIGDATA2.MDA/GOG.Galaxy/drive_c/Program Files (x86)/GOG Galaxy/Games/Cyberpunk 2077/archive/pc/content/basegame_4_appearance.archive" of=/dev/null bs=16M
967+1 records in
967+1 records out
16232243200 bytes (16 GB, 15 GiB) copied, 41,1594 s, 394 MB/s

dd if="/mnt/BIGDATA2.BCH/GOG.Galaxy/drive_c/Program Files (x86)/GOG Galaxy/Games/Cyberpunk 2077/archive/pc/content/basegame_4_appearance.archive" of=/dev/null bs=16M
967+1 records in
967+1 records out
16232243200 bytes (16 GB, 15 GiB) copied, 97,5869 s, 166 MB/s

dd if="/mnt/BIGDATA2.BTR/GOG.Galaxy/drive_c/Program Files (x86)/GOG Galaxy/Games/Cyberpunk 2077/archive/pc/content/basegame_4_appearance.archive" of=/dev/null bs=16M
967+1 records in
967+1 records out
16232243200 bytes (16 GB, 15 GiB) copied, 49,3771 s, 329 MB/s

dd if="/mnt/BIGDATA2.ZFS/GOG.Galaxy/drive_c/Program Files (x86)/GOG Galaxy/Games/Cyberpunk 2077/archive/pc/content/basegame_4_appearance.archive" of=/dev/null bs=16M
967+1 records in
967+1 records out
16232243200 bytes (16 GB, 15 GiB) copied, 61,8473 s, 262 MB/s
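
To put the four read results side by side, the dd figures can be recomputed and expressed relative to the MDADM run. This is only a sketch over the numbers printed above; dd reports decimal MB/s here, and the comma decimal separators are rewritten as dots.

```python
# Byte count and elapsed seconds copied from the dd output above.
READ_BYTES = 16_232_243_200

SECONDS = {
    "MDADM": 41.1594,
    "BCACHEFS": 97.5869,
    "BTRFS": 49.3771,
    "ZFS": 61.8473,
}


def mb_per_s(seconds):
    """Sequential read rate in decimal MB/s, the unit dd prints."""
    return READ_BYTES / seconds / 1e6


baseline = mb_per_s(SECONDS["MDADM"])
for name, seconds in SECONDS.items():
    rate = mb_per_s(seconds)
    print(f"{name:9s} {rate:6.1f} MB/s  ({rate / baseline:.0%} of MDADM)")
```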

Version Commit hash 786558a, compiled on 18.10.2023 from master. I have tested the patch from https://github.com/CachyOS/kernel-patches/tree/master/6.5/misc, which I assume, at the time was the same version as of hash 786558a, there was no difference in performance.

Kernel options:

CONFIG_BCACHEFS_FS=y
CONFIG_BCACHEFS_QUOTA=y
CONFIG_BCACHEFS_POSIX_ACL=y
CONFIG_BCACHEFS_DEBUG_TRANSACTIONS=y
# CONFIG_BCACHEFS_DEBUG is not set
# CONFIG_BCACHEFS_TESTS is not set
# CONFIG_BCACHEFS_LOCK_TIME_STATS is not set
# CONFIG_BCACHEFS_NO_LATENCY_ACCT is not set

Generic info

bcachefs fs usage /mnt/BIGDATA2.BCH/
Filesystem: 2d0a8144-db02-4e5c-a1a2-5bb41c938c1d
Size:                  1350565888000
Used:                   252887220224
Online reserved:                   0

Data type       Required/total  Devices
btree:          1/1             [sdb2]                     335282176
btree:          1/1             [sda2]                     340262912
user:           1/1             [sdb2]                  121807831040
user:           1/1             [sda2]                  121806569472

sea1.bch (device 0):            sda2              rw
                                data         buckets    fragmented
  free:                            0         1158273
  sb:                        3149824               7        520192
  journal:                4294967296            8192
  btree:                   340262912            1200     288882688
  user:                 121806569472          232328        212992
  cached:                          0               0
  parity:                          0               0
  stripe:                          0               0
  need_gc_gens:                    0               0
  need_discard:                    0               0
  erasure coded:                   0               0
  capacity:             734003200000         1400000

sea2.bch (device 1):            sdb2              rw
                                data         buckets    fragmented
  free:                            0         1158282
  sb:                        3149824               7        520192
  journal:                4294967296            8192
  btree:                   335282176            1189     288096256
  user:                 121807831040          232330
  cached:                          0               0
  parity:                          0               0
  stripe:                          0               0
  need_gc_gens:                    0               0
  need_discard:                    0               0
  erasure coded:                   0               0
  capacity:             734003200000         1400000

sudo bcachefs show-super /dev/disk/by-partlabel/SEA1.BCH 
External UUID:                              2d0a8144-db02-4e5c-a1a2-5bb41c938c1d
Internal UUID:                              16af022e-dda1-4130-9510-270ba45c6b12
Device index:                               0
Label:                                      BIGDATA2.BCH
Version:                                    1.2: deleted_inodes
Version upgrade complete:                   1.2: deleted_inodes
Oldest version on disk:                     1.2: deleted_inodes
Created:                                    Wed Oct 18 21:11:25 2023
Sequence number:                            15
Superblock size:                            4880
Clean:                                      0
Devices:                                    2
Sections:                                   members_v1,replicas_v0,disk_groups,clean,journal_v2,counters,members_v2
Features:                                   new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:                            alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done

Options:
  block_size:                               4.00 KiB
  btree_node_size:                          256 KiB
  errors:                                   continue [ro] panic 
  metadata_replicas:                        1
  data_replicas:                            1
  metadata_replicas_required:               1
  data_replicas_required:                   1
  encoded_extent_max:                       64.0 KiB
  metadata_checksum:                        none [crc32c] crc64 xxhash 
  data_checksum:                            none [crc32c] crc64 xxhash 
  compression:                              none
  background_compression:                   none
  str_hash:                                 crc32c crc64 [siphash] 
  metadata_target:                          none
  foreground_target:                        none
  background_target:                        none
  promote_target:                           none
  erasure_code:                             0
  inodes_32bit:                             1
  shard_inode_numbers:                      1
  inodes_use_key_cache:                     1
  gc_reserve_percent:                       8
  gc_reserve_bytes:                         0 B
  root_reserve_percent:                     0
  wide_macs:                                0
  acl:                                      1
  usrquota:                                 0
  grpquota:                                 0
  prjquota:                                 0
  journal_flush_delay:                      1000
  journal_flush_disabled:                   0
  journal_reclaim_delay:                    100
  journal_transaction_names:                1
  version_upgrade:                          [compatible] incompatible none 
  nocow:                                    0

members_v2 (size 144):
  Device:                                   0
    UUID:                                   8ecf29a4-18de-4db7-a955-eb90ba6405e6
    Size:                                   684 GiB
    seqread iops:                           0
    seqwrite iops:                          0
    randread iops:                          0
    randwrite iops:                         0
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                1400000
    Last mount:                             Wed Oct 18 21:13:23 2023
    State:                                  rw
    Label:                                  bch (1)
    Data allowed:                           journal,btree,user
    Has data:                               journal,btree,user
    Discard:                                0
    Freespace initialized:                  0
  Device:                                   1
    UUID:                                   3c407281-7da3-4600-b3c8-bacb8b557db9
    Size:                                   684 GiB
    seqread iops:                           0
    seqwrite iops:                          0
    randread iops:                          0
    randwrite iops:                         0
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                1400000
    Last mount:                             Wed Oct 18 21:13:23 2023
    State:                                  rw
    Label:                                  bch (3)
    Data allowed:                           journal,btree,user
    Has data:                               journal,btree,user
    Discard:                                0
    Freespace initialized:                  0
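
As a sanity check on the two dumps above, the `fs usage` numbers appear to be consistent with the 512 KiB bucket size from `show-super`: for each data type, buckets × bucket size minus the data bytes equals the "fragmented" column. That reading of "fragmented" is an inference from the numbers, not something taken from the bcachefs documentation. A short sketch for device 0 (sea1.bch):

```python
BUCKET_SIZE = 512 * 1024  # "Bucket size: 512 KiB" from show-super

# data type -> (data bytes, buckets, reported fragmented bytes), device 0
DEVICE0 = {
    "sb": (3_149_824, 7, 520_192),
    "btree": (340_262_912, 1_200, 288_882_688),
    "user": (121_806_569_472, 232_328, 212_992),
}
FREE_BUCKETS = 1_158_273
JOURNAL_BUCKETS = 8_192
CAPACITY_BYTES = 734_003_200_000
CAPACITY_BUCKETS = 1_400_000


def fragmented(data_bytes, buckets):
    """Allocated bucket space not covered by live data (assumed meaning)."""
    return buckets * BUCKET_SIZE - data_bytes


for name, (data_bytes, buckets, reported) in DEVICE0.items():
    match = fragmented(data_bytes, buckets) == reported
    print(f"{name:6s} computed={fragmented(data_bytes, buckets):>11d} match={match}")

used_buckets = sum(buckets for _, buckets, _ in DEVICE0.values())
print("bucket total matches:",
      FREE_BUCKETS + JOURNAL_BUCKETS + used_buckets == CAPACITY_BUCKETS)
print("capacity matches:", CAPACITY_BUCKETS * BUCKET_SIZE == CAPACITY_BYTES)
```

Most of the btree bucket space shows up as fragmented (about 289 MB of roughly 629 MB allocated), while user data fills its buckets almost completely.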

koverstreet commented 10 months ago

This was likely the bug referenced here: https://lore.kernel.org/linux-bcachefs/20231019183803.njsjs4sz7p4zpyfc@moria.home.lan/T/#t

It's fixed in master now - please try on the latest version and reopen if you still have issues

Mjasnik commented 10 months ago

Thanks, it looks like this is fixed.

# BCACHEFS
rsync --info=progress2 -a /mnt/DATA/WinePrefixes/* /mnt/BIGDATA2.BCH/ && time sync
243.381.299.244  99%  303,95MB/s    0:12:43 (xfr#119150, to-chk=0/136841)   

real    0m46,078s
user    0m0,001s
sys     0m0,004s

Read speed improved a little too. It is still the slowest of the 4 I tested, but it has improved.

dd if="/mnt/BIGDATA2.BCH/GOG.Galaxy/drive_c/Program Files (x86)/GOG Galaxy/Games/Cyberpunk 2077/archive/pc/content/basegame_4_appearance.archive" of=/dev/null bs=16M
967+1 records in
967+1 records out
16232243200 bytes (16 GB, 15 GiB) copied, 71,6984 s, 226 MB/s
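
For a rough sense of the change, the before and after bcachefs figures from this thread can be compared directly. The sketch below uses only the rsync rates and dd timings quoted above; the variable names are illustrative.

```python
# bcachefs figures before and after the fix, copied from the outputs above.
WRITE_RATE_BEFORE = 21.61   # rsync rate, first run
WRITE_RATE_AFTER = 303.95   # rsync rate, after updating to latest master
READ_SECONDS_BEFORE = 97.5869
READ_SECONDS_AFTER = 71.6984
MDADM_READ_SECONDS = 41.1594  # fastest read result in the comparison

write_speedup = WRITE_RATE_AFTER / WRITE_RATE_BEFORE
read_speedup = READ_SECONDS_BEFORE / READ_SECONDS_AFTER
read_vs_mdadm = MDADM_READ_SECONDS / READ_SECONDS_AFTER

print(f"write: {write_speedup:.1f}x faster")
print(f"read:  {read_speedup:.2f}x faster, {read_vs_mdadm:.0%} of the MDADM rate")
```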