koverstreet / bcachefs

multiple performance degradations in the last 6 months #646

Open daduke opened 6 months ago

daduke commented 6 months ago

hey there,

we've been playing around with bcachefs for over a year as a possible future candidate for our multi-PB storage setup. We regularly compile upstream kernels and test tiered file system configurations, always using the same disk layout and running the same fio performance test. Between August 2023 and today we've seen two significant performance degradations which, taken together, effectively halved bcachefs' IOPS and throughput. If this is to be expected since you're not optimizing for performance yet, please ignore and close this issue.

If not, here's the data: the system is an old (2015-ish) test file server with a 16T HDD HW RAID6 split into 5 volume sets (sda1 to sda5) and 2 380G caching SSDs (sdb and sdc), which we assemble in the following way:

bcachefs format --compression=lz4 --replicas=2 --label=hdd --durability=2 /dev/sda1 /dev/sda2 /dev/sda3 /dev/sda4 /dev/sda5 --label=ssd --durability=1 /dev/sdb /dev/sdc --foreground_target=ssd --promote_target=ssd --background_target=hdd --fs_label=data

On the resulting file system, we always run the same fio test:

fio --filename=randomrw --size=1GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=40 --time_based --group_reporting --name=iops-test-job --eta-newline=1 > output.txt

The kernel is always compiled on the same Debian Bookworm machine, using Bookworm's 6.1 .config plus make oldconfig. Back in August 2023 we pulled the out-of-tree bcachefs source and got

iops-test-job: (groupid=0, jobs=40): err= 0: pid=715781: Fri Aug 18 13:36:56 2023
  read: IOPS=27.5k, BW=107MiB/s (113MB/s)(12.6GiB/120006msec)

then with 6.7pre (as soon as the bcachefs source was upstreamed) it was

iops-test-job: (groupid=0, jobs=40): err= 0: pid=2702: Mon Nov 13 10:43:35 2023
  read: IOPS=22.2k, BW=86.6MiB/s (90.8MB/s)(10.1GiB/120004msec)

and now with 6.8rc2 it's

iops-test-job: (groupid=0, jobs=40): err= 0: pid=2050: Mon Jan 29 07:13:29 2024
  read: IOPS=14.0k, BW=54.7MiB/s (57.3MB/s)(6563MiB/120004msec)

The values are pretty consistent (+- 1 MB/s). We also see a performance drop if we create a bcachefs on just one SSD.

koverstreet commented 6 months ago

How much trouble would it be for you to bisect?

koverstreet commented 6 months ago

On rc2, the biggest change was that we switched to issuing flush ops correctly; that will have an impact.

We'll need to simplify the setup and establish a baseline; what performance are you seeing just testing on your SSD? And what is the SSD capable of?
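
For the raw-device number, something like this should do it (a rough sketch; the job name is arbitrary, and it writes to /dev/sdb directly, so only run it against a device whose contents can be destroyed):

# 4k random read/write against the bare SSD, bypassing the filesystem entirely
# WARNING: destructive, this overwrites whatever is on the device
fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=40 --time_based --group_reporting --name=ssd-baseline --eta-newline=1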

daduke commented 6 months ago

How much trouble would it be for you to bisect?

I know git bisect exists, but I haven't done one yet. I can only work on this on the side, so it would have to be largely automated...
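
If I get to it, I imagine the skeleton would be something like this (just a sketch; check_iops.sh is a hypothetical script that would have to build and install the kernel, reboot or kexec the test box into it, run the fio job, and exit non-zero when IOPS drop below a threshold - the reboot step is the part that's hard to automate):

# rough outline of an automated bisect between a faster and a slower mainline kernel
git bisect start
git bisect bad v6.8-rc2          # ~14k IOPS in our fio test
git bisect good v6.7             # ~22k IOPS
git bisect run ./check_iops.sh   # hypothetical build/boot/benchmark script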

daduke commented 6 months ago

We'll need to simplify the setup and establish a baseline; what performance are you seeing just testing on your SSD? And what is the SSD capable of?

as I said, I occasionally also tested on just one SSD, and that got slower as well. I would presume everyone else sees similar behavior (IIRC there was talk on Phoronix about reduced performance when CONFIG_BCACHEFS_DEBUG was introduced).

koverstreet commented 6 months ago

I've reproduced it; I'm seeing a 50% perf regression since 6.7 with random_writes, if I don't use no_data_io mode. Bisecting now.
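
For anyone following along: no_data_io skips the actual data reads and writes so only the btree/journal path gets measured. Roughly, and assuming it can simply be passed as a mount option on a scratch filesystem:

# benchmark-only mode: data I/O is stubbed out, so don't expect the data written to be readable back
mount -t bcachefs -o no_data_io /dev/sdb /mnt/scratch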

koverstreet commented 6 months ago

The debugging option that Phoronix was testing with is no longer an issue - I fixed the performance overhead of that code, so it's now always on and the option has been removed.

daduke commented 6 months ago

The debugging option that Phoronix was testing with is no longer an issue - I fixed the performance overhead of that code, so it's now always on and the option has been removed.

I see. Good to know.

koverstreet commented 6 months ago

Can you give the bcachefs-testing branch a try? I just pushed a patch to improve journal pipelining; when testing 4k random writes with a high iodepth, it's a drastic performance improvement: ~200k IOPS to ~560k IOPS.
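
On top of your existing tree and .config, something like this should be all that's needed (assuming you already track this repo as a git remote; the Debian package target is just the usual route for a Bookworm box):

# fetch and build the bcachefs-testing branch with the existing config
git fetch origin bcachefs-testing
git checkout origin/bcachefs-testing
make oldconfig
make -j"$(nproc)" bindeb-pkg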

daduke commented 6 months ago

so I compiled bcachefs-testing including 30792a137600d56957c2491a60879d5e95bbf1ef, but I'm afraid that doesn't make much of a difference:

iops-test-job: (groupid=0, jobs=40): err= 0: pid=867: Thu Feb  1 08:26:41 2024
  read: IOPS=14.6k, BW=57.1MiB/s (59.8MB/s)(6848MiB/120002msec)

vs

iops-test-job: (groupid=0, jobs=40): err= 0: pid=2050: Mon Jan 29 07:13:29 2024
  read: IOPS=14.0k, BW=54.7MiB/s (57.3MB/s)(6563MiB/120004msec)

on Monday

koverstreet commented 6 months ago

Hang on, I missed that you were testing reads. Is this random or sequential?

daduke commented 6 months ago

random:


fio --filename=randomrw --size=1GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=40 --time_based --group_reporting --name=iops-test-job --eta-newline=1

daduke commented 5 months ago

FYI I just recompiled bcachefs-v6.5 to make sure I can reproduce the older, faster numbers, and I get

iops-test-job: (groupid=0, jobs=40): err= 0: pid=856: Mon Feb  5 12:46:59 2024
  read: IOPS=23.0k, BW=89.9MiB/s (94.2MB/s)(10.6GiB/120343msec)

daduke commented 5 months ago

also: it seems the version of bcachefs-utils plays a role. I'm currently on kernel build bcachefs-v6.5, and I first created my file system using bcachefs-utils from some time in August 2023 (just to go with the old-school vibe). This resulted in

iops-test-job: (groupid=0, jobs=40): err= 0: pid=2076: Mon Feb  5 12:58:52 2024
  read: IOPS=24.0k, BW=93.6MiB/s (98.2MB/s)(11.0GiB/120352msec)

like above. When I create the same FS using bcachefs-utils HEAD, I get

  iops-test-job: (groupid=0, jobs=40): err= 0: pid=6455: Mon Feb  5 13:48:23 2024
  read: IOPS=21.9k, BW=85.7MiB/s (89.9MB/s)(10.1GiB/120298msec)

not a huge difference, but noticeable.

koverstreet commented 5 months ago

Did encoded_extent_max change? Or discard?

daduke commented 5 months ago

Did encoded_extent_max change? Or discard?

Between the two bcachefs-utils versions, you mean? Not unless the defaults changed; I always use the same parameters.

koverstreet commented 5 months ago

That's what I was asking - can you check the show-super output on your good and bad runs?
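
E.g. something like this, adjusting the member device to whatever you formatted:

# dump the superblock from a filesystem created with the old tools, then again after recreating it with the new tools, and compare
bcachefs show-super /dev/sda1 > super-old-tools.txt
bcachefs show-super /dev/sda1 > super-new-tools.txt   # after reformatting with the newer bcachefs-utils
diff -u super-old-tools.txt super-new-tools.txt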

daduke commented 5 months ago

1.4.0:
encoded_extent_max:                       64.0 KiB
Discard:                                                0

v0.1-730-g28e6dea:
encoded_extent_max:                       64.0 KiB
Discard:                                                0

colttt commented 5 months ago

I guess it would be better if you posted the whole show-super output.

daduke commented 2 months ago

Small update: 6.9 is back to 6.7 levels (even a bit above), but still a good way off the August 2023 numbers.