pop-os / default-settings

Distribution Default Settings

vm.dirty_bytes Pop!OS customization trashes BTRFS performance #111

Open romen opened 3 years ago

romen commented 3 years ago

Distribution (run cat /etc/os-release):

cat /etc/os-release
NAME="Pop!_OS"
VERSION="20.04 LTS"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 20.04 LTS"
VERSION_ID="20.04"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
LOGO=distributor-logo-pop-os

uname -a
Linux oryx 5.11.0-7612-generic #13~1617215757~20.04~97a8d1a-Ubuntu SMP Thu Apr 1 21:15:20 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Related Application and/or Package Version (run apt policy $PACKAGE NAME):

apt policy pop-default-settings 
pop-default-settings:
  Installed: 4.0.6~1611854075~20.04~6a2277e
  Candidate: 4.0.6~1611854075~20.04~6a2277e
  Version table:
 *** 4.0.6~1611854075~20.04~6a2277e 1001
       1001 http://ppa.launchpad.net/system76/pop/ubuntu focal/main amd64 Packages
       1001 http://ppa.launchpad.net/system76/pop/ubuntu focal/main i386 Packages
        100 /var/lib/dpkg/status

Issue/Bug Description:

Commit 6a2277e02efae1d3df642ae1cf26383e6e8a81f6 reports:

fix: Set reasonable size for dirty bytes parameters

The kernel default is to buffer up to 10% of system RAM before flushing writes to the disk, which is insane. By setting a reasonable number of bytes for the dirty_bytes parameter, we can avoid sending the system into OOM during a large file transfer.

https://lwn.net/Articles/572911/

diff --git a/etc/sysctl.d/10-pop-default-settings.conf b/etc/sysctl.d/10-pop-default-settings.conf
index 987317f..0430a48 100644
--- a/etc/sysctl.d/10-pop-default-settings.conf
+++ b/etc/sysctl.d/10-pop-default-settings.conf
@@ -1 +1,3 @@
 vm.swappiness = 10
+vm.dirty_bytes = 16777216
+vm.dirty_background_bytes = 4194304

Unfortunately this fix has the unintended side effect of completely trashing the performance of COW filesystems like BTRFS for regular use as rootfs/home on fast SSDs!

No penalty is observed when writing large files to a BTRFS partition, but the change has very negative effects on operations that do many small writes, like touching metadata during a btrfs receive operation or simply writing a lot of small files (e.g. untarring a big archive with a complex directory structure). Such an operation can take up to 20 times the wall-clock time of the same operation with this change commented out (which reverts to the defaults vm.dirty_ratio = 20 and vm.dirty_background_ratio = 10).
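To put the two configurations side by side, here is a small sketch of the arithmetic. The 16 GiB RAM figure is only an assumed example machine, and the function names are made up for illustration. Note also that the kernel applies the ratio settings to available (free plus reclaimable) memory rather than to total RAM, so the real ratio-based limits are somewhat lower than what this prints.

```shell
#!/bin/sh
# Convert a byte count to whole MiB.
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# Rough upper bound, in MiB, for a percentage-based dirty limit.
# Assumption: the percentage is applied to total RAM ($1, in MiB); the kernel
# really uses available memory, so actual limits are lower.
ratio_to_mib() {
    echo $(( $1 * $2 / 100 ))
}

bytes_to_mib 16777216    # vm.dirty_bytes from the commit            -> 16
bytes_to_mib 4194304     # vm.dirty_background_bytes from the commit -> 4
ratio_to_mib 16384 20    # vm.dirty_ratio = 20 on 16 GiB             -> 3276
ratio_to_mib 16384 10    # vm.dirty_background_ratio = 10 on 16 GiB  -> 1638
```

On such a machine the byte-based limits are roughly two orders of magnitude tighter than the ratio-based defaults, which is consistent with writers being throttled far earlier on metadata-heavy workloads.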

When using BTRFS as rootfs and home, this is even worse: operations as simple as apt update (or packagekit running it in the background for you) or apt upgrade, and even regular firefox/chrome use (which can write frequently to the local on-disk cache), can result in freezes lasting from a few seconds to a few minutes, during which the CPU is stuck in iowait and every process is waiting for the kernel-triggered IO thrashing to end. Operations where the user is intentionally doing a lot of writes are worse still: compiling big projects, cloning a moderate or big git repo locally, or using ccache become simply unbearable!

My suggestion is to revert this change, or find a different compromise that manages to fix the occasional OOM problems when writing big files to slow block devices, without making it impossible to do many small writes to fast devices.
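Until the package changes, one possible local workaround (a sketch, not something the package itself provides) is a sysctl drop-in that sorts after 10-pop-default-settings.conf and restores the ratio-based defaults. The helper name and the file name in the usage note are hypothetical.

```shell
#!/bin/sh
# Write a sysctl drop-in that restores the ratio-based kernel defaults.
# $1 is the destination file (hypothetical name in the usage note below).
write_dirty_override() {
    cat > "$1" <<'EOF'
# Restore the kernel's ratio-based writeback limits.
# Writing a *_ratio value makes the kernel zero the matching *_bytes value.
vm.dirty_ratio = 20
vm.dirty_background_ratio = 10
EOF
}
```

As root, this could be used as `write_dirty_override /etc/sysctl.d/99-dirty-ratio.conf` followed by `sysctl --system` to reload all sysctl configuration files. Because the files are applied in lexical order, the later file wins over the Pop!_OS one.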

The comments on the LWN article linked in the original commit are quite enlightening: they show that similar problems on COW filesystems were anticipated along this path, and that it might be difficult to strike a good balance without actual kernel changes that would make these sysctl knobs superfluous.

Steps to reproduce (if you know):

  1. create a BTRFS partition on a fast SSD
  2. mount it (I am using options defaults,noatime,compress=zstd but they are not particularly relevant, you can test with or without)
  3. have separate terminals where you are running iotop and htop to examine CPU and IO utilization; alternatively, you can use sysstat to collect the data and visualize it afterwards
  4. time (tar -xpf some_large_and_complex_archive.tar --acls --xattrs -C /path/to/mountpoint ; sync )
  5. unmount the BTRFS partition
  6. sudo sysctl -w vm.dirty_ratio=20; sudo sysctl -w vm.dirty_background_ratio=10
  7. redo 1-4
  8. compare the time spent on the tar extraction in the two cases
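If no suitable archive is at hand, a synthetic many-small-files workload can stand in for step 4. The following is only a rough sketch: the function names, directory layout and file count are made up for illustration, and a shell loop adds overhead of its own, so the absolute numbers matter less than the difference between the two sysctl configurations.

```shell
#!/bin/sh
# Create COUNT tiny files under DIR, 100 per subdirectory, to mimic a
# metadata-heavy workload such as untarring an archive of small files.
make_small_files() {
    dir=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        sub="$dir/d$(( i / 100 ))"
        # Only create the subdirectory when entering a new group of 100.
        if [ $(( i % 100 )) -eq 0 ]; then
            mkdir -p "$sub"
        fi
        printf 'payload %s\n' "$i" > "$sub/f$i"
        i=$(( i + 1 ))
    done
}

# Print the wall-clock seconds needed to write the files and flush them.
timed_small_writes() {
    start=$(date +%s)
    make_small_files "$1" "$2"
    sync
    end=$(date +%s)
    echo $(( end - start ))
}
```

For example, `timed_small_writes /path/to/mountpoint/bench 20000` could be run once with the Pop!_OS settings and once after step 6, using a fresh target directory each time.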

Expected behavior:

Using Pop!OS on a BTRFS root filesystem should be usable, and its performance not crippled to avoid rare corner cases when writing large files to slow devices.

Other Notes:

My sample .tar for debugging the performance issues I was seeing, which finally led me to isolate commit 6a2277e02efae1d3df642ae1cf26383e6e8a81f6 as the root cause, was a backup of my old rootfs partition. It doesn't need to be huge: anything that contains a lot of files, with a lot of associated metadata, will work. Actually, the smaller the ratio between total archived data size and the number of files and metadata (that is, the smaller the average file size), the more visible the difference should be.

clintar commented 2 years ago

Wow, glad I found this issue. Started happening to me after upgrading to PopOS 21.10. The experience has been HORRIBLE. I have a BTRFS system on bcache. Don't know if that combination made it worse than normal, but the system would become barely usable. I have NEVER had such a negative experience in Linux (running for 15 years probably now). I would get stuck watching a youtube video where I couldn't even get out of the window with alt-tab or get out of full-screen for a long time when it would freeze. Sometimes just alt-tabbing between windows would just be stuck and I couldn't tell why. I eventually figured out I could unfreeze if I did the Magic Sysrq ALT-SYSRQ-S keyboard combination to sync the filesystem, so I figured it must be some kind of filesystem buffering problem. I would have to do this shortcut any time a youtube video would just freeze as I navigated around in it, or any web app that would open a lot of connections. explorer.helium.com was one that would trigger this pretty often where the page just would freeze forever. I have another system using zfs that seems slow sometimes. I wonder if that one is being affected by this as well. It doesn't seem nearly as bad as this one has been.

This is TERRIBLE default behavior and not very obvious how to figure out where the issue is caused.

austinbutler commented 1 year ago

Seems this can be closed since https://github.com/pop-os/default-settings/pull/121 is merged.

techsy730 commented 1 year ago

It sort of sounds like the performance characteristics across different filesystems and disk types have diverged enough that a single, global value for vm.dirty_bytes and vm.dirty_background_bytes is no longer sufficient to meet rising real world use cases.

But that is a feature request for the Linux kernel project, not here. :wink:

ahydronous commented 1 month ago

Here is a script that manages most vm. settings intelligently. It might be worth getting in touch with the CachyOS people so you can work together on it, although since the script already exists it shouldn't be a terrible amount of work.

https://gitlab.com/cscs/maxperfwiz/-/blob/master/maxperfwiz?ref_type=heads