sched-ext / scx

sched_ext schedulers and tools
https://bit.ly/scx_slack
GNU General Public License v2.0

scx_rusty: Add iowait boosting #334

Closed hodgesds closed 3 weeks ago

hodgesds commented 1 month ago

This diff adds an option to do iowait boosting, which temporarily increases the CPU frequency for tasks that are waiting on IO. This can improve IO performance in some situations.
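As a rough illustration of the general idea (not the code in this diff), iowait boosting raises a CPU's performance target when a task wakes from iowait and lets it decay back down otherwise; sched_ext exposes per-CPU performance targets on a 0..1024 (`SCX_CPUPERF_ONE`) scale via `scx_bpf_cpuperf_set()`. A toy sketch of such a policy, with made-up boost/decay constants:

```python
# Toy model of an iowait-boost policy, NOT the implementation in this diff.
# The 0..1024 range mirrors sched_ext's cpuperf scale; the baseline and
# step sizes below are purely illustrative.

MAX_PERF = 1024    # full performance (SCX_CPUPERF_ONE)
MIN_PERF = 128     # illustrative baseline target
BOOST_STEP = 256   # bump applied when a task wakes from iowait
DECAY_STEP = 64    # decay applied when no iowait wakeup is seen

def update_perf_target(cur: int, woke_from_iowait: bool) -> int:
    """Return the next performance target for a CPU."""
    if woke_from_iowait:
        return min(MAX_PERF, cur + BOOST_STEP)
    return max(MIN_PERF, cur - DECAY_STEP)

# A burst of iowait wakeups ramps the target up quickly,
# and quiet periods let it decay back toward the baseline.
target = MIN_PERF
for woke in [True, True, True, False, False]:
    target = update_perf_target(target, woke)
    print(target)
```

The asymmetric steps (boost fast, decay slowly) are what let short, bursty iowait periods keep the frequency elevated between IO completions.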

Tested using fio in a two-minute benchmark with iowait boosting enabled and the schedutil frequency governor. The number of jobs was made large to force iowaits to occur.

$ fio --name=fiotest --filename=/home/hodgesd/test --size=16Gb --rw=randrw \
      --bs=4K --direct=1 --numjobs=80 --ioengine=io_uring --iodepth=256 \
      --group_reporting --runtime=120
fiotest: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=256
...
fio-3.35
Starting 80 processes
Jobs: 80 (f=80): [m(80)][100.0%][r=225MiB/s,w=221MiB/s][r=57.6k,w=56.6k IOPS][eta 00m:00s] 
fiotest: (groupid=0, jobs=80): err= 0: pid=715028: Thu Jun  6 06:21:38 2024
  read: IOPS=57.1k, BW=223MiB/s (234MB/s)(26.2GiB/120309msec)
    slat (usec): min=2, max=166417, avg=61.15, stdev=415.08
    clat (usec): min=2, max=1359.5k, avg=5605.83, stdev=26139.19
     lat (usec): min=47, max=1359.5k, avg=5666.98, stdev=26171.55
    clat percentiles (usec):
     |  1.00th=[   113],  5.00th=[   120], 10.00th=[   127], 20.00th=[   141],
     | 30.00th=[   167], 40.00th=[   208], 50.00th=[   260], 60.00th=[   375],
     | 70.00th=[   594], 80.00th=[  1172], 90.00th=[  4080], 95.00th=[ 18482],
     | 99.00th=[152044], 99.50th=[196084], 99.90th=[270533], 99.95th=[295699],
     | 99.99th=[383779]
   bw (  KiB/s): min= 5200, max=671479, per=100.00%, avg=231273.50, stdev=902.32, samples=19004
   iops        : min= 1300, max=167861, avg=57812.91, stdev=225.57, samples=19004
  write: IOPS=57.1k, BW=223MiB/s (234MB/s)(26.2GiB/120309msec); 0 zone resets
    slat (usec): min=2, max=73030, avg=54.44, stdev=152.66
    clat (usec): min=60, max=2237.9k, avg=353070.97, stdev=148785.60
     lat (usec): min=69, max=2238.1k, avg=353125.41, stdev=148779.58
    clat percentiles (msec):
     |  1.00th=[   22],  5.00th=[  109], 10.00th=[  288], 20.00th=[  326],
     | 30.00th=[  334], 40.00th=[  347], 50.00th=[  351], 60.00th=[  355],
     | 70.00th=[  363], 80.00th=[  372], 90.00th=[  397], 95.00th=[  493],
     | 99.00th=[ 1099], 99.50th=[ 1334], 99.90th=[ 1670], 99.95th=[ 1754],
     | 99.99th=[ 1938]
   bw (  KiB/s): min= 1816, max=643457, per=100.00%, avg=230493.75, stdev=832.66, samples=19005
   iops        : min=  454, max=160856, avg=57618.28, stdev=208.17, samples=19005
  lat (usec)   : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
  lat (usec)   : 250=24.36%, 500=8.92%, 750=3.72%, 1000=2.04%
  lat (msec)   : 2=3.62%, 4=2.48%, 10=2.01%, 20=0.92%, 50=1.22%
  lat (msec)   : 100=2.14%, 250=2.85%, 500=43.31%, 750=1.44%, 1000=0.35%
  lat (msec)   : 2000=0.62%, >=2000=0.01%
  cpu          : usr=0.50%, sys=7.08%, ctx=12352202, majf=0, minf=898
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=6866435,6863977,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256
Run status group 0 (all jobs):
   READ: bw=223MiB/s (234MB/s), 223MiB/s-223MiB/s (234MB/s-234MB/s), io=26.2GiB (28.1GB), run=120309-120309msec
  WRITE: bw=223MiB/s (234MB/s), 223MiB/s-223MiB/s (234MB/s-234MB/s), io=26.2GiB (28.1GB), run=120309-120309msec

With --iowait-boost disabled, again using the schedutil governor:

$ fio --name=fiotest --filename=/home/hodgesd/test --size=16Gb --rw=randrw \
      --bs=4K --direct=1 --numjobs=80 --ioengine=io_uring --iodepth=256 \
      --group_reporting --runtime=120
fiotest: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=256
...
fio-3.35
Starting 80 processes
Jobs: 80 (f=80): [m(80)][100.0%][r=227MiB/s,w=229MiB/s][r=58.1k,w=58.5k IOPS][eta 00m:00s] 
fiotest: (groupid=0, jobs=80): err= 0: pid=1136497: Thu Jun  6 08:19:31 2024
  read: IOPS=55.2k, BW=216MiB/s (226MB/s)(25.3GiB/120367msec)
    slat (usec): min=4, max=80962, avg=47.39, stdev=244.40
    clat (usec): min=3, max=1503.0k, avg=5667.63, stdev=28959.15
     lat (usec): min=50, max=1503.2k, avg=5715.03, stdev=28987.00
    clat percentiles (usec):
     |  1.00th=[   113],  5.00th=[   119], 10.00th=[   124], 20.00th=[   137],
     | 30.00th=[   153], 40.00th=[   186], 50.00th=[   223], 60.00th=[   281],
     | 70.00th=[   412], 80.00th=[   709], 90.00th=[  2573], 95.00th=[ 12649],
     | 99.00th=[170918], 99.50th=[217056], 99.90th=[312476], 99.95th=[375391],
     | 99.99th=[429917]
   bw (  KiB/s): min=12490, max=572415, per=100.00%, avg=225153.63, stdev=752.22, samples=18885
   iops        : min= 3122, max=143099, avg=56282.86, stdev=188.04, samples=18885
  write: IOPS=55.2k, BW=215MiB/s (226MB/s)(25.3GiB/120367msec); 0 zone resets
    slat (usec): min=2, max=57171, avg=45.53, stdev=133.07
    clat (usec): min=71, max=2564.2k, avg=365300.69, stdev=151792.16
     lat (usec): min=79, max=2564.2k, avg=365346.22, stdev=151788.93
    clat percentiles (msec):
     |  1.00th=[   26],  5.00th=[  178], 10.00th=[  313], 20.00th=[  321],
     | 30.00th=[  351], 40.00th=[  355], 50.00th=[  359], 60.00th=[  363],
     | 70.00th=[  376], 80.00th=[  384], 90.00th=[  405], 95.00th=[  435],
     | 99.00th=[ 1183], 99.50th=[ 1418], 99.90th=[ 1905], 99.95th=[ 2005],
     | 99.99th=[ 2232]
   bw (  KiB/s): min=23535, max=551627, per=100.00%, avg=225132.20, stdev=659.88, samples=18827
   iops        : min= 5882, max=137902, avg=56277.67, stdev=164.96, samples=18827
  lat (usec)   : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
  lat (usec)   : 250=27.90%, 500=9.13%, 750=3.40%, 1000=1.59%
  lat (msec)   : 2=2.39%, 4=1.79%, 10=1.43%, 20=0.59%, 50=0.90%
  lat (msec)   : 100=1.57%, 250=2.41%, 500=45.01%, 750=1.00%, 1000=0.14%
  lat (msec)   : 2000=0.70%, >=2000=0.03%
  cpu          : usr=0.47%, sys=5.93%, ctx=12514903, majf=0, minf=1045
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=6641337,6640385,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
   READ: bw=216MiB/s (226MB/s), 216MiB/s-216MiB/s (226MB/s-226MB/s), io=25.3GiB (27.2GB), run=120367-120367msec
  WRITE: bw=215MiB/s (226MB/s), 215MiB/s-215MiB/s (226MB/s-226MB/s), io=25.3GiB (27.2GB), run=120367-120367msec

For comparison, CFS with the performance governor:

$ fio --name=fiotest --filename=/home/hodgesd/test --size=16Gb --rw=randrw \
      --bs=4K --direct=1 --numjobs=80 --ioengine=io_uring --iodepth=256 \
      --group_reporting --runtime=120
fiotest: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=256
...
fio-3.35
Starting 80 processes
Jobs: 80 (f=80): [m(80)][100.0%][r=117MiB/s,w=141MiB/s][r=30.0k,w=36.2k IOPS][eta 00m:00s] 
fiotest: (groupid=0, jobs=80): err= 0: pid=1153692: Thu Jun  6 08:22:06 2024
  read: IOPS=57.9k, BW=226MiB/s (237MB/s)(26.6GiB/120310msec)
    slat (usec): min=2, max=216154, avg=82.25, stdev=611.70
    clat (nsec): min=1225, max=1367.7M, avg=5160585.01, stdev=26476262.45
     lat (usec): min=53, max=1367.8k, avg=5242.83, stdev=26513.47
    clat percentiles (usec):
     |  1.00th=[   110],  5.00th=[   117], 10.00th=[   123], 20.00th=[   139],
     | 30.00th=[   163], 40.00th=[   204], 50.00th=[   251], 60.00th=[   371],
     | 70.00th=[   627], 80.00th=[  1532], 90.00th=[  5669], 95.00th=[ 14353],
     | 99.00th=[139461], 99.50th=[200279], 99.90th=[320865], 99.95th=[367002],
     | 99.99th=[534774]
   bw (  KiB/s): min= 4848, max=635951, per=100.00%, avg=234981.59, stdev=831.50, samples=18957
   iops        : min= 1212, max=158981, avg=58741.70, stdev=207.87, samples=18957
  write: IOPS=57.9k, BW=226MiB/s (237MB/s)(26.6GiB/120310msec); 0 zone resets
    slat (usec): min=2, max=217221, avg=43.81, stdev=409.55
    clat (usec): min=54, max=2646.6k, avg=348437.89, stdev=152722.11
     lat (usec): min=60, max=2646.8k, avg=348481.70, stdev=152720.03
    clat percentiles (msec):
     |  1.00th=[   29],  5.00th=[  182], 10.00th=[  296], 20.00th=[  317],
     | 30.00th=[  330], 40.00th=[  334], 50.00th=[  338], 60.00th=[  342],
     | 70.00th=[  351], 80.00th=[  359], 90.00th=[  380], 95.00th=[  439],
     | 99.00th=[ 1183], 99.50th=[ 1401], 99.90th=[ 1787], 99.95th=[ 1888],
     | 99.99th=[ 2123]
   bw (  KiB/s): min= 8816, max=602228, per=100.00%, avg=235681.63, stdev=745.35, samples=18865
   iops        : min= 2204, max=150554, avg=58916.80, stdev=186.34, samples=18865
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=0.01%, 250=24.92%, 500=8.13%, 750=3.21%, 1000=1.58%
  lat (msec)   : 2=3.15%, 4=2.78%, 10=3.20%, 20=1.37%, 50=1.28%
  lat (msec)   : 100=1.32%, 250=2.15%, 500=45.10%, 750=0.81%, 1000=0.21%
  lat (msec)   : 2000=0.78%, >=2000=0.01%
  cpu          : usr=0.39%, sys=5.70%, ctx=12690385, majf=1, minf=977
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=6963793,6961887,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
   READ: bw=226MiB/s (237MB/s), 226MiB/s-226MiB/s (237MB/s-237MB/s), io=26.6GiB (28.5GB), run=120310-120310msec
  WRITE: bw=226MiB/s (237MB/s), 226MiB/s-226MiB/s (237MB/s-237MB/s), io=26.6GiB (28.5GB), run=120310-120310msec

The test results aren't super conclusive; many factors, such as the latency of the disk(s), make this hard to test. I'll try to run tests on some other disks/filesystems.
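Pulling the headline read-bandwidth numbers out of the three runs above shows how small the deltas are, which supports the "not conclusive" read:

```python
# Read-side bandwidth (MiB/s) from the three fio runs above.
runs = {
    "iowait boost + schedutil": 223,
    "no boost + schedutil": 216,
    "CFS + performance": 226,
}

base = runs["no boost + schedutil"]
for name, bw in runs.items():
    delta = 100.0 * (bw - base) / base
    print(f"{name}: {bw} MiB/s ({delta:+.1f}% vs no boost)")
```

Boosting buys roughly 3% over plain schedutil here, with CFS plus the performance governor about 5% ahead of that baseline, all of which is within the run-to-run noise one might expect from a single fio invocation per configuration.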

This idea is borrowed from this patchset, but this is a far cruder implementation.

hodgesds commented 3 weeks ago

This doesn't seem to provide a measurable improvement, so closing for now.