glommer opened this issue 5 years ago
Saw this today.

During Scylla operation, the SSD disks are reasonably fast and show low latency. However, iotune produces very bad results for IOPS (bandwidth is fine). As you can see, even at much higher request sizes we are doing more IOPS than that.

The output of iostat during the run is enlightening.

Write bandwidth phase:

The read IOPS phase also shows a lot of iowait:

```
Device:  rrqm/s  wrqm/s      r/s    w/s    rMB/s  wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sda        0.00    0.00     6.50   0.00     0.08   0.00    25.85     0.00   0.46    0.46    0.00   0.31   0.20
sdb        0.00    0.00  2147.50   6.50  1073.75   0.20  1021.10   130.61  60.62   60.81    0.00   0.46 100.00
sdc        0.00    0.00  2120.50   0.00  1060.25   0.00  1024.00    53.07  25.12   25.12    0.00   0.47  98.80
sdd        0.00    0.00  2079.50   0.00  1039.75   0.00  1024.00    43.39  21.56   21.56    0.00   0.47  97.50
sde        0.00    0.00  2151.00   0.00  1075.50   0.00  1024.00   132.35  61.53   61.53    0.00   0.46 100.00
sdf        0.00    0.00  2144.00   0.00  1072.00   0.00  1024.00    86.82  40.83   40.83    0.00   0.47 100.00
dm-0       0.00    0.00     5.50   0.00     0.07   0.00    24.73     0.00   0.36    0.36    0.00   0.18   0.10
dm-1       0.00    0.00     0.00   0.00     0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
dm-2       0.00    0.00     1.00   0.00     0.02   0.00    32.00     0.00   1.00    1.00    0.00   1.00   0.10
dm-3       0.00    0.00     0.00   0.00     0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
md0        0.00    0.00 10537.00   7.00  5268.50   0.22  1023.36     0.00   0.00    0.00    0.00   0.00   0.00
```
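As a sanity check on the iostat numbers, the reported `avgqu-sz` is consistent with Little's law (mean queue depth ≈ completion rate × mean latency), so the huge `await` values really do reflect very deep device queues rather than a reporting glitch. A minimal sketch using the `sdb` and `sdc` rows (the helper name is mine, not part of iostat or iotune):

```python
# Little's law: mean queue depth ≈ completion rate (r/s) × mean latency (await, in seconds).
def queue_depth(r_per_s: float, await_ms: float) -> float:
    return r_per_s * (await_ms / 1000.0)

# Values taken from the iostat table (reads dominate; writes are ~0).
sdb = queue_depth(2147.50, 60.62)  # iostat reports avgqu-sz 130.61 for sdb
sdc = queue_depth(2120.50, 25.12)  # iostat reports avgqu-sz  53.07 for sdc
```

So on `sdb` roughly 130 requests are queued at any instant, which is why each one waits ~60 ms even though per-request service time (`svctm`) is under half a millisecond.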
I suspect that we are firing off a bunch of requests and, because we expect their latencies to be low, not waiting for them to complete. If for some reason the latencies are not low, we will taint the following phases. (unconfirmed)

I/O wait is also quite high, which could indicate we are blocking in the kernel by sending more requests than what the disk was configured for in `nr_requests` (but how?). The disks were trimmed before this run to make sure that was not the cause.
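If the suspicion above is right, the observed latencies follow directly from the queue depth: with a fire-and-forget submitter, requests pile up to whatever limit the block layer allows, and each request then waits behind the whole queue. A hypothetical back-of-the-envelope sketch (the constants are illustrative: 128 is a common `nr_requests` default, 0.46 ms is the `svctm` from the table; none of this is iotune's actual code):

```python
DEVICE_DEPTH = 128       # illustrative nr_requests-scale queue limit (assumption)
SERVICE_TIME_MS = 0.46   # per-request device service time, from the svctm column

def mean_latency_ms(in_flight: int) -> float:
    # With `in_flight` requests queued on the device, each new request waits
    # behind all the others: latency ≈ queue depth × service time
    # (a crude M/D/1-style approximation, ignoring device parallelism).
    return in_flight * SERVICE_TIME_MS

shallow = mean_latency_ms(4)              # sub-millisecond, what iotune expects
saturated = mean_latency_ms(DEVICE_DEPTH) # ~59 ms, close to the ~60 ms await on sdb/sde
```

That the crude estimate lands near the measured 60 ms `await` is at least consistent with the benchmark saturating the queue instead of capping in-flight requests.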