Does the behavior change if you go below 1/2 (half) of your RAM size with the ARC?
What are the other settings of your pools and zvols? (compression? noatime? xattrs?)
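For reference, the ARC ceiling on ZoL is controlled by the zfs_arc_max module parameter; a minimal sketch of capping it at 5 GiB (the value is illustrative):
# runtime setting
echo 5368709120 > /sys/module/zfs/parameters/zfs_arc_max
# persistent across reboots, in /etc/modprobe.d/zfs.conf:
# options zfs zfs_arc_max=5368709120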
@redtex I suspect your device mapper devices appear to be non-rotational, which is causing 057b87c to launch all "PRIORITY_SYNC" IO synchronously. Check the values of /sys/block/dm-X/queue/rotational. If they're all zero, that's your problem.
It looks like you can poke a 1 into those files (echo 1 > /sys/block/dm-X/queue/rotational) before importing the pool, which might fix the problem if so.
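A minimal sketch of that workaround, assuming the pool is named sas as later in this thread and that all dm devices belong to it:
for f in /sys/block/dm-*/queue/rotational; do
    echo 1 > "$f"    # mark each device-mapper device as rotational
done
zpool import sas     # import only after the flags are set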
Hi !!! Thank you for the fast answer! In general, all zvols have the same properties, except volblocksize, which may be 4k, 32k, or 128k:
# zfs get all sas/vm-301-disk-1
NAME PROPERTY VALUE SOURCE
sas/vm-301-disk-1 type volume -
sas/vm-301-disk-1 creation Fri May 8 18:45 2015 -
sas/vm-301-disk-1 used 25.4G -
sas/vm-301-disk-1 available 1.68T -
sas/vm-301-disk-1 referenced 25.4G -
sas/vm-301-disk-1 compressratio 1.14x -
sas/vm-301-disk-1 reservation none default
sas/vm-301-disk-1 volsize 32.0G local
sas/vm-301-disk-1 volblocksize 128K -
sas/vm-301-disk-1 checksum on default
sas/vm-301-disk-1 compression lz4 inherited from sas
sas/vm-301-disk-1 readonly off default
sas/vm-301-disk-1 copies 1 default
sas/vm-301-disk-1 refreservation none received
sas/vm-301-disk-1 primarycache all inherited from sas
sas/vm-301-disk-1 secondarycache all default
sas/vm-301-disk-1 usedbysnapshots 29.3M -
sas/vm-301-disk-1 usedbydataset 25.4G -
sas/vm-301-disk-1 usedbychildren 0 -
sas/vm-301-disk-1 usedbyrefreservation 0 -
sas/vm-301-disk-1 logbias latency inherited from sas
sas/vm-301-disk-1 dedup off default
sas/vm-301-disk-1 mlslabel none default
sas/vm-301-disk-1 sync standard inherited from sas
sas/vm-301-disk-1 refcompressratio 1.13x -
sas/vm-301-disk-1 written 46.3M -
sas/vm-301-disk-1 logicalused 29.1G -
sas/vm-301-disk-1 logicalreferenced 28.8G -
sas/vm-301-disk-1 snapshot_limit none default
sas/vm-301-disk-1 snapshot_count none default
sas/vm-301-disk-1 snapdev hidden default
sas/vm-301-disk-1 context none default
sas/vm-301-disk-1 fscontext none default
sas/vm-301-disk-1 defcontext none default
sas/vm-301-disk-1 rootcontext none default
sas/vm-301-disk-1 redundant_metadata all default
I have set the ARC to 5G - same behavior. My device mapper devices are multipath SAS disks, each of them connected via two independent expanders. So they already have:
# cat /sys/block/dm-9/queue/rotational
1
# cat /sys/block/dm-10/queue/rotational
1
# cat /sys/block/dm-11/queue/rotational
1
# cat /sys/block/dm-12/queue/rotational
1
# cat /sys/block/dm-13/queue/rotational
1
comment below...
On Oct 2, 2015, at 4:34 AM, redtex notifications@github.com wrote:
Hi !!! On a host - CentOS 7.1, kernel 3.10.0-229.14.1.el7.x86_64, 32G RAM, ZoL 0.6.5.2 - which serves VM images in zvols via iSCSI (SCST), there is a very strange situation: after upgrading from 0.6.4 to 0.6.5 I noticed a significant performance drop, which shows up as near-100% busy disks in iostat. It looks like:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
dm-9 0.00 0.00 2.00 169.00 79.00 6861.00 81.17 0.20 1.19 13.00 1.05 1.17 20.00
dm-10 0.00 0.00 0.00 51.00 0.00 752.00 29.49 0.09 1.78 0.00 1.78 1.76 9.00
dm-11 0.00 0.00 2.00 168.00 11.50 6861.00 80.85 0.21 1.25 6.00 1.20 1.24 21.00
dm-12 0.00 0.00 2.00 221.00 81.00 1421.50 13.48 0.44 1.96 11.50 1.87 1.95 43.50
dm-13 0.00 0.00 1.00 290.00 82.00 1793.50 12.89 0.43 1.47 8.00 1.45 1.47 42.90
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
dm-9 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-10 0.00 0.00 0.00 56.00 0.00 2424.00 86.57 0.19 3.34 0.00 3.34 1.64 9.20
dm-11 0.00 0.00 1.00 0.00 6.50 0.00 13.00 0.01 9.00 9.00 0.00 9.00 0.90
dm-12 0.00 0.00 0.00 414.00 0.00 2600.50 12.56 1.00 2.42 0.00 2.42 2.40 99.20
dm-13 0.00 0.00 0.00 483.00 0.00 2432.50 10.07 0.99 2.07 0.00 2.07 2.04 98.50
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
dm-9 0.00 0.00 1.00 0.00 6.50 0.00 13.00 0.01 11.00 11.00 0.00 11.00 1.10
dm-10 0.00 0.00 0.00 27.00 0.00 1224.00 90.67 0.11 4.22 0.00 4.22 1.85 5.00
dm-11 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-12 0.00 0.00 2.00 495.00 2.50 1551.50 6.25 1.18 2.31 86.00 1.97 2.00 99.50
dm-13 0.00 0.00 0.00 489.00 0.00 2190.00 8.96 1.12 2.01 0.00 2.01 2.03 99.20
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
dm-9 0.00 0.00 1.00 0.00 7.00 0.00 14.00 0.01 8.00 8.00 0.00 8.00 0.80
dm-10 0.00 0.00 0.00 60.00 0.00 604.00 20.13 0.10 1.72 0.00 1.72 1.68 10.10
dm-11 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-12 0.00 0.00 2.00 366.00 12.50 1519.50 8.33 1.28 3.55 137.00 2.82 2.70 99.30
dm-13 0.00 0.00 0.00 402.00 0.00 1290.50 6.42 1.98 2.49 0.00 2.49 2.49 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
dm-9 0.00 0.00 3.00 49.00 10.00 506.50 19.87 0.11 2.12 3.00 2.06 2.06 10.70
dm-10 0.00 0.00 0.00 83.00 0.00 912.00 21.98 0.16 1.92 0.00 1.92 1.64 13.60
dm-11 0.00 0.00 3.00 51.00 25.50 506.50 19.70 0.12 2.30 13.67 1.63 2.26 12.20
dm-12 0.00 0.00 1.00 366.00 1.00 1622.50 8.85 0.79 2.16 10.00 2.14 2.12 77.90
dm-13 0.00 0.00 1.00 198.00 53.00 1009.00 10.67 0.61 8.69 1226.00 2.55 3.02 60.10
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
dm-9 0.00 0.00 4.00 159.00 22.50 6461.50 79.56 0.49 3.01 77.75 1.13 2.23 36.30
dm-10 0.00 0.00 0.00 183.00 0.00 10648.00 116.37 1.58 8.63 0.00 8.63 1.72 31.50
dm-11 0.00 0.00 4.00 154.00 15.00 6461.50 81.98 0.24 1.52 13.00 1.22 1.47 23.30
dm-12 0.00 0.00 4.00 284.00 164.00 2629.50 19.40 0.69 2.36 62.50 1.51 1.53 44.00
dm-13 0.00 0.00 2.00 279.00 48.50 2016.00 14.69 0.52 1.62 13.50 1.53 1.57 44.20
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
dm-9 0.00 0.00 1.00 0.00 1.00 0.00 2.00 0.02 23.00 23.00 0.00 23.00 2.30
dm-10 0.00 0.00 0.00 41.00 0.00 280.00 13.66 0.07 1.83 0.00 1.83 1.80 7.40
dm-11 0.00 0.00 1.00 0.00 1.50 0.00 3.00 0.01 8.00 8.00 0.00 8.00 0.80
dm-12 0.00 0.00 1.00 430.00 128.00 2956.50 14.31 2.11 2.36 14.00 2.33 2.31 99.60
dm-13 0.00 0.00 0.00 473.00 0.00 2972.00 12.57 2.84 2.10 0.00 2.10 2.11 100.00
and zpool iostat is:
capacity operations bandwidth
pool alloc free read write read write
sas 1.83T 1.79T 5 85 14.5K 4.67M
mirror 938G 918G 0 0 0 0
35000cca02827b824 - - 0 0 0 0
35000cca02827b8c4 - - 0 0 0 0
mirror 938G 918G 5 0 14.5K 0
35000cca02827bb30 - - 3 0 12.5K 0
35000cca02827d228 - - 1 0 2.00K 0
logs - - - - - -
35000c5003330fa5b 50.7M 278G 0 85 0 4.67M
capacity operations bandwidth
pool alloc free read write read write
sas 1.83T 1.79T 0 1.33K 0 8.23M
mirror 938G 918G 0 311 0 1.68M
35000cca02827b824 - - 0 301 0 1.68M
35000cca02827b8c4 - - 0 400 0 2.20M
mirror 938G 918G 0 1.00K 0 6.26M
35000cca02827bb30 - - 0 146 0 6.26M
35000cca02827d228 - - 0 146 0 6.26M
logs - - - - - -
35000c5003330fa5b 50.7M 278G 0 21 0 296K
capacity operations bandwidth
pool alloc free read write read write
sas 1.83T 1.79T 0 490 18.0K 2.52M
mirror 938G 918G 0 481 0 2.35M
35000cca02827b824 - - 0 470 0 2.35M
35000cca02827b8c4 - - 0 470 0 2.22M
mirror 938G 918G 0 0 18.0K 0
35000cca02827bb30 - - 0 0 18.0K 0
35000cca02827d228 - - 0 0 0 0
logs - - - - - -
35000c5003330fa5b 50.7M 278G 0 8 0 180K
capacity operations bandwidth
pool alloc free read write read write
sas 1.83T 1.79T 9 408 61.5K 5.50M
mirror 938G 918G 6 358 45.0K 1.78M
35000cca02827b824 - - 1 354 11.0K 1.78M
35000cca02827b8c4 - - 4 346 34.0K 2.14M
mirror 938G 918G 2 0 16.5K 0
35000cca02827bb30 - - 1 0 11.0K 0
35000cca02827d228 - - 0 0 5.50K 0
logs - - - - - -
35000c5003330fa5b 50.7M 278G 0 49 0 3.72M
So, it's clearly seen that one of the mirrors has a many times smaller IO size (avgrq-sz), which leads to a huge performance drop. And I noticed that this behavior starts after some significant working time (several hours) after a system reboot. The ARC size is 20G.
This has nothing to do with the ARC, so you can look elsewhere.
Average I/O size seems to be around 5k, confirmed by the iostat data above. This usually implies one of two conditions. It is not clear, from the data provided, what the sample interval is. If the sample interval is small, say 1 second, then this data looks about right for a 5K average I/O size. ZFS will write about 1MB to a top-level vdev before switching to the next, so you can easily get samples where all of the writes go to one or the other. If the sample period is large, say 100 seconds, then we'd expect the balance to even out. Judging by the space allocated, the stripes are balanced. -- richard
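As a side note, the per-device average write size richard mentions can be read straight off the iostat columns (wkB/s divided by w/s; on this iostat version avgrq-sz is in 512-byte sectors and gives roughly the same number). A small awk sketch over the output format used in this thread:
iostat -d -x 1 | awk '/^dm-/ && $5 > 0 { printf "%s: %.1f KiB/write\n", $1, $7/$5 }'
# e.g. dm-12 in the last sample above: 2956.50 wkB/s / 430 w/s = ~6.9 KiB per write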
The sample interval is 1 second. It looks like a 5-second cycle: data is written to both mirrors simultaneously but with different IO sizes, so on the first mirror the data is written within 1 second, and on the other within 4-5 seconds.
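The 5-second cadence matches the default transaction group commit interval on ZoL (zfs_txg_timeout, 5 seconds), which can be confirmed, assuming defaults, with:
cat /sys/module/zfs/parameters/zfs_txg_timeout    # prints 5 unless tuned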
So, what would your advice be? To downgrade to 0.6.4.2?
Update: with the help of iotop, I discovered that this 5-second cycle is the txg_sync process, which flushes async writes to the disks. But when I try to strace it, I get an error:
# strace -p 15190
strace: attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
Maybe I can get debug info some other way? Or would it be better to downgrade and just wait?
@redtex these are kernel threads so you can't follow them with strace. My suggestion would be to first roll back to 0.6.4.2 and characterize the behavior there. Then we'll have a much better idea how it's changed and why that might be.
Is it possible to use a pool upgraded to 0.6.5 (the large_blocks and filesystem_limits features are enabled, but not used) with zfs 0.6.4? zpool imports such a pool, but I'm afraid of breaking the data.
If you're able to import the pool r/w then it's safe to write to the pool.
Hi !!! Before all, I want to remind you that the uneven write IO size begins after some warm-up time, one or two hours after system boot. The workload is serving VM images via iSCSI (SCST 3.1 target). Here is what I've done: first, I replaced the rotational log device with an SSD log & cache (the same device, partitioned). It was done on the fly, without service downtime. Before this operation I saw just what I expected: uneven IO size between the mirrors. Just after the replacement it was fine, but some time later (I can't be sure, but I think about an hour, or a little more) the write IO became uneven again. But after that, adding/removing the log & cache separately or all together didn't trigger this effect again. Veeery strange..... So, early in the morning I downgraded ZFS from 0.6.5.2 to 0.6.4.2 and got almost exactly what I expected: evenly distributed IO size between the mirrors, or if not even, then no more than a two-fold difference, and not for long. See 'iostat -d -x 1':
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
dm-2 0.00 0.00 0.00 241.00 0.00 6303.50 52.31 0.52 2.17 0.00 2.17 2.06 49.60
dm-3 0.00 0.00 2.00 152.00 31.50 8010.50 104.44 0.29 1.91 10.50 1.80 1.81 27.80
dm-4 0.00 0.00 0.00 150.00 0.00 8010.50 106.81 0.41 2.75 0.00 2.75 2.71 40.60
dm-8 0.00 0.00 0.00 246.00 0.00 6303.50 51.25 0.45 1.83 0.00 1.83 1.72 42.40
dm-13 0.00 0.00 0.00 55.00 0.00 532.00 19.35 0.01 0.11 0.00 0.11 0.11 0.60
dm-15 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4 0.00 0.00 1.00 0.00 22.00 0.00 44.00 0.02 18.00 18.00 0.00 18.00 1.80
dm-8 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-13 0.00 0.00 0.00 75.00 0.00 796.00 21.23 0.01 0.12 0.00 0.12 0.12 0.90
dm-15 0.00 0.00 0.00 182.00 0.00 12838.00 141.08 0.04 0.22 0.00 0.22 0.22 4.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-8 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-13 0.00 0.00 0.00 48.00 0.00 408.00 17.00 0.01 0.10 0.00 0.10 0.10 0.50
dm-15 0.00 0.00 1.00 0.00 26.00 0.00 52.00 0.00 1.00 1.00 0.00 1.00 0.10
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-8 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-13 0.00 0.00 0.00 12.00 0.00 96.00 16.00 0.00 0.17 0.00 0.17 0.17 0.20
dm-15 0.00 0.00 3.00 0.00 12.00 0.00 8.00 0.00 0.33 0.33 0.00 0.33 0.10
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-8 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-13 0.00 0.00 0.00 49.00 0.00 608.00 24.82 0.01 0.14 0.00 0.14 0.14 0.70
dm-15 0.00 0.00 5.00 0.00 172.00 0.00 68.80 0.00 0.20 0.20 0.00 0.20 0.10
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
dm-2 0.00 0.00 5.00 243.00 377.00 7600.00 64.33 0.40 1.63 16.20 1.33 1.42 35.30
dm-3 0.00 0.00 0.00 155.00 0.00 7704.00 99.41 0.27 1.71 0.00 1.71 1.59 24.60
dm-4 0.00 0.00 0.00 159.00 0.00 7704.00 96.91 0.37 2.35 0.00 2.35 2.33 37.00
dm-8 0.00 0.00 3.00 242.00 67.00 7600.00 62.59 0.49 2.01 42.67 1.50 1.66 40.60
dm-13 0.00 0.00 0.00 44.00 0.00 576.00 26.18 0.01 0.18 0.00 0.18 0.18 0.80
dm-15 0.00 0.00 4.00 135.00 208.00 6497.00 96.47 0.03 0.18 0.00 0.19 0.18 2.50
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-8 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-13 0.00 0.00 0.00 39.00 0.00 404.00 20.72 0.01 0.21 0.00 0.21 0.21 0.80
dm-15 0.00 0.00 0.00 49.00 0.00 2725.50 111.24 0.01 0.24 0.00 0.24 0.24 1.20
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-8 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-13 0.00 0.00 0.00 8.00 0.00 56.00 14.00 0.00 0.25 0.00 0.25 0.25 0.20
dm-15 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
dm-2 0.00 0.00 1.00 0.00 1.50 0.00 3.00 0.01 14.00 14.00 0.00 14.00 1.40
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-8 0.00 0.00 1.00 0.00 35.50 0.00 71.00 0.02 17.00 17.00 0.00 17.00 1.70
dm-13 0.00 0.00 0.00 23.00 0.00 844.00 73.39 0.00 0.17 0.00 0.17 0.17 0.40
dm-15 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-8 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-13 0.00 0.00 0.00 106.00 0.00 4680.00 88.30 0.04 0.42 0.00 0.42 0.12 1.30
dm-15 0.00 0.00 2.00 0.00 8.00 0.00 8.00 0.00 0.00 0.00 0.00 0.00 0.00
where dm-2 and dm-8 are one mirror, dm-3 and dm-4 the other mirror, dm-13 the log, and dm-15 the cache. 'zpool iostat -v sas 1':
capacity operations bandwidth
pool alloc free read write read write
--------------------- ----- ----- ----- ----- ----- -----
sas 1.83T 1.79T 58 1.73K 80.4K 7.16M
mirror 939G 917G 24 774 36.5K 3.20M
35000cca02827b824 - - 10 100 12.5K 3.20M
35000cca02827b8c4 - - 12 99 24.0K 3.20M
mirror 939G 917G 33 980 44.0K 3.68M
35000cca02827bb30 - - 11 118 17.0K 3.68M
35000cca02827d228 - - 11 110 31.5K 3.68M
logs - - - - - -
35000cca04d0f99c0p1 109M 3.61G 0 14 0 288K
cache - - - - - -
35000cca04d0f99c0p3 31.5G 24.4G 0 0 0 0
--------------------- ----- ----- ----- ----- ----- -----
capacity operations bandwidth
pool alloc free read write read write
--------------------- ----- ----- ----- ----- ----- -----
sas 1.83T 1.79T 5 12 496K 264K
mirror 939G 917G 1 0 118K 0
35000cca02827b824 - - 0 0 86.5K 0
35000cca02827b8c4 - - 0 0 31.5K 0
mirror 939G 917G 3 0 378K 0
35000cca02827bb30 - - 2 0 287K 0
35000cca02827d228 - - 0 0 91.4K 0
logs - - - - - -
35000cca04d0f99c0p1 109M 3.61G 0 12 0 264K
cache - - - - - -
35000cca04d0f99c0p3 31.5G 24.4G 4 122 430K 3.28M
--------------------- ----- ----- ----- ----- ----- -----
capacity operations bandwidth
pool alloc free read write read write
--------------------- ----- ----- ----- ----- ----- -----
sas 1.83T 1.79T 7 10 10.5K 192K
mirror 939G 917G 0 0 4.00K 0
35000cca02827b824 - - 0 0 0 0
35000cca02827b8c4 - - 0 0 4.00K 0
mirror 939G 917G 6 0 6.49K 0
35000cca02827bb30 - - 2 0 3.00K 0
35000cca02827d228 - - 3 0 3.50K 0
logs - - - - - -
35000cca04d0f99c0p1 109M 3.61G 0 10 0 192K
cache - - - - - -
35000cca04d0f99c0p3 31.5G 24.4G 0 16 0 486K
--------------------- ----- ----- ----- ----- ----- -----
Regards, Wadim.
@behlendorf any ideas? Do I need to provide other details/logs?
@redtex Wandering back into this issue due to the #4512 reference. I've reviewed your last iostat output and it certainly does show a difference between the 2 top-level mirror vdevs. By any chance, was this pool originally created with a single mirror and the second mirror added later? Although an earlier zpool iostat -v did show them to be equally full, there could be a whole lot more fragmentation in one versus the other, especially if they weren't added to the pool at the same time. Another thing that can impact fragmentation is growing the vdevs; when they're grown, the system creates new metaslabs which start out completely unfragmented. This is another type of problem in which the new zpool iostat features of 0.7.0 could help quite a bit.
@redtex One other thing to check is that both your top-level vdevs have the same ashift. You can run zdb -l /dev/disk/by-XXX/<whatever> | grep ashift (possibly on partition 1 if it's a full disk) on each disk to make sure.
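A loop sketch of that check, assuming the multipath members show up under /dev/mapper with the names from the zpool iostat output above:
for d in /dev/mapper/35000cca02827b824 /dev/mapper/35000cca02827b8c4 \
         /dev/mapper/35000cca02827bb30 /dev/mapper/35000cca02827d228; do
    echo "== $d"
    zdb -l "$d" | grep ashift    # if nothing prints, retry on partition 1
done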
@dweeezil Yes, the pool was originally created with two mirrored vdevs. The vdevs consist of four identical 512-bytes/sector SAS disks, so the ashift for each vdev is 9.
Today, I've discovered how to reproduce this issue.
Configuration: a 2-core KVM virtual machine with 4GB RAM and 4 physical disks (WD Raptor) passed through with SCSI-virtio; CentOS 7, kernel 3.10.0-327.36.1.el7.x86_64.
Non-default ZFS tunables: zfs_vdev_aggregation_limit=524288. Honestly, I think this tunable is irrelevant to the issue, but it was set, so I post it here.
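For completeness, that tunable is applied via the module parameter; a sketch of both ways to set it:
# runtime
echo 524288 > /sys/module/zfs/parameters/zfs_vdev_aggregation_limit
# persistent, in /etc/modprobe.d/zfs.conf:
# options zfs zfs_vdev_aggregation_limit=524288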
First, create a fresh mirrored pool with 2 vdevs:
# zpool create -f tank mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde
Second, create a zvol:
# zfs create -b 4k -V 10G -o refreservation=none -o compress=off -o primarycache=metadata tank/zvol4k-fiotest
Third, fill the zvol with random data:
# dd if=/dev/zero bs=1M | openssl enc -aes-256-ctr -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" -nosalt | dd of=/dev/tank/zvol4k-fiotest bs=1M
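The point of this fill is to allocate every block of the zvol so the random-read phase hits real on-disk blocks rather than holes; a quick way to confirm the fill completed, using the dataset name from above:
zfs get volsize,used tank/zvol4k-fiotest    # used should end up close to the 10G volsize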
Prepare a fio job file:
# vi ./fio_pattern.4k
[global]
filename=/dev/tank/zvol4k-fiotest
ioengine=libaio
io_submit_mode=offload
direct=1
buffered=0
buffer_compress_percentage=0
refill_buffers=1
runtime=300
[4kRead]
blocksize=4k
readwrite=randread
iodepth=120
[4kWrite]
blocksize=4k
readwrite=randwrite
iodepth=120
Run the fio test:
fio fio_pattern.4k
results for ZoL 0.6.4.2
4kRead: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=120
4kWrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=120
fio-2.2.8
Starting 2 processes
^Cbs: 2 (f=2): [r(1),w(1)] [6.2% done] [5412KB/8656KB/0KB /s] [1353/2164/0 iops] [eta 29m:39s]
fio: terminating on signal 2
4kRead: (groupid=0, jobs=1): err= 0: pid=1839: Thu Oct 6 11:21:58 2016
read : io=647520KB, bw=5532.8KB/s, iops=1383, runt=117035msec
slat (usec): min=11, max=280986, avg=36642.53, stdev=29021.74
clat (msec): min=8, max=357, avg=49.92, stdev=34.72
lat (msec): min=9, max=450, avg=86.56, stdev=49.23
clat percentiles (msec):
| 1.00th=[ 18], 5.00th=[ 21], 10.00th=[ 23], 20.00th=[ 26],
| 30.00th=[ 30], 40.00th=[ 35], 50.00th=[ 40], 60.00th=[ 46],
| 70.00th=[ 55], 80.00th=[ 67], 90.00th=[ 87], 95.00th=[ 116],
| 99.00th=[ 196], 99.50th=[ 212], 99.90th=[ 245], 99.95th=[ 262],
| 99.99th=[ 289]
bw (KB /s): min= 1656, max= 8416, per=100.00%, avg=5538.84, stdev=1693.65
lat (msec) : 10=0.01%, 20=4.68%, 50=60.37%, 100=28.09%, 250=6.78%
lat (msec) : 500=0.07%
cpu : usr=0.75%, sys=1.22%, ctx=131956, majf=0, minf=18
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=161880/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=120
4kWrite: (groupid=0, jobs=1): err= 0: pid=1840: Thu Oct 6 11:21:58 2016
write: io=989.85MB, bw=8662.4KB/s, iops=2165, runt=117011msec
slat (usec): min=10, max=280928, avg=27429.50, stdev=30133.64
clat (msec): min=3, max=298, avg=27.76, stdev=22.28
lat (msec): min=5, max=406, avg=55.19, stdev=41.84
clat percentiles (msec):
| 1.00th=[ 12], 5.00th=[ 14], 10.00th=[ 15], 20.00th=[ 16],
| 30.00th=[ 18], 40.00th=[ 19], 50.00th=[ 21], 60.00th=[ 23],
| 70.00th=[ 28], 80.00th=[ 35], 90.00th=[ 48], 95.00th=[ 61],
| 99.00th=[ 151], 99.50th=[ 167], 99.90th=[ 190], 99.95th=[ 196],
| 99.99th=[ 233]
bw (KB /s): min= 1880, max=15512, per=100.00%, avg=8670.58, stdev=3524.25
lat (msec) : 4=0.01%, 10=0.22%, 20=47.66%, 50=43.55%, 100=6.53%
lat (msec) : 250=2.04%, 500=0.01%
cpu : usr=1.19%, sys=1.46%, ctx=148656, majf=0, minf=15
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=253396/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=120
Run status group 0 (all jobs):
READ: io=647520KB, aggrb=5532KB/s, minb=5532KB/s, maxb=5532KB/s, mint=117035msec, maxt=117035msec
WRITE: io=989.85MB, aggrb=8662KB/s, minb=8662KB/s, maxb=8662KB/s, mint=117011msec, maxt=117011msec
iostat -d -x sdb sdc sdd sde 1
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 446.53 290.10 1817.82 32376.24 92.84 7.59 10.33 16.29 1.17 1.35 99.21
sdb 0.00 0.00 385.15 289.11 1544.55 32376.24 100.62 7.64 11.24 18.70 1.29 1.47 99.21
sdd 0.00 0.00 428.71 291.09 1718.81 32530.69 95.16 7.41 10.49 16.72 1.31 1.37 98.81
sde 0.00 0.00 434.65 288.12 1742.57 32530.69 94.84 7.53 10.53 16.73 1.16 1.37 99.21
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 347.00 42.00 1408.00 464.00 9.62 7.17 18.34 19.49 8.81 2.56 99.70
sdb 0.00 0.00 321.00 45.00 1300.00 464.00 9.64 7.16 19.50 20.99 8.89 2.70 99.00
sdd 0.00 0.00 315.00 55.00 1260.00 844.00 11.37 8.77 23.78 26.22 9.78 2.70 100.00
sde 0.00 0.00 333.00 54.00 1344.00 844.00 11.31 8.52 22.15 24.22 9.37 2.58 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 315.00 298.00 1272.00 34196.00 115.72 7.48 12.25 22.08 1.86 1.63 100.00
sdb 0.00 0.00 294.00 297.00 1184.00 34196.00 119.73 7.67 13.01 24.22 1.91 1.69 100.00
sdd 0.00 0.00 324.00 277.00 1300.00 32036.00 110.94 7.90 13.06 22.55 1.95 1.66 100.00
sde 0.00 0.00 304.00 291.00 1236.00 32036.00 111.84 7.95 13.25 23.99 2.04 1.68 99.80
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 400.00 275.25 1619.80 31853.47 99.14 7.48 9.95 16.20 0.87 1.48 99.90
sdb 0.00 0.00 362.38 265.35 1461.39 31853.47 106.15 7.56 11.11 18.41 1.14 1.59 100.00
sdd 0.00 0.00 381.19 289.11 1536.63 33695.05 105.12 7.81 10.24 17.20 1.07 1.49 100.00
sde 0.00 0.00 372.28 290.10 1504.95 33695.05 106.28 8.08 10.97 18.67 1.10 1.51 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 430.00 52.00 1728.00 3624.00 22.21 8.20 18.79 20.35 5.85 2.06 99.10
sdb 0.00 0.00 435.00 56.00 1748.00 3628.00 21.90 8.26 18.24 19.86 5.68 2.04 100.00
sdd 0.00 0.00 413.00 30.00 1660.00 2852.00 20.37 7.23 18.27 18.95 8.93 2.26 99.90
sde 0.00 0.00 463.00 36.00 1876.00 3612.00 22.00 7.00 15.55 16.24 6.61 2.00 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 329.00 303.00 1320.00 29480.00 97.47 7.38 11.45 20.36 1.77 1.59 100.50
sdb 0.00 0.00 330.00 288.00 1320.00 29476.00 99.66 7.34 11.72 20.45 1.72 1.63 100.70
sdd 0.00 0.00 328.00 294.00 1324.00 30372.00 101.92 8.41 13.57 24.05 1.88 1.62 100.70
sde 0.00 0.00 338.00 291.00 1356.00 29612.00 98.47 8.48 13.53 23.56 1.87 1.60 100.70
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 375.00 353.00 1508.00 33036.00 94.90 6.75 8.30 15.38 0.77 1.37 99.80
sdb 0.00 0.00 354.00 353.00 1420.00 33036.00 97.47 6.69 8.61 16.33 0.87 1.41 100.00
sdd 0.00 0.00 393.00 362.00 1580.00 33204.00 92.14 8.02 8.83 16.24 0.78 1.32 100.00
sde 0.00 0.00 378.00 364.00 1512.00 33204.00 93.57 8.21 9.89 18.60 0.85 1.35 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 436.00 220.00 1744.00 25084.00 81.79 6.76 11.46 16.25 1.98 1.53 100.10
sdb 0.00 0.00 388.00 197.00 1556.00 22536.00 82.37 6.99 12.93 18.29 2.38 1.71 100.10
sdd 0.00 0.00 399.00 200.00 1604.00 23884.00 85.10 8.21 16.21 22.99 2.67 1.67 100.10
sde 0.00 0.00 426.00 206.00 1708.00 23724.00 80.48 7.99 13.84 19.40 2.32 1.58 99.70
As you can see, nothing unusual: all operations are spread equally across the physical disks. Read bw=5532.8KB/s, Write bw=8662.4KB/s
results for ZoL 0.7.0-rc1
4kRead: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=120
4kWrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=120
fio-2.2.8
Starting 2 processes
Jobs: 1 (f=1): [r(1),E(1)] [16.2% done] [3164KB/6600KB/0KB /s] [791/1650/0 iops] [eta 26m:00s]
4kRead: (groupid=0, jobs=1): err= 0: pid=1355: Thu Oct 6 11:46:16 2016
read : io=1655.5MB, bw=5631.3KB/s, iops=1407, runt=301036msec
slat (usec): min=19, max=1674.9K, avg=84165.80, stdev=86037.04
clat (usec): min=0, max=793, avg= 4.30, stdev= 5.06
lat (usec): min=22, max=1674.9K, avg=84173.74, stdev=86037.11
clat percentiles (usec):
| 1.00th=[ 1], 5.00th=[ 2], 10.00th=[ 2], 20.00th=[ 2],
| 30.00th=[ 3], 40.00th=[ 3], 50.00th=[ 4], 60.00th=[ 4],
| 70.00th=[ 4], 80.00th=[ 5], 90.00th=[ 6], 95.00th=[ 8],
| 99.00th=[ 25], 99.50th=[ 33], 99.90th=[ 59], 99.95th=[ 74],
| 99.99th=[ 121]
bw (KB /s): min= 285, max= 8104, per=100.00%, avg=5661.88, stdev=1379.09
lat (usec) : 2=2.63%, 4=46.80%, 10=47.02%, 20=2.01%, 50=1.36%
lat (usec) : 100=0.15%, 250=0.02%, 500=0.01%, 750=0.01%, 1000=0.01%
cpu : usr=0.72%, sys=1.07%, ctx=273123, majf=0, minf=18
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=423801/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=120
4kWrite: (groupid=0, jobs=1): err= 0: pid=1357: Thu Oct 6 11:46:16 2016
write: io=4854.9MB, bw=16569KB/s, iops=4142, runt=300035msec
slat (usec): min=37, max=703340, avg=27340.73, stdev=43195.54
clat (usec): min=0, max=24786, avg= 3.49, stdev=24.80
lat (usec): min=41, max=703348, avg=27346.97, stdev=43196.28
clat percentiles (usec):
| 1.00th=[ 1], 5.00th=[ 1], 10.00th=[ 1], 20.00th=[ 2],
| 30.00th=[ 2], 40.00th=[ 2], 50.00th=[ 3], 60.00th=[ 3],
| 70.00th=[ 3], 80.00th=[ 4], 90.00th=[ 5], 95.00th=[ 6],
| 99.00th=[ 23], 99.50th=[ 32], 99.90th=[ 66], 99.95th=[ 87],
| 99.99th=[ 187]
bw (KB /s): min= 894, max=51440, per=100.00%, avg=16744.90, stdev=10056.05
lat (usec) : 2=15.17%, 4=55.15%, 10=27.07%, 20=1.35%, 50=1.06%
lat (usec) : 100=0.16%, 250=0.03%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 50=0.01%
cpu : usr=1.75%, sys=2.21%, ctx=322365, majf=0, minf=15
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=1242830/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=120
Run status group 0 (all jobs):
READ: io=1655.5MB, aggrb=5631KB/s, minb=5631KB/s, maxb=5631KB/s, mint=301036msec, maxt=301036msec
WRITE: io=4854.9MB, aggrb=16569KB/s, minb=16569KB/s, maxb=16569KB/s, mint=300035msec, maxt=300035msec
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 381.00 2297.00 1524.00 30128.00 23.64 11.88 4.50 25.34 1.05 0.37 99.70
sdd 0.00 0.00 389.00 2324.00 1556.00 30128.00 23.36 11.69 4.37 24.90 0.94 0.37 99.70
sde 0.00 0.00 340.00 2556.00 1360.00 41800.00 29.81 14.14 4.99 28.86 1.81 0.34 99.70
sdf 0.00 0.00 349.00 2765.00 1396.00 42204.00 28.00 13.47 4.40 28.80 1.32 0.32 99.70
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 282.00 2459.00 1128.00 29792.00 22.56 9.19 3.24 21.80 1.12 0.36 99.80
sdd 0.00 0.00 329.00 2599.00 1316.00 29792.00 21.25 9.20 3.15 21.04 0.88 0.34 99.90
sde 0.00 0.00 309.00 2802.00 1236.00 44880.00 29.65 13.70 4.55 33.13 1.40 0.32 100.00
sdf 0.00 0.00 356.00 3008.00 1424.00 43220.00 26.54 13.19 4.02 28.50 1.12 0.30 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 201.00 1331.00 804.00 19816.00 26.92 11.31 7.49 47.41 1.46 0.65 100.20
sdd 0.00 0.00 300.00 1683.00 1200.00 19936.00 21.32 10.76 5.44 30.94 0.90 0.51 100.20
sde 0.00 0.00 248.00 1340.00 992.00 14812.00 19.90 11.39 7.31 39.80 1.30 0.63 100.20
sdf 0.00 0.00 306.00 1474.00 1224.00 14812.00 18.02 11.01 6.20 31.96 0.85 0.56 100.20
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 358.00 3631.00 1432.00 43904.00 22.73 13.70 3.43 27.31 1.07 0.25 100.00
sdd 0.00 0.00 422.00 4237.00 1688.00 43824.00 19.54 12.82 2.73 22.91 0.72 0.21 100.00
sde 0.00 0.00 398.00 2786.00 1592.00 31960.00 21.08 12.68 3.95 24.08 1.08 0.31 100.00
sdf 0.00 0.00 414.00 3009.00 1676.00 31960.00 19.65 12.47 3.63 23.32 0.92 0.29 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 337.62 3292.08 1350.50 44273.27 25.14 13.17 3.31 25.54 1.03 0.27 99.11
sdd 0.00 0.00 361.39 2731.68 1445.54 44225.74 29.53 13.66 4.35 26.38 1.44 0.32 99.11
sde 0.00 0.00 401.98 2526.73 1607.92 30388.12 21.85 11.78 4.09 23.71 0.97 0.34 99.11
sdf 0.00 0.00 393.07 2653.47 1572.28 30522.77 21.07 11.50 3.83 23.66 0.89 0.32 98.61
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 2.00 4767.00 8.00 48292.00 20.26 12.03 0.90 1128.00 0.43 0.21 100.00
sdd 0.00 0.00 222.00 3698.00 916.00 48300.00 25.11 7.38 2.00 20.99 0.86 0.25 97.00
sde 0.00 0.00 144.00 2679.00 576.00 31548.00 22.76 4.24 1.51 12.51 0.92 0.32 91.60
sdf 0.00 0.00 138.00 2603.00 552.00 31412.00 23.32 4.53 1.66 12.97 1.06 0.34 92.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 329.00 2613.00 1316.00 38232.00 26.89 13.77 7.74 57.40 1.49 0.34 100.00
sdd 0.00 0.00 345.00 2867.00 1380.00 38232.00 24.67 12.73 3.92 26.82 1.17 0.31 99.90
sde 0.00 0.00 307.00 2754.00 1228.00 29216.00 19.89 8.60 2.49 16.86 0.89 0.32 99.40
sdf 0.00 0.00 322.00 2584.00 1288.00 29216.00 20.99 8.23 2.75 17.79 0.88 0.34 98.60
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 254.00 1221.00 1016.00 16108.00 23.22 11.55 7.83 38.72 1.41 0.68 100.00
sdd 0.00 0.00 288.00 1222.00 1180.00 16112.00 22.90 11.16 7.34 33.71 1.12 0.66 100.00
sde 0.00 0.00 276.00 1410.00 1104.00 19020.00 23.87 11.39 7.19 37.39 1.27 0.59 100.00
sdf 0.00 0.00 288.00 1435.00 1152.00 19020.00 23.41 11.30 6.59 33.35 1.22 0.58 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 407.00 2401.00 1628.00 28532.00 21.48 12.08 4.30 23.37 1.06 0.36 99.90
sdd 0.00 0.00 469.00 2539.00 1908.00 28520.00 20.23 11.43 3.84 20.98 0.67 0.33 100.10
sde 0.00 0.00 390.00 2741.00 1560.00 37176.00 24.74 12.68 4.03 24.52 1.11 0.32 100.10
sdf 0.00 0.00 422.00 3046.00 1688.00 37460.00 22.58 12.57 3.63 23.08 0.94 0.29 100.10
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 349.00 2580.00 1396.00 31468.00 22.44 11.44 3.90 23.46 1.25 0.34 100.00
sdd 0.00 0.00 372.00 2557.00 1488.00 31476.00 22.51 11.68 4.01 23.26 1.21 0.34 100.00
sde 0.00 0.00 330.00 2895.00 1320.00 42860.00 27.40 14.33 4.47 29.93 1.57 0.31 100.00
sdf 0.00 0.00 342.00 3170.00 1368.00 42584.00 25.03 14.13 3.96 28.11 1.35 0.28 100.00
Again, nothing unusual: all operations are spread equally across the physical disks. The results are even better than 0.6.4.2, especially the writes: Read bw=5631.3KB/s, Write bw=16569KB/s
Now, turning on primarycache=all:
zfs set primarycache=all tank/zvol4k-fiotest
results for ZoL 0.6.4.2
4kRead: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=120
4kWrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=120
fio-2.2.8
Starting 2 processes
Jobs: 2 (f=2): [r(1),w(1)] [100.0% done] [6636KB/10668KB/0KB /s] [1659/2667/0 iops] [eta 00m:00s]
4kRead: (groupid=0, jobs=1): err= 0: pid=2841: Thu Oct 6 12:11:50 2016
read : io=1655.2MB, bw=5647.5KB/s, iops=1411, runt=300086msec
slat (usec): min=11, max=261559, avg=35423.51, stdev=28289.59
clat (msec): min=3, max=369, avg=49.35, stdev=34.43
lat (msec): min=6, max=474, avg=84.78, stdev=47.62
clat percentiles (msec):
| 1.00th=[ 15], 5.00th=[ 19], 10.00th=[ 22], 20.00th=[ 26],
| 30.00th=[ 30], 40.00th=[ 34], 50.00th=[ 39], 60.00th=[ 46],
| 70.00th=[ 55], 80.00th=[ 67], 90.00th=[ 87], 95.00th=[ 116],
| 99.00th=[ 194], 99.50th=[ 210], 99.90th=[ 243], 99.95th=[ 255],
| 99.99th=[ 285]
bw (KB /s): min= 1776, max= 8624, per=100.00%, avg=5654.94, stdev=1603.54
lat (msec) : 4=0.01%, 10=0.04%, 20=6.90%, 50=58.77%, 100=27.38%
lat (msec) : 250=6.84%, 500=0.07%
cpu : usr=0.79%, sys=1.23%, ctx=326363, majf=0, minf=18
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=423683/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=120
4kWrite: (groupid=0, jobs=1): err= 0: pid=2842: Thu Oct 6 12:11:50 2016
write: io=2557.3MB, bw=8727.3KB/s, iops=2181, runt=300044msec
slat (usec): min=10, max=258294, avg=27412.46, stdev=29073.49
clat (msec): min=3, max=298, avg=27.33, stdev=20.74
lat (msec): min=5, max=395, avg=54.74, stdev=39.04
clat percentiles (msec):
| 1.00th=[ 12], 5.00th=[ 14], 10.00th=[ 15], 20.00th=[ 17],
| 30.00th=[ 19], 40.00th=[ 20], 50.00th=[ 21], 60.00th=[ 24],
| 70.00th=[ 27], 80.00th=[ 34], 90.00th=[ 45], 95.00th=[ 57],
| 99.00th=[ 145], 99.50th=[ 161], 99.90th=[ 182], 99.95th=[ 190],
| 99.99th=[ 221]
bw (KB /s): min= 1972, max=14797, per=100.00%, avg=8742.75, stdev=3149.20
lat (msec) : 4=0.01%, 10=0.25%, 20=43.81%, 50=48.77%, 100=5.24%
lat (msec) : 250=1.93%, 500=0.01%
cpu : usr=1.24%, sys=1.50%, ctx=369485, majf=0, minf=15
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=654643/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=120
Run status group 0 (all jobs):
READ: io=1655.2MB, aggrb=5647KB/s, minb=5647KB/s, maxb=5647KB/s, mint=300086msec, maxt=300086msec
WRITE: io=2557.3MB, aggrb=8727KB/s, minb=8727KB/s, maxb=8727KB/s, mint=300044msec, maxt=300044msec
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 404.00 42.00 1616.00 4264.00 26.37 8.07 18.68 20.00 5.93 2.24 100.00
sdb 0.00 0.00 348.00 43.00 1392.00 4448.00 29.87 8.45 22.10 23.85 7.93 2.56 100.00
sdd 0.00 0.00 366.00 41.00 1464.00 4148.00 27.58 7.60 19.09 20.39 7.46 2.46 100.00
sde 0.00 0.00 366.00 44.00 1464.00 4532.00 29.25 7.62 18.99 20.42 7.16 2.44 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 266.00 248.00 1064.00 28776.00 116.11 6.78 13.03 23.36 1.94 1.93 99.00
sdb 0.00 0.00 225.00 246.00 900.00 28592.00 125.23 6.74 14.31 27.56 2.19 2.09 98.60
sdd 0.00 0.00 277.00 258.00 1108.00 29200.00 113.30 8.34 15.76 28.27 2.33 1.87 99.80
sde 0.00 0.00 278.00 254.00 1112.00 28816.00 112.51 8.48 16.05 28.43 2.51 1.88 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 397.00 456.00 1588.00 33380.00 81.99 8.15 9.16 18.84 0.74 1.17 100.00
sdb 0.00 0.00 362.00 415.00 1448.00 33380.00 89.65 8.23 10.13 20.74 0.88 1.29 100.00
sdd 0.00 0.00 344.00 435.00 1376.00 32892.00 87.98 7.14 8.62 18.45 0.84 1.28 99.50
sde 0.00 0.00 368.00 418.00 1472.00 32892.00 87.44 7.08 8.50 17.16 0.87 1.27 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 330.00 123.00 1320.00 12364.00 60.42 8.10 18.59 24.18 3.61 2.21 100.30
sdb 0.00 0.00 316.00 152.00 1264.00 12956.00 60.77 8.06 17.93 25.25 2.70 2.14 100.30
sdd 0.00 0.00 329.00 127.00 1316.00 11044.00 54.21 7.51 17.00 22.41 2.97 2.20 100.30
sde 0.00 0.00 332.00 130.00 1328.00 11568.00 55.83 7.46 16.90 22.44 2.73 2.17 100.30
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 327.00 315.00 1308.00 21072.00 69.72 7.87 12.40 22.86 1.53 1.56 100.00
sdb 0.00 0.00 299.00 274.00 1196.00 20480.00 75.66 7.67 13.53 24.23 1.86 1.75 100.00
sdd 0.00 0.00 317.00 261.00 1268.00 21888.00 80.12 8.35 14.67 25.04 2.07 1.73 100.00
sde 0.00 0.00 342.00 289.00 1368.00 21364.00 72.05 8.27 13.18 22.84 1.74 1.58 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 317.00 312.00 1268.00 33128.00 109.37 8.10 12.79 23.76 1.63 1.59 100.00
sdb 0.00 0.00 283.00 303.00 1132.00 33128.00 116.93 8.26 13.71 26.49 1.77 1.71 100.00
sdd 0.00 0.00 300.00 292.00 1200.00 33144.00 116.03 7.28 12.39 22.91 1.59 1.69 99.90
sde 0.00 0.00 295.00 336.00 1180.00 33144.00 108.79 7.52 11.49 23.51 0.93 1.58 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 380.00 171.00 1520.00 20688.00 80.61 8.42 15.06 21.10 1.65 1.81 100.00
sdb 0.00 0.00 392.00 185.00 1568.00 20872.00 77.78 8.28 14.09 20.03 1.50 1.73 100.00
sdd 0.00 0.00 342.00 218.00 1368.00 19544.00 74.69 6.90 12.37 19.44 1.27 1.79 100.00
sde 0.00 0.00 367.00 229.00 1468.00 19416.00 70.08 6.82 11.94 18.02 2.20 1.68 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 354.00 180.00 1416.00 13560.00 56.09 7.27 14.03 20.22 1.87 1.87 100.00
sdb 0.00 0.00 318.00 158.00 1272.00 13376.00 61.55 7.50 16.54 23.48 2.59 2.10 100.00
sdd 0.00 0.00 314.00 144.00 1256.00 12536.00 60.23 8.13 17.37 23.97 2.99 2.18 100.00
sde 0.00 0.00 315.00 149.00 1260.00 12664.00 60.02 8.36 17.85 24.96 2.83 2.15 99.90
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 273.00 281.00 1092.00 32520.00 121.34 7.30 12.87 24.15 1.90 1.80 99.80
sdb 0.00 0.00 291.00 281.00 1164.00 32520.00 117.78 7.61 13.24 24.09 2.02 1.75 100.00
sdd 0.00 0.00 291.00 291.00 1164.00 33828.00 120.25 8.35 14.53 26.92 2.13 1.72 100.00
sde 0.00 0.00 293.00 292.00 1172.00 33828.00 119.66 8.21 14.19 26.42 1.92 1.71 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 358.00 292.00 1432.00 33704.00 108.11 7.08 10.98 19.10 1.02 1.54 100.00
sdb 0.00 0.00 371.00 293.00 1484.00 33704.00 105.99 6.86 10.24 17.52 1.01 1.50 99.60
sdd 0.00 0.00 379.00 274.00 1516.00 32596.00 104.48 8.06 12.16 20.16 1.08 1.53 100.10
sde 0.00 0.00 379.00 274.00 1516.00 32596.00 104.48 8.19 12.26 20.34 1.08 1.53 100.10
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 344.00 39.00 1376.00 436.00 9.46 9.13 23.49 24.91 10.97 2.61 100.10
sdb 0.00 0.00 313.00 38.00 1252.00 436.00 9.62 9.36 26.72 28.38 13.03 2.85 100.00
sdd 0.00 0.00 313.00 31.00 1252.00 260.00 8.79 6.89 20.43 21.17 13.00 2.91 100.00
sde 0.00 0.00 317.00 29.00 1268.00 260.00 8.83 6.90 20.35 20.96 13.62 2.89 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 306.00 351.00 1224.00 34360.00 108.32 8.79 13.58 27.14 1.77 1.52 100.00
sdb 0.00 0.00 274.00 337.00 1096.00 34360.00 116.06 8.84 14.49 29.98 1.90 1.64 100.00
sdd 0.00 0.00 255.00 334.00 1020.00 31988.00 112.08 5.87 10.04 21.20 1.53 1.69 99.80
sde 0.00 0.00 265.00 340.00 1060.00 31988.00 109.25 5.83 9.66 20.07 1.55 1.64 99.20
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 380.00 338.00 1520.00 33440.00 97.38 7.31 9.63 17.41 0.88 1.39 99.90
sdb 0.00 0.00 359.00 346.00 1436.00 33440.00 98.94 7.34 9.74 18.30 0.86 1.41 99.70
sdd 0.00 0.00 392.00 324.00 1568.00 32976.00 96.49 7.82 10.36 18.11 0.98 1.39 99.20
sde 0.00 0.00 371.00 338.00 1484.00 32976.00 97.21 8.09 10.73 19.66 0.94 1.41 99.70
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 343.00 79.00 1372.00 9180.00 50.01 6.92 17.54 20.69 3.89 2.33 98.50
sdb 0.00 0.00 336.00 80.00 1344.00 9100.00 50.21 7.08 18.12 21.46 4.09 2.39 99.40
sdd 0.00 0.00 323.00 83.00 1292.00 9324.00 52.30 8.09 20.86 24.92 5.07 2.45 99.50
sde 0.00 0.00 366.00 87.00 1464.00 9196.00 47.06 7.90 18.43 21.76 4.40 2.21 99.90
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 328.00 218.00 1312.00 24720.00 95.36 8.62 15.75 24.80 2.14 1.83 100.00
sdb 0.00 0.00 296.00 218.00 1184.00 24800.00 101.11 8.71 17.09 27.99 2.29 1.95 100.00
sdd 0.00 0.00 303.00 219.00 1212.00 23164.00 93.39 6.56 12.57 20.45 1.66 1.92 100.00
sde 0.00 0.00 303.00 219.00 1212.00 23292.00 93.89 6.65 12.68 20.32 2.11 1.92 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 294.00 290.00 1176.00 33272.00 117.97 7.52 12.55 23.29 1.67 1.71 100.00
sdb 0.00 0.00 270.00 290.00 1080.00 33272.00 122.69 7.52 12.65 24.33 1.77 1.78 99.90
sdd 0.00 0.00 278.00 285.00 1112.00 33148.00 121.71 7.80 13.75 26.01 1.78 1.77 99.40
sde 0.00 0.00 316.00 284.00 1264.00 33148.00 114.71 7.90 13.10 23.36 1.67 1.66 99.70
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 427.00 219.00 1708.00 24184.00 80.16 7.71 12.19 17.74 1.36 1.55 100.00
sdb 0.00 0.00 378.00 184.00 1512.00 22152.00 84.21 7.84 14.77 21.10 1.77 1.77 99.70
sdd 0.00 0.00 412.00 185.00 1648.00 22308.00 80.25 7.27 12.06 16.95 1.15 1.68 100.00
sde 0.00 0.00 406.00 187.00 1624.00 22496.00 81.35 7.15 12.29 17.27 1.47 1.69 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 323.00 99.00 1292.00 10092.00 53.95 8.78 20.74 25.86 4.04 2.37 100.00
sdb 0.00 0.00 331.00 116.00 1324.00 12124.00 60.17 8.69 19.34 24.87 3.55 2.24 100.00
sdd 0.00 0.00 325.00 109.00 1300.00 9776.00 51.04 7.09 16.68 21.25 3.05 2.30 99.90
sde 0.00 0.00 337.00 107.00 1348.00 9588.00 49.26 6.98 15.66 19.58 3.29 2.25 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 279.00 384.00 1116.00 33700.00 105.03 7.01 10.71 23.47 1.44 1.51 99.90
sdb 0.00 0.00 261.00 365.00 1044.00 33700.00 111.00 6.84 11.06 24.38 1.53 1.59 99.80
sdd 0.00 0.00 270.00 400.00 1080.00 32668.00 100.74 8.17 11.76 26.83 1.59 1.49 100.00
sde 0.00 0.00 300.00 410.00 1200.00 32668.00 95.40 8.36 11.61 25.44 1.50 1.41 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 366.00 284.00 1464.00 32820.00 105.49 6.87 10.51 17.85 1.06 1.53 99.20
sdb 0.00 0.00 358.00 284.00 1432.00 32820.00 106.70 6.92 10.69 18.37 1.02 1.56 99.90
sdd 0.00 0.00 384.00 317.00 1536.00 34520.00 102.87 8.13 11.78 20.66 1.04 1.43 100.00
sde 0.00 0.00 390.00 315.00 1560.00 34520.00 102.35 8.12 11.57 20.13 0.98 1.42 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 361.00 11.00 1444.00 304.00 9.40 8.01 21.62 21.63 21.18 2.69 100.00
sdb 0.00 0.00 336.00 12.00 1344.00 432.00 10.21 8.18 23.37 23.33 24.42 2.87 100.00
sdd 0.00 0.00 370.00 12.00 1480.00 436.00 10.03 7.67 20.48 20.39 23.17 2.62 100.00
sde 0.00 0.00 336.00 12.00 1344.00 436.00 10.23 7.81 22.85 22.71 26.83 2.87 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 272.00 330.00 1088.00 34032.00 116.68 8.87 14.68 29.68 2.32 1.66 100.00
sdb 0.00 0.00 240.00 328.00 960.00 33904.00 122.76 8.78 15.51 33.46 2.38 1.76 100.00
sdd 0.00 0.00 260.00 327.00 1040.00 32452.00 114.11 7.69 12.94 26.49 2.16 1.71 100.10
sde 0.00 0.00 242.00 301.00 968.00 32452.00 123.09 7.58 13.76 28.04 2.29 1.84 100.10
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 345.00 281.00 1380.00 32592.00 108.54 7.85 11.77 20.45 1.12 1.59 99.50
sdb 0.00 0.00 322.00 282.00 1288.00 32592.00 112.19 7.96 12.50 22.53 1.05 1.66 100.00
sdd 0.00 0.00 339.00 282.00 1356.00 33700.00 112.90 7.37 10.94 19.18 1.03 1.61 99.70
sde 0.00 0.00 315.00 283.00 1260.00 33700.00 116.92 7.50 11.70 21.23 1.08 1.67 99.80
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 406.00 94.00 1624.00 10828.00 49.81 7.20 15.51 18.32 3.36 1.99 99.60
sdb 0.00 0.00 364.00 87.00 1456.00 9936.00 50.52 7.36 17.25 20.46 3.82 2.21 99.70
sdd 0.00 0.00 365.00 80.00 1460.00 9080.00 47.37 8.47 20.05 23.52 4.22 2.25 100.00
sde 0.00 0.00 421.00 95.00 1684.00 10984.00 49.10 8.29 17.14 20.21 3.52 1.94 100.00
It's OK: all operations are spread equally across the physical disks. The results are almost the same, because the ARC is <=2GB while the test volume is 10GB. Read bw=5647.5KB/s, Write bw=8727.3KB/s
drumroll...... results for ZoL 0.7.0-rc1
4kRead: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=120
4kWrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=120
fio-2.2.8
Starting 2 processes
Jobs: 1 (f=1): [r(1),_(1)] [50.5% done] [128KB/0KB/0KB /s] [32/0/0 iops] [eta 04m:57s]
4kRead: (groupid=0, jobs=1): err= 0: pid=12908: Thu Oct 6 11:57:21 2016
read : io=604228KB, bw=1995.3KB/s, iops=498, runt=302838msec
slat (usec): min=15, max=7429.4K, avg=238975.31, stdev=607098.19
clat (usec): min=0, max=3169, avg= 5.51, stdev=11.74
lat (usec): min=18, max=7429.5K, avg=238984.52, stdev=607099.07
clat percentiles (usec):
| 1.00th=[ 1], 5.00th=[ 2], 10.00th=[ 2], 20.00th=[ 3],
| 30.00th=[ 3], 40.00th=[ 4], 50.00th=[ 4], 60.00th=[ 5],
| 70.00th=[ 6], 80.00th=[ 7], 90.00th=[ 9], 95.00th=[ 11],
| 99.00th=[ 28], 99.50th=[ 37], 99.90th=[ 84], 99.95th=[ 131],
| 99.99th=[ 310]
bw (KB /s): min= 9, max= 6416, per=100.00%, avg=2042.75, stdev=1810.27
lat (usec) : 2=1.90%, 4=32.04%, 10=58.42%, 20=5.81%, 50=1.57%
lat (usec) : 100=0.19%, 250=0.05%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 4=0.01%
cpu : usr=0.31%, sys=0.50%, ctx=136428, majf=0, minf=18
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=151057/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=120
4kWrite: (groupid=0, jobs=1): err= 0: pid=12909: Thu Oct 6 11:57:21 2016
write: io=2481.3MB, bw=8466.2KB/s, iops=2116, runt=300113msec
slat (usec): min=40, max=4683.1K, avg=55860.01, stdev=35143.44
clat (usec): min=0, max=6942, avg= 4.62, stdev=16.57
lat (usec): min=42, max=4683.1K, avg=55868.15, stdev=35144.44
clat percentiles (usec):
| 1.00th=[ 1], 5.00th=[ 2], 10.00th=[ 2], 20.00th=[ 2],
| 30.00th=[ 3], 40.00th=[ 3], 50.00th=[ 4], 60.00th=[ 4],
| 70.00th=[ 5], 80.00th=[ 5], 90.00th=[ 6], 95.00th=[ 8],
| 99.00th=[ 27], 99.50th=[ 38], 99.90th=[ 71], 99.95th=[ 99],
| 99.99th=[ 326]
bw (KB /s): min= 3497, max=75872, per=100.00%, avg=8474.47, stdev=6054.31
lat (usec) : 2=4.63%, 4=39.85%, 10=52.47%, 20=1.42%, 50=1.37%
lat (usec) : 100=0.21%, 250=0.04%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%
cpu : usr=1.34%, sys=1.98%, ctx=434753, majf=0, minf=15
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=635198/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=120
Run status group 0 (all jobs):
READ: io=604228KB, aggrb=1995KB/s, minb=1995KB/s, maxb=1995KB/s, mint=302838msec, maxt=302838msec
WRITE: io=2481.3MB, aggrb=8466KB/s, minb=8466KB/s, maxb=8466KB/s, mint=300113msec, maxt=300113msec
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 59.00 0.00 236.00 0.00 8.00 0.30 5.22 5.22 0.00 4.58 27.00
sdd 0.00 0.00 59.00 0.00 236.00 0.00 8.00 0.28 4.78 4.78 0.00 4.41 26.00
sde 0.00 0.00 56.00 2004.00 224.00 10448.00 10.36 19.65 11.27 242.95 4.80 0.49 100.00
sdf 0.00 0.00 53.00 1988.00 212.00 15896.00 15.78 19.75 10.38 214.91 4.92 0.49 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 127.00 0.00 508.00 0.00 8.00 0.82 6.44 6.44 0.00 4.87 61.80
sdd 0.00 0.00 130.00 0.00 520.00 0.00 8.00 0.77 5.94 5.94 0.00 4.54 59.00
sde 0.00 0.00 101.00 2674.00 416.00 14108.00 10.47 19.31 6.94 97.66 3.51 0.36 100.00
sdf 0.00 0.00 153.00 3482.00 612.00 27856.00 15.66 19.24 5.24 62.90 2.70 0.28 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 111.00 0.00 444.00 0.00 8.00 0.59 5.36 5.36 0.00 4.41 49.00
sdd 0.00 0.00 120.00 0.00 480.00 0.00 8.00 0.69 5.73 5.73 0.00 4.53 54.40
sde 0.00 0.00 94.00 2837.00 376.00 15116.00 10.57 19.48 6.85 112.80 3.34 0.34 100.00
sdf 0.00 0.00 138.00 3204.00 552.00 25632.00 15.67 19.38 5.84 72.33 2.98 0.30 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 244.00 0.00 976.00 0.00 8.00 2.51 10.22 10.22 0.00 3.71 90.50
sdd 0.00 0.00 254.00 0.00 1016.00 0.00 8.00 2.52 9.85 9.85 0.00 3.56 90.50
sde 0.00 0.00 37.00 1542.00 148.00 11876.00 15.23 19.80 10.52 185.76 6.32 0.63 100.10
sdf 0.00 0.00 467.00 335.00 1868.00 2680.00 11.34 11.08 14.47 22.46 3.35 1.25 100.10
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 290.00 0.00 1160.00 0.00 8.00 3.40 11.74 11.74 0.00 3.40 98.60
sdd 0.00 0.00 285.00 0.00 1140.00 0.00 8.00 3.36 11.84 11.84 0.00 3.47 99.00
sde 0.00 0.00 39.00 1645.00 156.00 13152.00 15.81 19.79 13.16 317.03 5.96 0.59 99.90
sdf 0.00 0.00 548.00 0.00 2192.00 0.00 8.00 9.96 18.00 18.00 0.00 1.82 99.90
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 278.00 0.00 1112.00 0.00 8.00 3.26 11.60 11.60 0.00 3.51 97.50
sdd 0.00 0.00 310.00 0.00 1240.00 0.00 8.00 3.26 10.46 10.46 0.00 3.17 98.20
sde 0.00 0.00 84.00 2978.00 336.00 23824.00 15.78 13.34 5.04 68.63 3.25 0.33 100.00
sdf 0.00 0.00 490.00 0.00 1972.00 0.00 8.05 9.97 20.19 20.19 0.00 2.04 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 315.00 0.00 1260.00 0.00 8.00 3.60 11.43 11.43 0.00 3.14 98.80
sdd 0.00 0.00 308.00 0.00 1232.00 0.00 8.00 3.60 11.49 11.49 0.00 3.20 98.70
sde 0.00 0.00 69.00 3166.00 276.00 25328.00 15.83 10.45 3.22 13.67 2.99 0.31 100.00
sdf 0.00 0.00 550.00 0.00 2200.00 0.00 8.00 9.98 18.45 18.45 0.00 1.82 100.10
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 296.00 723.00 1184.00 9320.00 20.62 5.39 5.33 15.53 1.16 0.95 96.70
sdd 0.00 0.00 291.00 746.00 1164.00 9312.00 20.20 5.35 5.25 16.29 0.94 0.95 98.10
sde 0.00 0.00 164.00 1318.00 656.00 10604.00 15.20 11.30 6.88 29.84 4.03 0.67 98.70
sdf 0.00 0.00 366.00 504.00 1464.00 3552.00 11.53 11.16 11.81 23.59 3.26 1.13 98.40
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 7.00 3891.00 28.00 53396.00 27.41 6.79 1.73 11.43 1.71 0.23 88.90
sdd 0.00 0.00 12.00 3756.00 48.00 52484.00 27.88 7.10 1.88 12.58 1.85 0.24 91.60
sde 0.00 0.00 5.00 883.00 20.00 7064.00 15.95 19.40 15.53 893.60 10.56 1.13 100.00
sdf 0.00 0.00 12.00 1344.00 48.00 10128.00 15.01 19.28 10.05 362.83 6.90 0.74 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 7.00 4529.00 28.00 48168.00 21.25 8.09 1.80 12.71 1.78 0.21 97.30
sdd 0.00 0.00 11.00 4464.00 44.00 46976.00 21.01 8.02 1.78 9.09 1.76 0.21 96.00
sde 0.00 0.00 9.00 871.00 36.00 6968.00 15.92 19.57 25.00 1371.44 11.09 1.14 100.00
sdf 0.00 0.00 10.00 1302.00 40.00 10416.00 15.94 19.20 13.83 892.50 7.08 0.76 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 8.00 4445.00 32.00 48828.00 21.94 8.20 1.83 10.88 1.81 0.22 97.20
sdd 0.00 0.00 5.00 4807.00 20.00 55040.00 22.88 7.83 1.63 9.60 1.63 0.20 97.20
sde 0.00 0.00 3.00 981.00 12.00 7848.00 15.98 19.60 13.96 1421.33 9.65 1.02 100.10
sdf 0.00 0.00 4.00 1228.00 16.00 9824.00 15.97 19.49 13.47 1794.75 7.67 0.81 100.10
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 9.00 1718.00 36.00 46556.00 53.96 4.96 2.90 4.67 2.89 0.35 60.50
sdd 0.00 0.00 11.00 1882.00 44.00 42456.00 44.90 4.21 2.24 8.82 2.20 0.32 59.70
sde 0.00 0.00 9.00 737.00 36.00 5896.00 15.90 19.80 35.32 1823.56 13.48 1.34 100.00
sdf 0.00 0.00 11.00 1112.00 44.00 8896.00 15.92 19.71 20.53 1210.18 8.76 0.89 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 1.00 0.00 4.00 0.00 8.00 0.00 4.00 4.00 0.00 4.00 0.40
sdd 0.00 0.00 5.00 0.00 20.00 0.00 8.00 0.02 3.60 3.60 0.00 3.60 1.80
sde 0.00 0.00 4.00 778.00 16.00 6224.00 15.96 19.94 15.94 652.50 12.67 1.28 100.00
sdf 0.00 0.00 6.00 1190.00 24.00 9520.00 15.96 19.94 15.51 1432.00 8.37 0.84 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 6.00 0.00 24.00 0.00 8.00 0.03 4.83 4.83 0.00 4.83 2.90
sdd 0.00 0.00 14.00 0.00 56.00 0.00 8.00 0.06 4.07 4.07 0.00 4.07 5.70
sde 0.00 0.00 7.00 728.00 28.00 5824.00 15.92 19.96 30.30 1753.57 13.73 1.36 100.00
sdf 0.00 0.00 8.00 1430.00 32.00 11440.00 15.96 19.92 13.74 1230.88 6.94 0.70 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 22.00 0.00 88.00 0.00 8.00 0.09 4.09 4.09 0.00 4.23 9.30
sdd 0.00 0.00 14.00 0.00 56.00 0.00 8.00 0.06 4.07 4.07 0.00 4.07 5.70
sde 0.00 0.00 7.00 860.00 28.00 6880.00 15.94 19.94 21.84 1275.14 11.64 1.15 100.00
sdf 0.00 0.00 33.00 1860.00 132.00 10684.00 11.43 19.87 13.87 497.27 5.29 0.53 100.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 20.00 0.00 80.00 0.00 8.00 0.10 4.90 4.90 0.00 4.50 9.00
sdd 0.00 0.00 25.00 0.00 100.00 0.00 8.00 0.12 4.92 4.92 0.00 4.88 12.20
sde 0.00 0.00 9.00 931.00 36.00 7448.00 15.92 19.90 22.05 1220.56 10.47 1.06 100.00
sdf 0.00 0.00 45.00 2731.00 180.00 13516.00 9.87 19.78 7.43 239.67 3.60 0.36 100.00
Yes, that's it: one pair of disks is overloaded, and we see slightly different avgrq-sz. So the results are twice as bad as without data caching: Read bw=1995.3KB/s, Write bw=8466.2KB/s
@redtex Thanks. I'll get this set up on my test rig today.
@redtex Does this issue happen if the same test is run on the VM host?
@redtex To clarify further, you are running zfs in the guest, correct? I don't have a CentOS guest handy at the moment so will be running this in a Ubuntu 14.04 guest with a 3.19 kernel initially.
Yes, I'm running these tests on a CentOS 7 VM. But exactly the same behaviour is present on bare hardware. I'll check it on Fedora 24 with a 4.7-series kernel.
@redtex I just ran my first 2 tests: one with primarycache=metadata and the other with primarycache=all and didn't see much difference.
primarycache=metadata
4kRead: (groupid=0, jobs=1): err= 0: pid=19699: Thu Oct 6 11:53:10 2016
read : io=803612KB, bw=2674.6KB/s, iops=668, runt=300464msec
4kWrite: (groupid=0, jobs=1): err= 0: pid=19700: Thu Oct 6 11:53:10 2016
write: io=2952.9MB, bw=10035KB/s, iops=2508, runt=301309msec
and
primarycache=all
4kRead: (groupid=0, jobs=1): err= 0: pid=24290: Thu Oct 6 12:04:24 2016
read : io=911856KB, bw=3034.1KB/s, iops=758, runt=300451msec
4kWrite: (groupid=0, jobs=1): err= 0: pid=24291: Thu Oct 6 12:04:24 2016
write: io=2763.6MB, bw=9431.4KB/s, iops=2357, runt=300045msec
The iostat didn't show anything terribly weird. When run with 1 second interval, the numbers were pretty much all over the place. This is with current master so I'll be trying next with an actual 0.7.0-rc1 since that doesn't have the highly re-worked ARC code due to the compressed ARC.
I may just run these tests on the host now if that makes no difference. For your VM guest, however, I was wondering if you used cache=none for your virtio-scsi disks (I did use it). Also, regarding the raw numbers shown above, the system I'm testing on has pretty ordinary hard drives (but it has a lot of them). I configured the pool exactly as you did for this test.
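For reference, cache=none at the QEMU level is the cache property of the backing drive; a minimal illustrative sketch of a virtio-scsi passthrough disk (device names are hypothetical):
qemu-system-x86_64 ... \
  -device virtio-scsi-pci,id=scsi0 \
  -drive file=/dev/sdb,if=none,id=drive0,format=raw,cache=none \
  -device scsi-block,bus=scsi0.0,drive=drive0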
Well, my fio test runs on a 4GB RAM virtual machine, i.e. the ARC gets 2GB. With this setup the ARC fills up in about 3 minutes. Until the ARC is completely filled, iostat does not show anything unusual. Maybe my disks (yes, they have the cache=none option) are too fast; maybe you have to run the tests for longer than 5 minutes, which is the duration of my fio job.
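One way to watch the warm-up directly is to sample the ARC size against its cap during the run; a small sketch reading the ZoL kstats:
# print ARC size and cap (bytes) every 5 seconds during the fio run
while sleep 5; do
    awk '$1 == "size" || $1 == "c_max" { print $1, $3 }' /proc/spl/kstat/zfs/arcstats
done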
Here is my results from Fedora 24, kernel 4.7.5-200.fc24.x86_64
primarycache=metadata
4kRead: (groupid=0, jobs=1): err= 0: pid=28808: Thu Oct 6 23:46:17 2016
read : io=902792KB, bw=2997.5KB/s, iops=749, runt=301188msec
4kWrite: (groupid=0, jobs=1): err= 0: pid=28809: Thu Oct 6 23:46:17 2016
write: io=6728.4MB, bw=22934KB/s, iops=5733, runt=300418msec
and
primarycache=all
4kRead: (groupid=0, jobs=1): err= 0: pid=16902: Fri Oct 7 00:12:14 2016
read : io=787268KB, bw=2605.7KB/s, iops=651, runt=302138msec
4kWrite: (groupid=0, jobs=1): err= 0: pid=16903: Fri Oct 7 00:12:14 2016
write: io=4953.6MB, bw=16903KB/s, iops=4225, runt=300095msec
The uneven IO is present, but more rarely than with the CentOS 7 3.10 kernel. But overall read performance is almost twice as bad as with the CentOS 7 3.10 kernel.
Of course, the disks are the same; actually, it's the same pool from the CentOS tests, connected to the Fedora 24 VM.
Hi,
I'm sorry if the info below isn't relevant, but at first glance the situation observed on FreeBSD looks similar to the current issue. So, one message with the solution found for the uneven load: https://lists.freebsd.org/pipermail/freebsd-fs/2016-December/024178.html . Please take a look at the whole message thread. The fix: https://svnweb.freebsd.org/base?view=revision&revision=309714 As I can see, ZoL (zio_timestamp_compare in https://github.com/zfsonlinux/zfs/blob/master/module/zfs/zio.c) could suffer from the same issue that was fixed in FreeBSD. Could somebody more competent in ZoL internals take a look and make a "back-port" to ZoL if needed? Let's make ZoL no worse than ZFS on FreeBSD :) Thanks.
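For readers unfamiliar with the bug class referenced here: subtraction-based comparators silently truncate a 64-bit difference to int, so a sort over timestamps can return the wrong order. A generic C sketch of the pattern (illustrative only, not the actual FreeBSD or ZoL code):
#include <stdint.h>

/* Broken: the 64-bit difference is truncated to int, so the sign
 * can be wrong once the timestamps differ by more than ~2^31. */
static int cmp_broken(uint64_t a, uint64_t b) {
    return (int)(a - b);
}

/* Safe: an explicit three-way comparison never overflows. */
static int cmp_safe(uint64_t a, uint64_t b) {
    if (a < b)
        return (-1);
    if (a > b)
        return (1);
    return (0);
}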
@igsol it looks like you are pointing to https://www.illumos.org/issues/7090 , which is already ported to the master branch in https://github.com/zfsonlinux/zfs/commit/3dfb57a .
EDIT: my bad, I mixed them up; it's a different commit and is not ported to ZoL.
@igsol thanks for pointing this out. We should adapt the fix from FreeBSD and see how it impacts performance. However, I don't see how it could be the root cause of this exact issue. The problematic function was only first enabled by default in 0.7.0-rc3 and this issue predates that.
The problematic function was only first enabled by default in 0.7.0-rc3
Sure, you are right. In any case, I am glad that the suspicious comparison fixed in FreeBSD will get the attention of the right people in ZoL.
Upgraded the production system from 0.6.4.2 to 0.7.6. The issue is gone.