openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Highly uneven IO size during write on Zvol #3871

Closed: redtex closed this issue 6 years ago

redtex commented 9 years ago

Hi! On a host running CentOS 7.1 (kernel 3.10.0-229.14.1.el7.x86_64, 32G RAM) with ZoL 0.6.5.2, which serves VM images in zvols via iSCSI (SCST), there is a very strange situation: after upgrading from 0.6.4 to 0.6.5 I noticed a significant performance drop, with disks near 100% busy in iostat. It looks like:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
dm-9              0.00     0.00    2.00  169.00    79.00  6861.00    81.17     0.20    1.19   13.00    1.05   1.17  20.00
dm-10             0.00     0.00    0.00   51.00     0.00   752.00    29.49     0.09    1.78    0.00    1.78   1.76   9.00
dm-11             0.00     0.00    2.00  168.00    11.50  6861.00    80.85     0.21    1.25    6.00    1.20   1.24  21.00
dm-12             0.00     0.00    2.00  221.00    81.00  1421.50    13.48     0.44    1.96   11.50    1.87   1.95  43.50
dm-13             0.00     0.00    1.00  290.00    82.00  1793.50    12.89     0.43    1.47    8.00    1.45   1.47  42.90

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
dm-9              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-10             0.00     0.00    0.00   56.00     0.00  2424.00    86.57     0.19    3.34    0.00    3.34   1.64   9.20
dm-11             0.00     0.00    1.00    0.00     6.50     0.00    13.00     0.01    9.00    9.00    0.00   9.00   0.90
dm-12             0.00     0.00    0.00  414.00     0.00  2600.50    12.56     1.00    2.42    0.00    2.42   2.40  99.20
dm-13             0.00     0.00    0.00  483.00     0.00  2432.50    10.07     0.99    2.07    0.00    2.07   2.04  98.50

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
dm-9              0.00     0.00    1.00    0.00     6.50     0.00    13.00     0.01   11.00   11.00    0.00  11.00   1.10
dm-10             0.00     0.00    0.00   27.00     0.00  1224.00    90.67     0.11    4.22    0.00    4.22   1.85   5.00
dm-11             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-12             0.00     0.00    2.00  495.00     2.50  1551.50     6.25     1.18    2.31   86.00    1.97   2.00  99.50
dm-13             0.00     0.00    0.00  489.00     0.00  2190.00     8.96     1.12    2.01    0.00    2.01   2.03  99.20

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
dm-9              0.00     0.00    1.00    0.00     7.00     0.00    14.00     0.01    8.00    8.00    0.00   8.00   0.80
dm-10             0.00     0.00    0.00   60.00     0.00   604.00    20.13     0.10    1.72    0.00    1.72   1.68  10.10
dm-11             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-12             0.00     0.00    2.00  366.00    12.50  1519.50     8.33     1.28    3.55  137.00    2.82   2.70  99.30
dm-13             0.00     0.00    0.00  402.00     0.00  1290.50     6.42     1.98    2.49    0.00    2.49   2.49 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
dm-9              0.00     0.00    3.00   49.00    10.00   506.50    19.87     0.11    2.12    3.00    2.06   2.06  10.70
dm-10             0.00     0.00    0.00   83.00     0.00   912.00    21.98     0.16    1.92    0.00    1.92   1.64  13.60
dm-11             0.00     0.00    3.00   51.00    25.50   506.50    19.70     0.12    2.30   13.67    1.63   2.26  12.20
dm-12             0.00     0.00    1.00  366.00     1.00  1622.50     8.85     0.79    2.16   10.00    2.14   2.12  77.90
dm-13             0.00     0.00    1.00  198.00    53.00  1009.00    10.67     0.61    8.69 1226.00    2.55   3.02  60.10

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
dm-9              0.00     0.00    4.00  159.00    22.50  6461.50    79.56     0.49    3.01   77.75    1.13   2.23  36.30
dm-10             0.00     0.00    0.00  183.00     0.00 10648.00   116.37     1.58    8.63    0.00    8.63   1.72  31.50
dm-11             0.00     0.00    4.00  154.00    15.00  6461.50    81.98     0.24    1.52   13.00    1.22   1.47  23.30
dm-12             0.00     0.00    4.00  284.00   164.00  2629.50    19.40     0.69    2.36   62.50    1.51   1.53  44.00
dm-13             0.00     0.00    2.00  279.00    48.50  2016.00    14.69     0.52    1.62   13.50    1.53   1.57  44.20

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
dm-9              0.00     0.00    1.00    0.00     1.00     0.00     2.00     0.02   23.00   23.00    0.00  23.00   2.30
dm-10             0.00     0.00    0.00   41.00     0.00   280.00    13.66     0.07    1.83    0.00    1.83   1.80   7.40
dm-11             0.00     0.00    1.00    0.00     1.50     0.00     3.00     0.01    8.00    8.00    0.00   8.00   0.80
dm-12             0.00     0.00    1.00  430.00   128.00  2956.50    14.31     2.11    2.36   14.00    2.33   2.31  99.60
dm-13             0.00     0.00    0.00  473.00     0.00  2972.00    12.57     2.84    2.10    0.00    2.10   2.11 100.00

and zpool iostat is


                          capacity     operations    bandwidth
pool                   alloc   free   read  write   read  write
---------------------  -----  -----  -----  -----  -----  -----
sas                    1.83T  1.79T      5     85  14.5K  4.67M
  mirror                938G   918G      0      0      0      0
    35000cca02827b824      -      -      0      0      0      0
    35000cca02827b8c4      -      -      0      0      0      0
  mirror                938G   918G      5      0  14.5K      0
    35000cca02827bb30      -      -      3      0  12.5K      0
    35000cca02827d228      -      -      1      0  2.00K      0
logs                       -      -      -      -      -      -
  35000c5003330fa5b    50.7M   278G      0     85      0  4.67M
---------------------  -----  -----  -----  -----  -----  -----

                          capacity     operations    bandwidth
pool                   alloc   free   read  write   read  write
---------------------  -----  -----  -----  -----  -----  -----
sas                    1.83T  1.79T      0  1.33K      0  8.23M
  mirror                938G   918G      0    311      0  1.68M
    35000cca02827b824      -      -      0    301      0  1.68M
    35000cca02827b8c4      -      -      0    400      0  2.20M
  mirror                938G   918G      0  1.00K      0  6.26M
    35000cca02827bb30      -      -      0    146      0  6.26M
    35000cca02827d228      -      -      0    146      0  6.26M
logs                       -      -      -      -      -      -
  35000c5003330fa5b    50.7M   278G      0     21      0   296K
---------------------  -----  -----  -----  -----  -----  -----

                          capacity     operations    bandwidth
pool                   alloc   free   read  write   read  write
---------------------  -----  -----  -----  -----  -----  -----
sas                    1.83T  1.79T      0    490  18.0K  2.52M
  mirror                938G   918G      0    481      0  2.35M
    35000cca02827b824      -      -      0    470      0  2.35M
    35000cca02827b8c4      -      -      0    470      0  2.22M
  mirror                938G   918G      0      0  18.0K      0
    35000cca02827bb30      -      -      0      0  18.0K      0
    35000cca02827d228      -      -      0      0      0      0
logs                       -      -      -      -      -      -
  35000c5003330fa5b    50.7M   278G      0      8      0   180K
---------------------  -----  -----  -----  -----  -----  -----

                          capacity     operations    bandwidth
pool                   alloc   free   read  write   read  write
---------------------  -----  -----  -----  -----  -----  -----
sas                    1.83T  1.79T      9    408  61.5K  5.50M
  mirror                938G   918G      6    358  45.0K  1.78M
    35000cca02827b824      -      -      1    354  11.0K  1.78M
    35000cca02827b8c4      -      -      4    346  34.0K  2.14M
  mirror                938G   918G      2      0  16.5K      0
    35000cca02827bb30      -      -      1      0  11.0K      0
    35000cca02827d228      -      -      0      0  5.50K      0
logs                       -      -      -      -      -      -
  35000c5003330fa5b    50.7M   278G      0     49      0  3.72M
---------------------  -----  -----  -----  -----  -----  -----

So it is clearly seen that one of the mirrors gets a much smaller IO size (avgrq-sz), which leads to a huge performance drop. I also noticed that this behavior only starts after a significant amount of work (several hours) following a system reboot. The ARC size is 20G.

kernelOfTruth commented 9 years ago

Would there be a change if you go below half (1/2) of your RAM size with the ARC?

What are the other settings of your pools and zvols? (compression? noatime? xattrs?)
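For reference, a minimal sketch of how both suggestions can be tried on ZoL, assuming the pool/zvol names from this report (the 5G cap is only an example value):

# cap the ARC at 5 GiB at runtime via the module parameter
echo 5368709120 > /sys/module/zfs/parameters/zfs_arc_max

# inspect the settings asked about, for the pool dataset and one zvol
zfs get compression,atime,xattr,sync,logbias sas
zfs get compression,volblocksize,sync,logbias sas/vm-301-disk-1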

dweeezil commented 9 years ago

@redtex I suspect your device mapper devices appear to be non-rotational, which is causing 057b87c to launch [EDIT] all "PRIORITY_SYNC" IO synchronously. Check the values of /sys/block/dm-X/queue/rotational. If they're all zero, that's your problem.

It looks like you can poke a 1 into those files (echo 1 > /sys/block/dm-X/queue/rotational) before importing the pool, which might fix the problem if so.
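A minimal sketch of that check and workaround, looping over the device-mapper devices shown in the iostat output above (the dm-X names are whatever your system uses):

# report the rotational flag for every dm device
for f in /sys/block/dm-*/queue/rotational; do
    echo "$f: $(cat "$f")"
done

# if any report 0, mark them rotational (do this before importing the pool)
for f in /sys/block/dm-*/queue/rotational; do
    echo 1 > "$f"
done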

redtex commented 9 years ago

Hi! Thank you for the fast answer! In general, all zvols have the same properties, except volblocksize, which may be 4k, 32k or 128k:

# zfs get all sas/vm-301-disk-1
NAME               PROPERTY              VALUE                  SOURCE
sas/vm-301-disk-1  type                  volume                 -
sas/vm-301-disk-1  creation              Fri May  8 18:45 2015  -
sas/vm-301-disk-1  used                  25.4G                  -
sas/vm-301-disk-1  available             1.68T                  -
sas/vm-301-disk-1  referenced            25.4G                  -
sas/vm-301-disk-1  compressratio         1.14x                  -
sas/vm-301-disk-1  reservation           none                   default
sas/vm-301-disk-1  volsize               32.0G                  local
sas/vm-301-disk-1  volblocksize          128K                   -
sas/vm-301-disk-1  checksum              on                     default
sas/vm-301-disk-1  compression           lz4                    inherited from sas
sas/vm-301-disk-1  readonly              off                    default
sas/vm-301-disk-1  copies                1                      default
sas/vm-301-disk-1  refreservation        none                   received
sas/vm-301-disk-1  primarycache          all                    inherited from sas
sas/vm-301-disk-1  secondarycache        all                    default
sas/vm-301-disk-1  usedbysnapshots       29.3M                  -
sas/vm-301-disk-1  usedbydataset         25.4G                  -
sas/vm-301-disk-1  usedbychildren        0                      -
sas/vm-301-disk-1  usedbyrefreservation  0                      -
sas/vm-301-disk-1  logbias               latency                inherited from sas
sas/vm-301-disk-1  dedup                 off                    default
sas/vm-301-disk-1  mlslabel              none                   default
sas/vm-301-disk-1  sync                  standard               inherited from sas
sas/vm-301-disk-1  refcompressratio      1.13x                  -
sas/vm-301-disk-1  written               46.3M                  -
sas/vm-301-disk-1  logicalused           29.1G                  -
sas/vm-301-disk-1  logicalreferenced     28.8G                  -
sas/vm-301-disk-1  snapshot_limit        none                   default
sas/vm-301-disk-1  snapshot_count        none                   default
sas/vm-301-disk-1  snapdev               hidden                 default
sas/vm-301-disk-1  context               none                   default
sas/vm-301-disk-1  fscontext             none                   default
sas/vm-301-disk-1  defcontext            none                   default
sas/vm-301-disk-1  rootcontext           none                   default
sas/vm-301-disk-1  redundant_metadata    all                    default

I have set the ARC to 5G - same behavior. My device-mapper devices are multipath SAS disks, each of them connected via two independent expanders, so they already have:

# cat /sys/block/dm-9/queue/rotational
1
# cat /sys/block/dm-10/queue/rotational
1
# cat /sys/block/dm-11/queue/rotational
1
# cat /sys/block/dm-12/queue/rotational
1
# cat /sys/block/dm-13/queue/rotational
1
richardelling commented 9 years ago

comment below...

On Oct 2, 2015, at 4:34 AM, redtex notifications@github.com wrote:

Hi! On a host running CentOS 7.1 (kernel 3.10.0-229.14.1.el7.x86_64, 32G RAM) with ZoL 0.6.5.2, which serves VM images in zvols via iSCSI (SCST), there is a very strange situation: after upgrading from 0.6.4 to 0.6.5 I noticed a significant performance drop, with disks near 100% busy in iostat. It looks like:

Average I/O size seems to be around 5k, confirmed by iostat data above. This usually implies one of two conditions:

  1. active dataset is not using the slog: unlikely in this case, given the data provided
  2. writes are not being coalesced

So it is clearly seen that one of the mirrors gets a much smaller IO size (avgrq-sz), which leads to a huge performance drop. I also noticed that this behavior only starts after a significant amount of work (several hours) following a system reboot. The ARC size is 20G.

This has nothing to do with the ARC, so you can look elsewhere.

It is not clear, from data provided, what the sample interval is. If the sample interval is small, say 1 second, then this data looks about right for 5K average I/O size. ZFS will write about 1MB to a top-level vdev before switching to the next, so you can easily get samples where all of the writes go to one or the other. If the sample period is large, say 100 seconds, then we'd expect the balance to even out. Judging by the space allocated, the stripes are balanced. -- richard
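A simple way to test the sample-interval point is to let zpool iostat average over a much longer window and see whether the two mirrors even out, for example:

# five 100-second samples; with balanced allocation the per-mirror write
# bandwidth should converge, unlike the 1-second samples above
zpool iostat -v sas 100 5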

redtex commented 9 years ago

The sample interval is 1 second. This looks like a 5-second cycle: data is written to both mirrors simultaneously but with different IO sizes, so on the first mirror the data is written within 1 second, and on the other within 4-5 seconds.

redtex commented 9 years ago

So, what would the advice be? To downgrade to 0.6.4.2?

redtex commented 9 years ago

Update: with the help of iotop I discovered that this 5-second cycle is the txg_sync process, which flushes async writes to the disks. But when I try to strace it, I get an error:

# strace -p 15190
strace: attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted

Maybe I can get debug info some other way? Or would it be better to downgrade and just wait?

behlendorf commented 9 years ago

@redtex these are kernel threads so you can't follow them with strace. My suggestion would be to first roll back to 0.6.4.2 and characterize the behavior there. Then we'll have a much better idea how it's changed and why that might be.
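Since txg_sync is a kernel thread, one hedged way to see what each txg is doing (assuming the txg-history kstat is available, as it is on recent ZoL; pool name sas from the report) is:

# keep history for the last 100 txgs
echo 100 > /sys/module/zfs/parameters/zfs_txg_history

# per-txg dirty/read/written bytes plus open/quiesce/wait/sync times
cat /proc/spl/kstat/zfs/sas/txgs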

redtex commented 9 years ago

Is it possible to use a pool upgraded to 0.6.5 (the large_blocks and filesystem_limits features are enabled, but not used) with ZFS 0.6.4? zpool imports such a pool, but I'm afraid of breaking the data.

behlendorf commented 9 years ago

If you're able to import the pool r/w then it's safe to write to the pool.
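A quick way to double-check the feature state before downgrading (a sketch; pool name from the report):

# features that are only "enabled" have not changed the on-disk format yet;
# a feature shown as "active" may prevent import by older software
zpool get feature@large_blocks,feature@filesystem_limits sas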

redtex commented 9 years ago

Hi! First of all, I want to remind you that the uneven write IO size begins only after some warm-up time, one or two hours after system boot. The workload is serving VM images via iSCSI (SCST 3.1 target). Here is what I've done. First, I replaced the rotational log device with an SSD log & cache (the same device, partitioned); this was done on the fly, without service downtime. Before this operation I saw just what I expected: uneven IO size between the mirrors. Just after the replacement the writes looked fine, but some time later (I can't be sure, but I think about an hour or a little more) the write IO became uneven again. After that, however, adding/removing the log & cache, separately or all together, doesn't produce this effect again. Very strange. So, early this morning I downgraded ZFS from 0.6.5.2 to 0.6.4.2 and got almost exactly what I expected: IO size evenly distributed between the mirrors, or where it is uneven, not more than a two-times difference and not for long. See 'iostat -d -x 1':

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
dm-2              0.00     0.00    0.00  241.00     0.00  6303.50    52.31     0.52    2.17    0.00    2.17   2.06  49.60
dm-3              0.00     0.00    2.00  152.00    31.50  8010.50   104.44     0.29    1.91   10.50    1.80   1.81  27.80
dm-4              0.00     0.00    0.00  150.00     0.00  8010.50   106.81     0.41    2.75    0.00    2.75   2.71  40.60
dm-8              0.00     0.00    0.00  246.00     0.00  6303.50    51.25     0.45    1.83    0.00    1.83   1.72  42.40
dm-13             0.00     0.00    0.00   55.00     0.00   532.00    19.35     0.01    0.11    0.00    0.11   0.11   0.60
dm-15             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-4              0.00     0.00    1.00    0.00    22.00     0.00    44.00     0.02   18.00   18.00    0.00  18.00   1.80
dm-8              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-13             0.00     0.00    0.00   75.00     0.00   796.00    21.23     0.01    0.12    0.00    0.12   0.12   0.90
dm-15             0.00     0.00    0.00  182.00     0.00 12838.00   141.08     0.04    0.22    0.00    0.22   0.22   4.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-8              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-13             0.00     0.00    0.00   48.00     0.00   408.00    17.00     0.01    0.10    0.00    0.10   0.10   0.50
dm-15             0.00     0.00    1.00    0.00    26.00     0.00    52.00     0.00    1.00    1.00    0.00   1.00   0.10

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-8              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-13             0.00     0.00    0.00   12.00     0.00    96.00    16.00     0.00    0.17    0.00    0.17   0.17   0.20
dm-15             0.00     0.00    3.00    0.00    12.00     0.00     8.00     0.00    0.33    0.33    0.00   0.33   0.10

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-8              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-13             0.00     0.00    0.00   49.00     0.00   608.00    24.82     0.01    0.14    0.00    0.14   0.14   0.70
dm-15             0.00     0.00    5.00    0.00   172.00     0.00    68.80     0.00    0.20    0.20    0.00   0.20   0.10

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
dm-2              0.00     0.00    5.00  243.00   377.00  7600.00    64.33     0.40    1.63   16.20    1.33   1.42  35.30
dm-3              0.00     0.00    0.00  155.00     0.00  7704.00    99.41     0.27    1.71    0.00    1.71   1.59  24.60
dm-4              0.00     0.00    0.00  159.00     0.00  7704.00    96.91     0.37    2.35    0.00    2.35   2.33  37.00
dm-8              0.00     0.00    3.00  242.00    67.00  7600.00    62.59     0.49    2.01   42.67    1.50   1.66  40.60
dm-13             0.00     0.00    0.00   44.00     0.00   576.00    26.18     0.01    0.18    0.00    0.18   0.18   0.80
dm-15             0.00     0.00    4.00  135.00   208.00  6497.00    96.47     0.03    0.18    0.00    0.19   0.18   2.50

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-8              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-13             0.00     0.00    0.00   39.00     0.00   404.00    20.72     0.01    0.21    0.00    0.21   0.21   0.80
dm-15             0.00     0.00    0.00   49.00     0.00  2725.50   111.24     0.01    0.24    0.00    0.24   0.24   1.20

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-8              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-13             0.00     0.00    0.00    8.00     0.00    56.00    14.00     0.00    0.25    0.00    0.25   0.25   0.20
dm-15             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
dm-2              0.00     0.00    1.00    0.00     1.50     0.00     3.00     0.01   14.00   14.00    0.00  14.00   1.40
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-8              0.00     0.00    1.00    0.00    35.50     0.00    71.00     0.02   17.00   17.00    0.00  17.00   1.70
dm-13             0.00     0.00    0.00   23.00     0.00   844.00    73.39     0.00    0.17    0.00    0.17   0.17   0.40
dm-15             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-8              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-13             0.00     0.00    0.00  106.00     0.00  4680.00    88.30     0.04    0.42    0.00    0.42   0.12   1.30
dm-15             0.00     0.00    2.00    0.00     8.00     0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00

Where dm-2 and dm-8 are one mirror, dm-3 and dm-4 the other mirror, dm-13 the log, and dm-15 the cache. 'zpool iostat -v sas 1':

                          capacity     operations    bandwidth
pool                   alloc   free   read  write   read  write
---------------------  -----  -----  -----  -----  -----  -----
sas                    1.83T  1.79T     58  1.73K  80.4K  7.16M
  mirror                939G   917G     24    774  36.5K  3.20M
    35000cca02827b824      -      -     10    100  12.5K  3.20M
    35000cca02827b8c4      -      -     12     99  24.0K  3.20M
  mirror                939G   917G     33    980  44.0K  3.68M
    35000cca02827bb30      -      -     11    118  17.0K  3.68M
    35000cca02827d228      -      -     11    110  31.5K  3.68M
logs                       -      -      -      -      -      -
  35000cca04d0f99c0p1   109M  3.61G      0     14      0   288K
cache                      -      -      -      -      -      -
  35000cca04d0f99c0p3  31.5G  24.4G      0      0      0      0
---------------------  -----  -----  -----  -----  -----  -----

                          capacity     operations    bandwidth
pool                   alloc   free   read  write   read  write
---------------------  -----  -----  -----  -----  -----  -----
sas                    1.83T  1.79T      5     12   496K   264K
  mirror                939G   917G      1      0   118K      0
    35000cca02827b824      -      -      0      0  86.5K      0
    35000cca02827b8c4      -      -      0      0  31.5K      0
  mirror                939G   917G      3      0   378K      0
    35000cca02827bb30      -      -      2      0   287K      0
    35000cca02827d228      -      -      0      0  91.4K      0
logs                       -      -      -      -      -      -
  35000cca04d0f99c0p1   109M  3.61G      0     12      0   264K
cache                      -      -      -      -      -      -
  35000cca04d0f99c0p3  31.5G  24.4G      4    122   430K  3.28M
---------------------  -----  -----  -----  -----  -----  -----

                          capacity     operations    bandwidth
pool                   alloc   free   read  write   read  write
---------------------  -----  -----  -----  -----  -----  -----
sas                    1.83T  1.79T      7     10  10.5K   192K
  mirror                939G   917G      0      0  4.00K      0
    35000cca02827b824      -      -      0      0      0      0
    35000cca02827b8c4      -      -      0      0  4.00K      0
  mirror                939G   917G      6      0  6.49K      0
    35000cca02827bb30      -      -      2      0  3.00K      0
    35000cca02827d228      -      -      3      0  3.50K      0
logs                       -      -      -      -      -      -
  35000cca04d0f99c0p1   109M  3.61G      0     10      0   192K
cache                      -      -      -      -      -      -
  35000cca04d0f99c0p3  31.5G  24.4G      0     16      0   486K
---------------------  -----  -----  -----  -----  -----  -----

Regards, Wadim.

redtex commented 9 years ago

@behlendorf are there any ideas? Do I need to provide other details/logs?

redtex commented 8 years ago

#4512

dweeezil commented 8 years ago

@redtex Wandering back into this issue due to the #4512 reference. I've reviewed your last iostat output and it certainly does show a difference between the 2 top-level mirror vdevs. By any chance, was this pool originally created with a single mirror and then later the second mirror added? Although an earlier zpool iostat -v did show them to be equally full, there could be a whole lot more fragmentation in one versus the other especially if they weren't added to the pool at the same time. Another thing that can impact fragmentation is growing the vdevs; when they're grown, the system creates new metaslabs which start out as completely unfragmented. This is another type of problem in which the new zpool iostat features of 0.7.0 could help quite a bit.
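Two quick checks along these lines (a sketch; the request-size histograms need 0.7.0 or newer):

# per-top-level-vdev capacity and fragmentation (FRAG column)
zpool list -v sas

# 0.7.0+: request-size histograms per vdev, to see where the small writes land
zpool iostat -r sas 1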

dweeezil commented 8 years ago

@redtex One other thing to check is that both your top-level vdevs have the same ashift. You can run zdb -l /dev/disk/by-XXX/<whatever> | grep ashift (possibly partition 1 if full disk) on each disk to make sure.
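A sketch of that check over the four mirror members, assuming the multipath devices are reachable under /dev/mapper by the WWIDs shown in zpool iostat (append a partition suffix if ZFS was given whole disks):

# every top-level vdev member should report the same ashift
for d in 35000cca02827b824 35000cca02827b8c4 35000cca02827bb30 35000cca02827d228; do
    echo "$d: $(zdb -l /dev/mapper/$d | grep -m1 ashift)"
done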

redtex commented 8 years ago

@dweeezil Yes, the pool was originally created with two mirrored vdevs. The vdevs consist of four identical SAS disks with 512-byte sectors, so the ashift for each vdev is 9.

redtex commented 8 years ago

Today, I've discovered how to reproduce this issue:

Configuration: 2-core KVM virtual machine with 4GB RAM, 4 physical disks (WD Raptor) passed through with SCSI-virtio, CentOS 7, kernel 3.10.0-327.36.1.el7.x86_64.

ZFS non-default tunables: zfs_vdev_aggregation_limit=524288. Honestly, I don't think this tunable matters for the issue, but it was set, so I post it here.
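For completeness, that tunable is a ZFS module parameter; a sketch of how it can be set (at runtime, plus a persistent variant):

# runtime
echo 524288 > /sys/module/zfs/parameters/zfs_vdev_aggregation_limit

# persistent across module reloads
echo "options zfs zfs_vdev_aggregation_limit=524288" >> /etc/modprobe.d/zfs.conf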

First, create a fresh mirrored pool with 2 vdevs:

# zpool create -f tank mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde

Second, create a zvol:

# zfs create -b 4k -V 10G -o refreservation=none -o compress=off -o primarycache=metadata tank/zvol4k-fiotest

Third, fill the zvol with random data:

# dd if=/dev/zero bs=1M | openssl enc -aes-256-ctr -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" -nosalt | dd of=/dev/tank/zvol4k-fiotest bs=1M

Fourth, prepare a fio job file:

# vi ./fio_pattern.4k
[global]
filename=/dev/tank/zvol4k-fiotest
ioengine=libaio
io_submit_mode=offload
direct=1
buffered=0
buffer_compress_percentage=0
refill_buffers=1
runtime=300

[4kRead]
blocksize=4k
readwrite=randread
iodepth=120

[4kWrite]
blocksize=4k
readwrite=randwrite
iodepth=120

Run fio test: fio fio_pattern.4k
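To watch the per-disk request sizes while the job runs (the same view shown in the results below), iostat can be left running alongside fio, e.g.:

# capture per-disk stats during the run; iostat.4k.log is just an example name
iostat -d -x sdb sdc sdd sde 1 > iostat.4k.log &
fio ./fio_pattern.4k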

results for ZoL 0.6.4.2

4kRead: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=120
4kWrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=120
fio-2.2.8
Starting 2 processes
^Cbs: 2 (f=2): [r(1),w(1)] [6.2% done] [5412KB/8656KB/0KB /s] [1353/2164/0 iops] [eta 29m:39s]
fio: terminating on signal 2

4kRead: (groupid=0, jobs=1): err= 0: pid=1839: Thu Oct  6 11:21:58 2016
  read : io=647520KB, bw=5532.8KB/s, iops=1383, runt=117035msec
    slat (usec): min=11, max=280986, avg=36642.53, stdev=29021.74
    clat (msec): min=8, max=357, avg=49.92, stdev=34.72
     lat (msec): min=9, max=450, avg=86.56, stdev=49.23
    clat percentiles (msec):
     |  1.00th=[   18],  5.00th=[   21], 10.00th=[   23], 20.00th=[   26],
     | 30.00th=[   30], 40.00th=[   35], 50.00th=[   40], 60.00th=[   46],
     | 70.00th=[   55], 80.00th=[   67], 90.00th=[   87], 95.00th=[  116],
     | 99.00th=[  196], 99.50th=[  212], 99.90th=[  245], 99.95th=[  262],
     | 99.99th=[  289]
    bw (KB  /s): min= 1656, max= 8416, per=100.00%, avg=5538.84, stdev=1693.65
    lat (msec) : 10=0.01%, 20=4.68%, 50=60.37%, 100=28.09%, 250=6.78%
    lat (msec) : 500=0.07%
  cpu          : usr=0.75%, sys=1.22%, ctx=131956, majf=0, minf=18
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=161880/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=120
4kWrite: (groupid=0, jobs=1): err= 0: pid=1840: Thu Oct  6 11:21:58 2016
  write: io=989.85MB, bw=8662.4KB/s, iops=2165, runt=117011msec
    slat (usec): min=10, max=280928, avg=27429.50, stdev=30133.64
    clat (msec): min=3, max=298, avg=27.76, stdev=22.28
     lat (msec): min=5, max=406, avg=55.19, stdev=41.84
    clat percentiles (msec):
     |  1.00th=[   12],  5.00th=[   14], 10.00th=[   15], 20.00th=[   16],
     | 30.00th=[   18], 40.00th=[   19], 50.00th=[   21], 60.00th=[   23],
     | 70.00th=[   28], 80.00th=[   35], 90.00th=[   48], 95.00th=[   61],
     | 99.00th=[  151], 99.50th=[  167], 99.90th=[  190], 99.95th=[  196],
     | 99.99th=[  233]
    bw (KB  /s): min= 1880, max=15512, per=100.00%, avg=8670.58, stdev=3524.25
    lat (msec) : 4=0.01%, 10=0.22%, 20=47.66%, 50=43.55%, 100=6.53%
    lat (msec) : 250=2.04%, 500=0.01%
  cpu          : usr=1.19%, sys=1.46%, ctx=148656, majf=0, minf=15
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=253396/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=120

Run status group 0 (all jobs):
   READ: io=647520KB, aggrb=5532KB/s, minb=5532KB/s, maxb=5532KB/s, mint=117035msec, maxt=117035msec
  WRITE: io=989.85MB, aggrb=8662KB/s, minb=8662KB/s, maxb=8662KB/s, mint=117011msec, maxt=117011msec

iostat -d -x sdb sdc sdd sde 1

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  446.53  290.10  1817.82 32376.24    92.84     7.59   10.33   16.29    1.17   1.35  99.21
sdb               0.00     0.00  385.15  289.11  1544.55 32376.24   100.62     7.64   11.24   18.70    1.29   1.47  99.21
sdd               0.00     0.00  428.71  291.09  1718.81 32530.69    95.16     7.41   10.49   16.72    1.31   1.37  98.81
sde               0.00     0.00  434.65  288.12  1742.57 32530.69    94.84     7.53   10.53   16.73    1.16   1.37  99.21

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  347.00   42.00  1408.00   464.00     9.62     7.17   18.34   19.49    8.81   2.56  99.70
sdb               0.00     0.00  321.00   45.00  1300.00   464.00     9.64     7.16   19.50   20.99    8.89   2.70  99.00
sdd               0.00     0.00  315.00   55.00  1260.00   844.00    11.37     8.77   23.78   26.22    9.78   2.70 100.00
sde               0.00     0.00  333.00   54.00  1344.00   844.00    11.31     8.52   22.15   24.22    9.37   2.58 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  315.00  298.00  1272.00 34196.00   115.72     7.48   12.25   22.08    1.86   1.63 100.00
sdb               0.00     0.00  294.00  297.00  1184.00 34196.00   119.73     7.67   13.01   24.22    1.91   1.69 100.00
sdd               0.00     0.00  324.00  277.00  1300.00 32036.00   110.94     7.90   13.06   22.55    1.95   1.66 100.00
sde               0.00     0.00  304.00  291.00  1236.00 32036.00   111.84     7.95   13.25   23.99    2.04   1.68  99.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  400.00  275.25  1619.80 31853.47    99.14     7.48    9.95   16.20    0.87   1.48  99.90
sdb               0.00     0.00  362.38  265.35  1461.39 31853.47   106.15     7.56   11.11   18.41    1.14   1.59 100.00
sdd               0.00     0.00  381.19  289.11  1536.63 33695.05   105.12     7.81   10.24   17.20    1.07   1.49 100.00
sde               0.00     0.00  372.28  290.10  1504.95 33695.05   106.28     8.08   10.97   18.67    1.10   1.51 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  430.00   52.00  1728.00  3624.00    22.21     8.20   18.79   20.35    5.85   2.06  99.10
sdb               0.00     0.00  435.00   56.00  1748.00  3628.00    21.90     8.26   18.24   19.86    5.68   2.04 100.00
sdd               0.00     0.00  413.00   30.00  1660.00  2852.00    20.37     7.23   18.27   18.95    8.93   2.26  99.90
sde               0.00     0.00  463.00   36.00  1876.00  3612.00    22.00     7.00   15.55   16.24    6.61   2.00 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  329.00  303.00  1320.00 29480.00    97.47     7.38   11.45   20.36    1.77   1.59 100.50
sdb               0.00     0.00  330.00  288.00  1320.00 29476.00    99.66     7.34   11.72   20.45    1.72   1.63 100.70
sdd               0.00     0.00  328.00  294.00  1324.00 30372.00   101.92     8.41   13.57   24.05    1.88   1.62 100.70
sde               0.00     0.00  338.00  291.00  1356.00 29612.00    98.47     8.48   13.53   23.56    1.87   1.60 100.70

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  375.00  353.00  1508.00 33036.00    94.90     6.75    8.30   15.38    0.77   1.37  99.80
sdb               0.00     0.00  354.00  353.00  1420.00 33036.00    97.47     6.69    8.61   16.33    0.87   1.41 100.00
sdd               0.00     0.00  393.00  362.00  1580.00 33204.00    92.14     8.02    8.83   16.24    0.78   1.32 100.00
sde               0.00     0.00  378.00  364.00  1512.00 33204.00    93.57     8.21    9.89   18.60    0.85   1.35 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  436.00  220.00  1744.00 25084.00    81.79     6.76   11.46   16.25    1.98   1.53 100.10
sdb               0.00     0.00  388.00  197.00  1556.00 22536.00    82.37     6.99   12.93   18.29    2.38   1.71 100.10
sdd               0.00     0.00  399.00  200.00  1604.00 23884.00    85.10     8.21   16.21   22.99    2.67   1.67 100.10
sde               0.00     0.00  426.00  206.00  1708.00 23724.00    80.48     7.99   13.84   19.40    2.32   1.58  99.70

As you see, nothing unusual: all operations are spread evenly across the physical disks. Read bw=5532.8KB/s, Write bw=8662.4KB/s.

results for ZoL 0.7.0-rc1

4kRead: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=120
4kWrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=120
fio-2.2.8
Starting 2 processes
Jobs: 1 (f=1): [r(1),E(1)] [16.2% done] [3164KB/6600KB/0KB /s] [791/1650/0 iops] [eta 26m:00s]
4kRead: (groupid=0, jobs=1): err= 0: pid=1355: Thu Oct  6 11:46:16 2016
  read : io=1655.5MB, bw=5631.3KB/s, iops=1407, runt=301036msec
    slat (usec): min=19, max=1674.9K, avg=84165.80, stdev=86037.04
    clat (usec): min=0, max=793, avg= 4.30, stdev= 5.06
     lat (usec): min=22, max=1674.9K, avg=84173.74, stdev=86037.11
    clat percentiles (usec):
     |  1.00th=[    1],  5.00th=[    2], 10.00th=[    2], 20.00th=[    2],
     | 30.00th=[    3], 40.00th=[    3], 50.00th=[    4], 60.00th=[    4],
     | 70.00th=[    4], 80.00th=[    5], 90.00th=[    6], 95.00th=[    8],
     | 99.00th=[   25], 99.50th=[   33], 99.90th=[   59], 99.95th=[   74],
     | 99.99th=[  121]
    bw (KB  /s): min=  285, max= 8104, per=100.00%, avg=5661.88, stdev=1379.09
    lat (usec) : 2=2.63%, 4=46.80%, 10=47.02%, 20=2.01%, 50=1.36%
    lat (usec) : 100=0.15%, 250=0.02%, 500=0.01%, 750=0.01%, 1000=0.01%
  cpu          : usr=0.72%, sys=1.07%, ctx=273123, majf=0, minf=18
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=423801/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=120
4kWrite: (groupid=0, jobs=1): err= 0: pid=1357: Thu Oct  6 11:46:16 2016
  write: io=4854.9MB, bw=16569KB/s, iops=4142, runt=300035msec
    slat (usec): min=37, max=703340, avg=27340.73, stdev=43195.54
    clat (usec): min=0, max=24786, avg= 3.49, stdev=24.80
     lat (usec): min=41, max=703348, avg=27346.97, stdev=43196.28
    clat percentiles (usec):
     |  1.00th=[    1],  5.00th=[    1], 10.00th=[    1], 20.00th=[    2],
     | 30.00th=[    2], 40.00th=[    2], 50.00th=[    3], 60.00th=[    3],
     | 70.00th=[    3], 80.00th=[    4], 90.00th=[    5], 95.00th=[    6],
     | 99.00th=[   23], 99.50th=[   32], 99.90th=[   66], 99.95th=[   87],
     | 99.99th=[  187]
    bw (KB  /s): min=  894, max=51440, per=100.00%, avg=16744.90, stdev=10056.05
    lat (usec) : 2=15.17%, 4=55.15%, 10=27.07%, 20=1.35%, 50=1.06%
    lat (usec) : 100=0.16%, 250=0.03%, 500=0.01%, 750=0.01%, 1000=0.01%
    lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 50=0.01%
  cpu          : usr=1.75%, sys=2.21%, ctx=322365, majf=0, minf=15
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=1242830/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=120

Run status group 0 (all jobs):
   READ: io=1655.5MB, aggrb=5631KB/s, minb=5631KB/s, maxb=5631KB/s, mint=301036msec, maxt=301036msec
  WRITE: io=4854.9MB, aggrb=16569KB/s, minb=16569KB/s, maxb=16569KB/s, mint=300035msec, maxt=300035msec

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  381.00 2297.00  1524.00 30128.00    23.64    11.88    4.50   25.34    1.05   0.37  99.70
sdd               0.00     0.00  389.00 2324.00  1556.00 30128.00    23.36    11.69    4.37   24.90    0.94   0.37  99.70
sde               0.00     0.00  340.00 2556.00  1360.00 41800.00    29.81    14.14    4.99   28.86    1.81   0.34  99.70
sdf               0.00     0.00  349.00 2765.00  1396.00 42204.00    28.00    13.47    4.40   28.80    1.32   0.32  99.70

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  282.00 2459.00  1128.00 29792.00    22.56     9.19    3.24   21.80    1.12   0.36  99.80
sdd               0.00     0.00  329.00 2599.00  1316.00 29792.00    21.25     9.20    3.15   21.04    0.88   0.34  99.90
sde               0.00     0.00  309.00 2802.00  1236.00 44880.00    29.65    13.70    4.55   33.13    1.40   0.32 100.00
sdf               0.00     0.00  356.00 3008.00  1424.00 43220.00    26.54    13.19    4.02   28.50    1.12   0.30 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  201.00 1331.00   804.00 19816.00    26.92    11.31    7.49   47.41    1.46   0.65 100.20
sdd               0.00     0.00  300.00 1683.00  1200.00 19936.00    21.32    10.76    5.44   30.94    0.90   0.51 100.20
sde               0.00     0.00  248.00 1340.00   992.00 14812.00    19.90    11.39    7.31   39.80    1.30   0.63 100.20
sdf               0.00     0.00  306.00 1474.00  1224.00 14812.00    18.02    11.01    6.20   31.96    0.85   0.56 100.20

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  358.00 3631.00  1432.00 43904.00    22.73    13.70    3.43   27.31    1.07   0.25 100.00
sdd               0.00     0.00  422.00 4237.00  1688.00 43824.00    19.54    12.82    2.73   22.91    0.72   0.21 100.00
sde               0.00     0.00  398.00 2786.00  1592.00 31960.00    21.08    12.68    3.95   24.08    1.08   0.31 100.00
sdf               0.00     0.00  414.00 3009.00  1676.00 31960.00    19.65    12.47    3.63   23.32    0.92   0.29 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  337.62 3292.08  1350.50 44273.27    25.14    13.17    3.31   25.54    1.03   0.27  99.11
sdd               0.00     0.00  361.39 2731.68  1445.54 44225.74    29.53    13.66    4.35   26.38    1.44   0.32  99.11
sde               0.00     0.00  401.98 2526.73  1607.92 30388.12    21.85    11.78    4.09   23.71    0.97   0.34  99.11
sdf               0.00     0.00  393.07 2653.47  1572.28 30522.77    21.07    11.50    3.83   23.66    0.89   0.32  98.61

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00    2.00 4767.00     8.00 48292.00    20.26    12.03    0.90 1128.00    0.43   0.21 100.00
sdd               0.00     0.00  222.00 3698.00   916.00 48300.00    25.11     7.38    2.00   20.99    0.86   0.25  97.00
sde               0.00     0.00  144.00 2679.00   576.00 31548.00    22.76     4.24    1.51   12.51    0.92   0.32  91.60
sdf               0.00     0.00  138.00 2603.00   552.00 31412.00    23.32     4.53    1.66   12.97    1.06   0.34  92.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  329.00 2613.00  1316.00 38232.00    26.89    13.77    7.74   57.40    1.49   0.34 100.00
sdd               0.00     0.00  345.00 2867.00  1380.00 38232.00    24.67    12.73    3.92   26.82    1.17   0.31  99.90
sde               0.00     0.00  307.00 2754.00  1228.00 29216.00    19.89     8.60    2.49   16.86    0.89   0.32  99.40
sdf               0.00     0.00  322.00 2584.00  1288.00 29216.00    20.99     8.23    2.75   17.79    0.88   0.34  98.60

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  254.00 1221.00  1016.00 16108.00    23.22    11.55    7.83   38.72    1.41   0.68 100.00
sdd               0.00     0.00  288.00 1222.00  1180.00 16112.00    22.90    11.16    7.34   33.71    1.12   0.66 100.00
sde               0.00     0.00  276.00 1410.00  1104.00 19020.00    23.87    11.39    7.19   37.39    1.27   0.59 100.00
sdf               0.00     0.00  288.00 1435.00  1152.00 19020.00    23.41    11.30    6.59   33.35    1.22   0.58 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  407.00 2401.00  1628.00 28532.00    21.48    12.08    4.30   23.37    1.06   0.36  99.90
sdd               0.00     0.00  469.00 2539.00  1908.00 28520.00    20.23    11.43    3.84   20.98    0.67   0.33 100.10
sde               0.00     0.00  390.00 2741.00  1560.00 37176.00    24.74    12.68    4.03   24.52    1.11   0.32 100.10
sdf               0.00     0.00  422.00 3046.00  1688.00 37460.00    22.58    12.57    3.63   23.08    0.94   0.29 100.10

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  349.00 2580.00  1396.00 31468.00    22.44    11.44    3.90   23.46    1.25   0.34 100.00
sdd               0.00     0.00  372.00 2557.00  1488.00 31476.00    22.51    11.68    4.01   23.26    1.21   0.34 100.00
sde               0.00     0.00  330.00 2895.00  1320.00 42860.00    27.40    14.33    4.47   29.93    1.57   0.31 100.00
sdf               0.00     0.00  342.00 3170.00  1368.00 42584.00    25.03    14.13    3.96   28.11    1.35   0.28 100.00

Again, nothing unusual: all operations are spread evenly across the physical disks. The results are even better than 0.6.4.2, especially for writes: Read bw=5631.3KB/s, Write bw=16569KB/s.

Now, turning on primarycache=all:

# zfs set primarycache=all tank/zvol4k-fiotest

results for ZoL 0.6.4.2

4kRead: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=120
4kWrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=120
fio-2.2.8
Starting 2 processes
Jobs: 2 (f=2): [r(1),w(1)] [100.0% done] [6636KB/10668KB/0KB /s] [1659/2667/0 iops] [eta 00m:00s]
4kRead: (groupid=0, jobs=1): err= 0: pid=2841: Thu Oct  6 12:11:50 2016
  read : io=1655.2MB, bw=5647.5KB/s, iops=1411, runt=300086msec
    slat (usec): min=11, max=261559, avg=35423.51, stdev=28289.59
    clat (msec): min=3, max=369, avg=49.35, stdev=34.43
     lat (msec): min=6, max=474, avg=84.78, stdev=47.62
    clat percentiles (msec):
     |  1.00th=[   15],  5.00th=[   19], 10.00th=[   22], 20.00th=[   26],
     | 30.00th=[   30], 40.00th=[   34], 50.00th=[   39], 60.00th=[   46],
     | 70.00th=[   55], 80.00th=[   67], 90.00th=[   87], 95.00th=[  116],
     | 99.00th=[  194], 99.50th=[  210], 99.90th=[  243], 99.95th=[  255],
     | 99.99th=[  285]
    bw (KB  /s): min= 1776, max= 8624, per=100.00%, avg=5654.94, stdev=1603.54
    lat (msec) : 4=0.01%, 10=0.04%, 20=6.90%, 50=58.77%, 100=27.38%
    lat (msec) : 250=6.84%, 500=0.07%
  cpu          : usr=0.79%, sys=1.23%, ctx=326363, majf=0, minf=18
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=423683/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=120
4kWrite: (groupid=0, jobs=1): err= 0: pid=2842: Thu Oct  6 12:11:50 2016
  write: io=2557.3MB, bw=8727.3KB/s, iops=2181, runt=300044msec
    slat (usec): min=10, max=258294, avg=27412.46, stdev=29073.49
    clat (msec): min=3, max=298, avg=27.33, stdev=20.74
     lat (msec): min=5, max=395, avg=54.74, stdev=39.04
    clat percentiles (msec):
     |  1.00th=[   12],  5.00th=[   14], 10.00th=[   15], 20.00th=[   17],
     | 30.00th=[   19], 40.00th=[   20], 50.00th=[   21], 60.00th=[   24],
     | 70.00th=[   27], 80.00th=[   34], 90.00th=[   45], 95.00th=[   57],
     | 99.00th=[  145], 99.50th=[  161], 99.90th=[  182], 99.95th=[  190],
     | 99.99th=[  221]
    bw (KB  /s): min= 1972, max=14797, per=100.00%, avg=8742.75, stdev=3149.20
    lat (msec) : 4=0.01%, 10=0.25%, 20=43.81%, 50=48.77%, 100=5.24%
    lat (msec) : 250=1.93%, 500=0.01%
  cpu          : usr=1.24%, sys=1.50%, ctx=369485, majf=0, minf=15
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=654643/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=120

Run status group 0 (all jobs):
   READ: io=1655.2MB, aggrb=5647KB/s, minb=5647KB/s, maxb=5647KB/s, mint=300086msec, maxt=300086msec
  WRITE: io=2557.3MB, aggrb=8727KB/s, minb=8727KB/s, maxb=8727KB/s, mint=300044msec, maxt=300044msec

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  404.00   42.00  1616.00  4264.00    26.37     8.07   18.68   20.00    5.93   2.24 100.00
sdb               0.00     0.00  348.00   43.00  1392.00  4448.00    29.87     8.45   22.10   23.85    7.93   2.56 100.00
sdd               0.00     0.00  366.00   41.00  1464.00  4148.00    27.58     7.60   19.09   20.39    7.46   2.46 100.00
sde               0.00     0.00  366.00   44.00  1464.00  4532.00    29.25     7.62   18.99   20.42    7.16   2.44 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  266.00  248.00  1064.00 28776.00   116.11     6.78   13.03   23.36    1.94   1.93  99.00
sdb               0.00     0.00  225.00  246.00   900.00 28592.00   125.23     6.74   14.31   27.56    2.19   2.09  98.60
sdd               0.00     0.00  277.00  258.00  1108.00 29200.00   113.30     8.34   15.76   28.27    2.33   1.87  99.80
sde               0.00     0.00  278.00  254.00  1112.00 28816.00   112.51     8.48   16.05   28.43    2.51   1.88 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  397.00  456.00  1588.00 33380.00    81.99     8.15    9.16   18.84    0.74   1.17 100.00
sdb               0.00     0.00  362.00  415.00  1448.00 33380.00    89.65     8.23   10.13   20.74    0.88   1.29 100.00
sdd               0.00     0.00  344.00  435.00  1376.00 32892.00    87.98     7.14    8.62   18.45    0.84   1.28  99.50
sde               0.00     0.00  368.00  418.00  1472.00 32892.00    87.44     7.08    8.50   17.16    0.87   1.27 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  330.00  123.00  1320.00 12364.00    60.42     8.10   18.59   24.18    3.61   2.21 100.30
sdb               0.00     0.00  316.00  152.00  1264.00 12956.00    60.77     8.06   17.93   25.25    2.70   2.14 100.30
sdd               0.00     0.00  329.00  127.00  1316.00 11044.00    54.21     7.51   17.00   22.41    2.97   2.20 100.30
sde               0.00     0.00  332.00  130.00  1328.00 11568.00    55.83     7.46   16.90   22.44    2.73   2.17 100.30

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  327.00  315.00  1308.00 21072.00    69.72     7.87   12.40   22.86    1.53   1.56 100.00
sdb               0.00     0.00  299.00  274.00  1196.00 20480.00    75.66     7.67   13.53   24.23    1.86   1.75 100.00
sdd               0.00     0.00  317.00  261.00  1268.00 21888.00    80.12     8.35   14.67   25.04    2.07   1.73 100.00
sde               0.00     0.00  342.00  289.00  1368.00 21364.00    72.05     8.27   13.18   22.84    1.74   1.58 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  317.00  312.00  1268.00 33128.00   109.37     8.10   12.79   23.76    1.63   1.59 100.00
sdb               0.00     0.00  283.00  303.00  1132.00 33128.00   116.93     8.26   13.71   26.49    1.77   1.71 100.00
sdd               0.00     0.00  300.00  292.00  1200.00 33144.00   116.03     7.28   12.39   22.91    1.59   1.69  99.90
sde               0.00     0.00  295.00  336.00  1180.00 33144.00   108.79     7.52   11.49   23.51    0.93   1.58 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  380.00  171.00  1520.00 20688.00    80.61     8.42   15.06   21.10    1.65   1.81 100.00
sdb               0.00     0.00  392.00  185.00  1568.00 20872.00    77.78     8.28   14.09   20.03    1.50   1.73 100.00
sdd               0.00     0.00  342.00  218.00  1368.00 19544.00    74.69     6.90   12.37   19.44    1.27   1.79 100.00
sde               0.00     0.00  367.00  229.00  1468.00 19416.00    70.08     6.82   11.94   18.02    2.20   1.68 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  354.00  180.00  1416.00 13560.00    56.09     7.27   14.03   20.22    1.87   1.87 100.00
sdb               0.00     0.00  318.00  158.00  1272.00 13376.00    61.55     7.50   16.54   23.48    2.59   2.10 100.00
sdd               0.00     0.00  314.00  144.00  1256.00 12536.00    60.23     8.13   17.37   23.97    2.99   2.18 100.00
sde               0.00     0.00  315.00  149.00  1260.00 12664.00    60.02     8.36   17.85   24.96    2.83   2.15  99.90

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  273.00  281.00  1092.00 32520.00   121.34     7.30   12.87   24.15    1.90   1.80  99.80
sdb               0.00     0.00  291.00  281.00  1164.00 32520.00   117.78     7.61   13.24   24.09    2.02   1.75 100.00
sdd               0.00     0.00  291.00  291.00  1164.00 33828.00   120.25     8.35   14.53   26.92    2.13   1.72 100.00
sde               0.00     0.00  293.00  292.00  1172.00 33828.00   119.66     8.21   14.19   26.42    1.92   1.71 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  358.00  292.00  1432.00 33704.00   108.11     7.08   10.98   19.10    1.02   1.54 100.00
sdb               0.00     0.00  371.00  293.00  1484.00 33704.00   105.99     6.86   10.24   17.52    1.01   1.50  99.60
sdd               0.00     0.00  379.00  274.00  1516.00 32596.00   104.48     8.06   12.16   20.16    1.08   1.53 100.10
sde               0.00     0.00  379.00  274.00  1516.00 32596.00   104.48     8.19   12.26   20.34    1.08   1.53 100.10

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  344.00   39.00  1376.00   436.00     9.46     9.13   23.49   24.91   10.97   2.61 100.10
sdb               0.00     0.00  313.00   38.00  1252.00   436.00     9.62     9.36   26.72   28.38   13.03   2.85 100.00
sdd               0.00     0.00  313.00   31.00  1252.00   260.00     8.79     6.89   20.43   21.17   13.00   2.91 100.00
sde               0.00     0.00  317.00   29.00  1268.00   260.00     8.83     6.90   20.35   20.96   13.62   2.89 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  306.00  351.00  1224.00 34360.00   108.32     8.79   13.58   27.14    1.77   1.52 100.00
sdb               0.00     0.00  274.00  337.00  1096.00 34360.00   116.06     8.84   14.49   29.98    1.90   1.64 100.00
sdd               0.00     0.00  255.00  334.00  1020.00 31988.00   112.08     5.87   10.04   21.20    1.53   1.69  99.80
sde               0.00     0.00  265.00  340.00  1060.00 31988.00   109.25     5.83    9.66   20.07    1.55   1.64  99.20

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  380.00  338.00  1520.00 33440.00    97.38     7.31    9.63   17.41    0.88   1.39  99.90
sdb               0.00     0.00  359.00  346.00  1436.00 33440.00    98.94     7.34    9.74   18.30    0.86   1.41  99.70
sdd               0.00     0.00  392.00  324.00  1568.00 32976.00    96.49     7.82   10.36   18.11    0.98   1.39  99.20
sde               0.00     0.00  371.00  338.00  1484.00 32976.00    97.21     8.09   10.73   19.66    0.94   1.41  99.70

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  343.00   79.00  1372.00  9180.00    50.01     6.92   17.54   20.69    3.89   2.33  98.50
sdb               0.00     0.00  336.00   80.00  1344.00  9100.00    50.21     7.08   18.12   21.46    4.09   2.39  99.40
sdd               0.00     0.00  323.00   83.00  1292.00  9324.00    52.30     8.09   20.86   24.92    5.07   2.45  99.50
sde               0.00     0.00  366.00   87.00  1464.00  9196.00    47.06     7.90   18.43   21.76    4.40   2.21  99.90

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  328.00  218.00  1312.00 24720.00    95.36     8.62   15.75   24.80    2.14   1.83 100.00
sdb               0.00     0.00  296.00  218.00  1184.00 24800.00   101.11     8.71   17.09   27.99    2.29   1.95 100.00
sdd               0.00     0.00  303.00  219.00  1212.00 23164.00    93.39     6.56   12.57   20.45    1.66   1.92 100.00
sde               0.00     0.00  303.00  219.00  1212.00 23292.00    93.89     6.65   12.68   20.32    2.11   1.92 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  294.00  290.00  1176.00 33272.00   117.97     7.52   12.55   23.29    1.67   1.71 100.00
sdb               0.00     0.00  270.00  290.00  1080.00 33272.00   122.69     7.52   12.65   24.33    1.77   1.78  99.90
sdd               0.00     0.00  278.00  285.00  1112.00 33148.00   121.71     7.80   13.75   26.01    1.78   1.77  99.40
sde               0.00     0.00  316.00  284.00  1264.00 33148.00   114.71     7.90   13.10   23.36    1.67   1.66  99.70

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  427.00  219.00  1708.00 24184.00    80.16     7.71   12.19   17.74    1.36   1.55 100.00
sdb               0.00     0.00  378.00  184.00  1512.00 22152.00    84.21     7.84   14.77   21.10    1.77   1.77  99.70
sdd               0.00     0.00  412.00  185.00  1648.00 22308.00    80.25     7.27   12.06   16.95    1.15   1.68 100.00
sde               0.00     0.00  406.00  187.00  1624.00 22496.00    81.35     7.15   12.29   17.27    1.47   1.69 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  323.00   99.00  1292.00 10092.00    53.95     8.78   20.74   25.86    4.04   2.37 100.00
sdb               0.00     0.00  331.00  116.00  1324.00 12124.00    60.17     8.69   19.34   24.87    3.55   2.24 100.00
sdd               0.00     0.00  325.00  109.00  1300.00  9776.00    51.04     7.09   16.68   21.25    3.05   2.30  99.90
sde               0.00     0.00  337.00  107.00  1348.00  9588.00    49.26     6.98   15.66   19.58    3.29   2.25 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  279.00  384.00  1116.00 33700.00   105.03     7.01   10.71   23.47    1.44   1.51  99.90
sdb               0.00     0.00  261.00  365.00  1044.00 33700.00   111.00     6.84   11.06   24.38    1.53   1.59  99.80
sdd               0.00     0.00  270.00  400.00  1080.00 32668.00   100.74     8.17   11.76   26.83    1.59   1.49 100.00
sde               0.00     0.00  300.00  410.00  1200.00 32668.00    95.40     8.36   11.61   25.44    1.50   1.41 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  366.00  284.00  1464.00 32820.00   105.49     6.87   10.51   17.85    1.06   1.53  99.20
sdb               0.00     0.00  358.00  284.00  1432.00 32820.00   106.70     6.92   10.69   18.37    1.02   1.56  99.90
sdd               0.00     0.00  384.00  317.00  1536.00 34520.00   102.87     8.13   11.78   20.66    1.04   1.43 100.00
sde               0.00     0.00  390.00  315.00  1560.00 34520.00   102.35     8.12   11.57   20.13    0.98   1.42 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  361.00   11.00  1444.00   304.00     9.40     8.01   21.62   21.63   21.18   2.69 100.00
sdb               0.00     0.00  336.00   12.00  1344.00   432.00    10.21     8.18   23.37   23.33   24.42   2.87 100.00
sdd               0.00     0.00  370.00   12.00  1480.00   436.00    10.03     7.67   20.48   20.39   23.17   2.62 100.00
sde               0.00     0.00  336.00   12.00  1344.00   436.00    10.23     7.81   22.85   22.71   26.83   2.87 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  272.00  330.00  1088.00 34032.00   116.68     8.87   14.68   29.68    2.32   1.66 100.00
sdb               0.00     0.00  240.00  328.00   960.00 33904.00   122.76     8.78   15.51   33.46    2.38   1.76 100.00
sdd               0.00     0.00  260.00  327.00  1040.00 32452.00   114.11     7.69   12.94   26.49    2.16   1.71 100.10
sde               0.00     0.00  242.00  301.00   968.00 32452.00   123.09     7.58   13.76   28.04    2.29   1.84 100.10

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  345.00  281.00  1380.00 32592.00   108.54     7.85   11.77   20.45    1.12   1.59  99.50
sdb               0.00     0.00  322.00  282.00  1288.00 32592.00   112.19     7.96   12.50   22.53    1.05   1.66 100.00
sdd               0.00     0.00  339.00  282.00  1356.00 33700.00   112.90     7.37   10.94   19.18    1.03   1.61  99.70
sde               0.00     0.00  315.00  283.00  1260.00 33700.00   116.92     7.50   11.70   21.23    1.08   1.67  99.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  406.00   94.00  1624.00 10828.00    49.81     7.20   15.51   18.32    3.36   1.99  99.60
sdb               0.00     0.00  364.00   87.00  1456.00  9936.00    50.52     7.36   17.25   20.46    3.82   2.21  99.70
sdd               0.00     0.00  365.00   80.00  1460.00  9080.00    47.37     8.47   20.05   23.52    4.22   2.25 100.00
sde               0.00     0.00  421.00   95.00  1684.00 10984.00    49.10     8.29   17.14   20.21    3.52   1.94 100.00

It's OK, all operations are spread evenly across the physical disks. Results are almost the same, because the ARC is <=2 GB while the test volume is 10 GB: Read bw=5647.5KB/s Write bw=8727.3KB/s
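A quick way to confirm the relevant settings, assuming the dataset name used above (on ZoL, zfs_arc_max=0 means the default of roughly half of RAM):

zfs get primarycache,volsize,volblocksize tank/zvol4k-fiotest
cat /sys/module/zfs/parameters/zfs_arc_max    # 0 = default, about half of RAM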

drumroll...... results for ZoL 0.7.0-rc1

4kRead: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=120
4kWrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=120
fio-2.2.8
Starting 2 processes
Jobs: 1 (f=1): [r(1),_(1)] [50.5% done] [128KB/0KB/0KB /s] [32/0/0 iops] [eta 04m:57s]
4kRead: (groupid=0, jobs=1): err= 0: pid=12908: Thu Oct  6 11:57:21 2016
  read : io=604228KB, bw=1995.3KB/s, iops=498, runt=302838msec
    slat (usec): min=15, max=7429.4K, avg=238975.31, stdev=607098.19
    clat (usec): min=0, max=3169, avg= 5.51, stdev=11.74
     lat (usec): min=18, max=7429.5K, avg=238984.52, stdev=607099.07
    clat percentiles (usec):
     |  1.00th=[    1],  5.00th=[    2], 10.00th=[    2], 20.00th=[    3],
     | 30.00th=[    3], 40.00th=[    4], 50.00th=[    4], 60.00th=[    5],
     | 70.00th=[    6], 80.00th=[    7], 90.00th=[    9], 95.00th=[   11],
     | 99.00th=[   28], 99.50th=[   37], 99.90th=[   84], 99.95th=[  131],
     | 99.99th=[  310]
    bw (KB  /s): min=    9, max= 6416, per=100.00%, avg=2042.75, stdev=1810.27
    lat (usec) : 2=1.90%, 4=32.04%, 10=58.42%, 20=5.81%, 50=1.57%
    lat (usec) : 100=0.19%, 250=0.05%, 500=0.01%, 750=0.01%, 1000=0.01%
    lat (msec) : 4=0.01%
  cpu          : usr=0.31%, sys=0.50%, ctx=136428, majf=0, minf=18
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=151057/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=120
4kWrite: (groupid=0, jobs=1): err= 0: pid=12909: Thu Oct  6 11:57:21 2016
  write: io=2481.3MB, bw=8466.2KB/s, iops=2116, runt=300113msec
    slat (usec): min=40, max=4683.1K, avg=55860.01, stdev=35143.44
    clat (usec): min=0, max=6942, avg= 4.62, stdev=16.57
     lat (usec): min=42, max=4683.1K, avg=55868.15, stdev=35144.44
    clat percentiles (usec):
     |  1.00th=[    1],  5.00th=[    2], 10.00th=[    2], 20.00th=[    2],
     | 30.00th=[    3], 40.00th=[    3], 50.00th=[    4], 60.00th=[    4],
     | 70.00th=[    5], 80.00th=[    5], 90.00th=[    6], 95.00th=[    8],
     | 99.00th=[   27], 99.50th=[   38], 99.90th=[   71], 99.95th=[   99],
     | 99.99th=[  326]
    bw (KB  /s): min= 3497, max=75872, per=100.00%, avg=8474.47, stdev=6054.31
    lat (usec) : 2=4.63%, 4=39.85%, 10=52.47%, 20=1.42%, 50=1.37%
    lat (usec) : 100=0.21%, 250=0.04%, 500=0.01%, 750=0.01%, 1000=0.01%
    lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%
  cpu          : usr=1.34%, sys=1.98%, ctx=434753, majf=0, minf=15
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=635198/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=120

Run status group 0 (all jobs):
   READ: io=604228KB, aggrb=1995KB/s, minb=1995KB/s, maxb=1995KB/s, mint=302838msec, maxt=302838msec
  WRITE: io=2481.3MB, aggrb=8466KB/s, minb=8466KB/s, maxb=8466KB/s, mint=300113msec, maxt=300113msec

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00   59.00    0.00   236.00     0.00     8.00     0.30    5.22    5.22    0.00   4.58  27.00
sdd               0.00     0.00   59.00    0.00   236.00     0.00     8.00     0.28    4.78    4.78    0.00   4.41  26.00
sde               0.00     0.00   56.00 2004.00   224.00 10448.00    10.36    19.65   11.27  242.95    4.80   0.49 100.00
sdf               0.00     0.00   53.00 1988.00   212.00 15896.00    15.78    19.75   10.38  214.91    4.92   0.49 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  127.00    0.00   508.00     0.00     8.00     0.82    6.44    6.44    0.00   4.87  61.80
sdd               0.00     0.00  130.00    0.00   520.00     0.00     8.00     0.77    5.94    5.94    0.00   4.54  59.00
sde               0.00     0.00  101.00 2674.00   416.00 14108.00    10.47    19.31    6.94   97.66    3.51   0.36 100.00
sdf               0.00     0.00  153.00 3482.00   612.00 27856.00    15.66    19.24    5.24   62.90    2.70   0.28 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  111.00    0.00   444.00     0.00     8.00     0.59    5.36    5.36    0.00   4.41  49.00
sdd               0.00     0.00  120.00    0.00   480.00     0.00     8.00     0.69    5.73    5.73    0.00   4.53  54.40
sde               0.00     0.00   94.00 2837.00   376.00 15116.00    10.57    19.48    6.85  112.80    3.34   0.34 100.00
sdf               0.00     0.00  138.00 3204.00   552.00 25632.00    15.67    19.38    5.84   72.33    2.98   0.30 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  244.00    0.00   976.00     0.00     8.00     2.51   10.22   10.22    0.00   3.71  90.50
sdd               0.00     0.00  254.00    0.00  1016.00     0.00     8.00     2.52    9.85    9.85    0.00   3.56  90.50
sde               0.00     0.00   37.00 1542.00   148.00 11876.00    15.23    19.80   10.52  185.76    6.32   0.63 100.10
sdf               0.00     0.00  467.00  335.00  1868.00  2680.00    11.34    11.08   14.47   22.46    3.35   1.25 100.10

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  290.00    0.00  1160.00     0.00     8.00     3.40   11.74   11.74    0.00   3.40  98.60
sdd               0.00     0.00  285.00    0.00  1140.00     0.00     8.00     3.36   11.84   11.84    0.00   3.47  99.00
sde               0.00     0.00   39.00 1645.00   156.00 13152.00    15.81    19.79   13.16  317.03    5.96   0.59  99.90
sdf               0.00     0.00  548.00    0.00  2192.00     0.00     8.00     9.96   18.00   18.00    0.00   1.82  99.90

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  278.00    0.00  1112.00     0.00     8.00     3.26   11.60   11.60    0.00   3.51  97.50
sdd               0.00     0.00  310.00    0.00  1240.00     0.00     8.00     3.26   10.46   10.46    0.00   3.17  98.20
sde               0.00     0.00   84.00 2978.00   336.00 23824.00    15.78    13.34    5.04   68.63    3.25   0.33 100.00
sdf               0.00     0.00  490.00    0.00  1972.00     0.00     8.05     9.97   20.19   20.19    0.00   2.04 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  315.00    0.00  1260.00     0.00     8.00     3.60   11.43   11.43    0.00   3.14  98.80
sdd               0.00     0.00  308.00    0.00  1232.00     0.00     8.00     3.60   11.49   11.49    0.00   3.20  98.70
sde               0.00     0.00   69.00 3166.00   276.00 25328.00    15.83    10.45    3.22   13.67    2.99   0.31 100.00
sdf               0.00     0.00  550.00    0.00  2200.00     0.00     8.00     9.98   18.45   18.45    0.00   1.82 100.10

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  296.00  723.00  1184.00  9320.00    20.62     5.39    5.33   15.53    1.16   0.95  96.70
sdd               0.00     0.00  291.00  746.00  1164.00  9312.00    20.20     5.35    5.25   16.29    0.94   0.95  98.10
sde               0.00     0.00  164.00 1318.00   656.00 10604.00    15.20    11.30    6.88   29.84    4.03   0.67  98.70
sdf               0.00     0.00  366.00  504.00  1464.00  3552.00    11.53    11.16   11.81   23.59    3.26   1.13  98.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00    7.00 3891.00    28.00 53396.00    27.41     6.79    1.73   11.43    1.71   0.23  88.90
sdd               0.00     0.00   12.00 3756.00    48.00 52484.00    27.88     7.10    1.88   12.58    1.85   0.24  91.60
sde               0.00     0.00    5.00  883.00    20.00  7064.00    15.95    19.40   15.53  893.60   10.56   1.13 100.00
sdf               0.00     0.00   12.00 1344.00    48.00 10128.00    15.01    19.28   10.05  362.83    6.90   0.74 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00    7.00 4529.00    28.00 48168.00    21.25     8.09    1.80   12.71    1.78   0.21  97.30
sdd               0.00     0.00   11.00 4464.00    44.00 46976.00    21.01     8.02    1.78    9.09    1.76   0.21  96.00
sde               0.00     0.00    9.00  871.00    36.00  6968.00    15.92    19.57   25.00 1371.44   11.09   1.14 100.00
sdf               0.00     0.00   10.00 1302.00    40.00 10416.00    15.94    19.20   13.83  892.50    7.08   0.76 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00    8.00 4445.00    32.00 48828.00    21.94     8.20    1.83   10.88    1.81   0.22  97.20
sdd               0.00     0.00    5.00 4807.00    20.00 55040.00    22.88     7.83    1.63    9.60    1.63   0.20  97.20
sde               0.00     0.00    3.00  981.00    12.00  7848.00    15.98    19.60   13.96 1421.33    9.65   1.02 100.10
sdf               0.00     0.00    4.00 1228.00    16.00  9824.00    15.97    19.49   13.47 1794.75    7.67   0.81 100.10

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00    9.00 1718.00    36.00 46556.00    53.96     4.96    2.90    4.67    2.89   0.35  60.50
sdd               0.00     0.00   11.00 1882.00    44.00 42456.00    44.90     4.21    2.24    8.82    2.20   0.32  59.70
sde               0.00     0.00    9.00  737.00    36.00  5896.00    15.90    19.80   35.32 1823.56   13.48   1.34 100.00
sdf               0.00     0.00   11.00 1112.00    44.00  8896.00    15.92    19.71   20.53 1210.18    8.76   0.89 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00    1.00    0.00     4.00     0.00     8.00     0.00    4.00    4.00    0.00   4.00   0.40
sdd               0.00     0.00    5.00    0.00    20.00     0.00     8.00     0.02    3.60    3.60    0.00   3.60   1.80
sde               0.00     0.00    4.00  778.00    16.00  6224.00    15.96    19.94   15.94  652.50   12.67   1.28 100.00
sdf               0.00     0.00    6.00 1190.00    24.00  9520.00    15.96    19.94   15.51 1432.00    8.37   0.84 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00    6.00    0.00    24.00     0.00     8.00     0.03    4.83    4.83    0.00   4.83   2.90
sdd               0.00     0.00   14.00    0.00    56.00     0.00     8.00     0.06    4.07    4.07    0.00   4.07   5.70
sde               0.00     0.00    7.00  728.00    28.00  5824.00    15.92    19.96   30.30 1753.57   13.73   1.36 100.00
sdf               0.00     0.00    8.00 1430.00    32.00 11440.00    15.96    19.92   13.74 1230.88    6.94   0.70 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00   22.00    0.00    88.00     0.00     8.00     0.09    4.09    4.09    0.00   4.23   9.30
sdd               0.00     0.00   14.00    0.00    56.00     0.00     8.00     0.06    4.07    4.07    0.00   4.07   5.70
sde               0.00     0.00    7.00  860.00    28.00  6880.00    15.94    19.94   21.84 1275.14   11.64   1.15 100.00
sdf               0.00     0.00   33.00 1860.00   132.00 10684.00    11.43    19.87   13.87  497.27    5.29   0.53 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00   20.00    0.00    80.00     0.00     8.00     0.10    4.90    4.90    0.00   4.50   9.00
sdd               0.00     0.00   25.00    0.00   100.00     0.00     8.00     0.12    4.92    4.92    0.00   4.88  12.20
sde               0.00     0.00    9.00  931.00    36.00  7448.00    15.92    19.90   22.05 1220.56   10.47   1.06 100.00
sdf               0.00     0.00   45.00 2731.00   180.00 13516.00     9.87    19.78    7.43  239.67    3.60   0.36 100.00

Yes, that's it: one pair of disks is overloaded, and we see a slightly different avgrq-sz on each pair. So the results are roughly half of what we get without data caching: Read bw=1995.3KB/s Write bw=8466.2KB/s
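A convenient way to watch this imbalance at the vdev level while the job runs is zpool iostat (pool name as used above):

zpool iostat -v tank 1    # per-vdev read/write ops and bandwidth, refreshed every second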

redtex commented 8 years ago

4880

dweeezil commented 8 years ago

@redtex Thanks. I'll get this set up on my test rig today.

dweeezil commented 8 years ago

@redtex Does this issue happen if the same test is run on the VM host?

dweeezil commented 8 years ago

@redtex To clarify further, you are running zfs in the guest, correct? I don't have a CentOS guest handy at the moment, so I will be running this in an Ubuntu 14.04 guest with a 3.19 kernel initially.

redtex commented 8 years ago

Yes, I'm running these tests on a CentOS 7 VM, but exactly the same behaviour is present on bare hardware. I'll check it on Fedora 24 with a 4.7-series kernel.

dweeezil commented 8 years ago

@redtex I just ran my first 2 tests: one with primarycache=metadata and the other with primarycache=all and didn't see much difference.

primarycache=metadata

4kRead: (groupid=0, jobs=1): err= 0: pid=19699: Thu Oct  6 11:53:10 2016
  read : io=803612KB, bw=2674.6KB/s, iops=668, runt=300464msec
4kWrite: (groupid=0, jobs=1): err= 0: pid=19700: Thu Oct  6 11:53:10 2016
  write: io=2952.9MB, bw=10035KB/s, iops=2508, runt=301309msec

and

primarycache=all

4kRead: (groupid=0, jobs=1): err= 0: pid=24290: Thu Oct  6 12:04:24 2016
  read : io=911856KB, bw=3034.1KB/s, iops=758, runt=300451msec
4kWrite: (groupid=0, jobs=1): err= 0: pid=24291: Thu Oct  6 12:04:24 2016
  write: io=2763.6MB, bw=9431.4KB/s, iops=2357, runt=300045msec

The iostat output didn't show anything terribly weird. When run with a 1-second interval, the numbers were pretty much all over the place. This is with current master, so I'll try next with an actual 0.7.0-rc1, since that doesn't have the heavily reworked ARC code from the compressed ARC work.

I may just run these tests on the host now if that makes no difference. For your VM guest, however, I was wondering if you used cache=none for your virtio-scsi disks (I did use it). Also, regarding the raw numbers shown above, the system I'm testing on has pretty ordinary hard drives (but it has a lot of them). I configured the pool exactly as you did for this test.
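For a libvirt-managed guest, the disk cache mode can be checked from the host; the domain name myguest is a placeholder:

virsh dumpxml myguest | grep cache    # expect cache='none' in each disk's <driver> element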

redtex commented 8 years ago

Well, my fio test runs on a 4 GB RAM virtual machine, i.e. the ARC is 2 GB. So with this setup the ARC fills up in about 3 minutes. Until the ARC is fully filled, iostat does not show anything unusual. Maybe my disks (yes, they have the cache=none option) are too fast; maybe you have to run the tests somewhat longer than 5 min, which is the duration of my fio job.
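On ZoL the ARC fill level can be watched from the kstats while the job runs (values in bytes); the behaviour should change once size approaches c_max:

while sleep 5; do grep -E '^(size|c_max) ' /proc/spl/kstat/zfs/arcstats; done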

redtex commented 8 years ago

Here is my results from Fedora 24, kernel 4.7.5-200.fc24.x86_64

primarycache=metadata

4kRead: (groupid=0, jobs=1): err= 0: pid=28808: Thu Oct  6 23:46:17 2016
  read : io=902792KB, bw=2997.5KB/s, iops=749, runt=301188msec
4kWrite: (groupid=0, jobs=1): err= 0: pid=28809: Thu Oct  6 23:46:17 2016
  write: io=6728.4MB, bw=22934KB/s, iops=5733, runt=300418msec

and

4kRead: (groupid=0, jobs=1): err= 0: pid=16902: Fri Oct  7 00:12:14 2016
  read : io=787268KB, bw=2605.7KB/s, iops=651, runt=302138msec
4kWrite: (groupid=0, jobs=1): err= 0: pid=16903: Fri Oct  7 00:12:14 2016
  write: io=4953.6MB, bw=16903KB/s, iops=4225, runt=300095msec

The uneven IO is present, but more rarely than with the CentOS 7 kernel 3.10. But overall read performance is almost twice as bad as with the CentOS 7 kernel 3.10.

Of course, the disks are the same; actually, it's the same pool from the CentOS tests, connected to the Fedora 24 VM.

igsol commented 7 years ago

Hi,

I'm sorry if the info below isn't relevant, but at first glance the situation observed on FreeBSD looks similar to the current issue. Here is one message with the fix found for the uneven load: https://lists.freebsd.org/pipermail/freebsd-fs/2016-December/024178.html . Please take a look at the whole mailing-list thread. The fix: https://svnweb.freebsd.org/base?view=revision&revision=309714 As far as I can see, ZoL (zio_timestamp_compare in https://github.com/zfsonlinux/zfs/blob/master/module/zfs/zio.c) could suffer from the same issue that was fixed in FreeBSD. Could somebody more competent in ZoL internals take a look and back-port it to ZoL if needed? Let's make ZoL no worse than ZFS on FreeBSD :) Thanks.
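For anyone who wants to compare against the FreeBSD change, the comparator in question can be inspected directly in a local ZoL checkout (path relative to the repository root):

grep -n -A 15 'zio_timestamp_compare' module/zfs/zio.c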

gmelikov commented 7 years ago

@igsol it looks like you're pointing to https://www.illumos.org/issues/7090 , which was already ported to the master branch in https://github.com/zfsonlinux/zfs/commit/3dfb57a .

EDIT: my bad, I mixed things up; it's a different commit and it has not been ported to ZoL.

behlendorf commented 7 years ago

@igsol thanks for pointing this out. We should adapt the fix from FreeBSD and see how it impacts performance. However, I don't see how it could be the root cause of this exact issue. The problematic function was only first enabled by default in 0.7.0-rc3, and this issue predates that.

igsol commented 7 years ago

The problematic function was only first enabled by default in 0.7.0-rc3

Sure, you are right. In any case, I am glad that the suspicious comparison fixed in FreeBSD will get the attention of the right people in ZoL.

redtex commented 6 years ago

Upgraded the production system from 0.6.4.2 to 0.7.6. The issue is gone.