openzfs / zfs

Massive read IO on pure write #5447

Closed: poige closed this issue 7 years ago

poige commented 7 years ago

vmstat (megabytes units):

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0    800 120111    632    0    0  2171    52 9641 43733  0  4 95  0  0
 2  0      0    789 120122    632    0    0   599     0 9711 43799  0  6 94  0  0
 0  0      0    784 120127    632    0    0   368     0 6871 31201  0  3 96  0  0
 0  0      0    808 120103    632    0    0  1586    28 10195 46414  1  4 96  0  0
 1  0      0    799 120114    632    0    0   420     0 9205 41567  1  4 95  0  0
 1  0      0    795 120119    632    0    0   386     0 6541 30090  0  3 97  0  0
 1  0      0    792 120066    632    0    0 534970 176976 28567 68770  1 39 60  0  0
21  1      0    812 120042    632    0    0 1040773 225508 41753 69471  0 63 36  1  0
21  1      0    819 120047    632    0    0 1078592 258308 47670 82460  1 68 31  1  0
33  1      0    810 120058    632    0    0 1005650 238588 43574 75541  1 64 35  1  0
31  1      0    773 120064    632    0    0 988607 256332 42979 76219  0 63 36  1  0
21  0      0    813 120040    632    0    0 1084765 250008 52879 91251  1 69 30  1  0
29  1      0    802 120074    632    0    0 926076 251724 40399 85557  1 61 38  1  0
20  1      0    792 120050    632    0    0 981072 260960 48630 85652  1 64 34  1  0
10  1      0    796 120056    632    0    0 1075612 247248 49633 81476  1 67 32  1  0
 7  1      0    814 120032    632    0    0 1027536 252208 46901 81169  1 66 33  1  0
15  1      0    771 120072    632    0    0 1043216 252572 49749 83544  1 67 32  1  0
21  1      0    786 120048    632    0    0 1067744 252456 44296 74687  1 64 35  1  0
 1  0      0    865 120025    632    0    0 407752 63096 21019 45623  1 26 73  1  0
 1  0      0    827 120064    632    0    0     0     0 12530 57066  1  5 94  0  0
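
(The exact vmstat invocation isn't shown in the report; assuming procps vmstat, something like the following prints one-second samples with the memory columns in megabyte units. Note that -S does not rescale the bi/bo block-I/O columns.)

% vmstat -S M 1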

This is observed on a system with a zvol that is being written to with a plain sudo dd bs=64M iflag=fullblock (the input data arrives over the network via netcat), roughly as sketched below.
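
A minimal sketch of that write path (the zvol name, hostname, and port here are illustrative, not taken from the report, and netcat flags vary between variants):

# sender: stream the data over the network
cat input.img | nc receiver-host 9999
# receiver: write straight onto the zvol; iflag=fullblock makes dd accumulate
# full 64M blocks from the pipe. dd itself issues no reads against the pool,
# so any pool-side read I/O must originate inside ZFS.
nc -l -p 9999 | sudo dd of=/dev/zvol/T/somevol bs=64M iflag=fullblock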

Kernel version: 4.4.35 (vanilla)
SPL && ZFS: ZFS: Loaded module v0.6.5.8-1, ZFS pool version 5000, ZFS filesystem version 5

poige commented 7 years ago

According to zpool iostat, though, there's allegedly no reading at all:

% sudo zpool iostat 1
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
T           17.2T   103T     10  13.7K  41.5K  55.3M
T           17.2T   103T      0  63.5K      0   244M
T           17.2T   103T      0  59.8K      0   230M
T           17.2T   103T      0  59.1K      0   227M
T           17.2T   103T      0  66.4K      0   258M
T           17.2T   103T      0  61.6K      0   239M
T           17.2T   103T      0      0      0      0
T           17.2T   103T      0      0      0      0
T           17.2T   103T      0      0      0      0
T           17.2T   103T      0      0      0      0
T           17.2T   103T      0  11.2K      0  43.3M
T           17.2T   103T      0  64.5K      0   250M
T           17.2T   103T      0  61.3K      0   239M
T           17.2T   103T      0  58.4K      0   209M
T           17.2T   103T      0  10.3K      0  27.4M
T           17.2T   103T      0      0      0      0
T           17.2T   103T      0      0      0      0
T           17.2T   103T      0      0      0      0
T           17.2T   103T      0      0      0      0
T           17.2T   103T      0  48.6K      0   189M
T           17.2T   103T      0  55.8K      0   215M
T           17.2T   103T      0  63.0K      0   245M
T           17.2T   103T      0      0      0      0
T           17.2T   103T      0      0      0      0
T           17.2T   103T      0      0      0      0
T           17.2T   103T      0      0      0      0
T           17.2T   103T      0  3.67K      0  13.4M
T           17.2T   103T      0      0      0      0
T           17.2T   103T      0      0      0      0
T           17.2T   103T      0      0      0      0

But iotop -a, which shows accumulated totals, has another view:

Total DISK READ :     212.46 K/s | Total DISK WRITE :      61.53 M/s
Actual DISK READ:     212.46 K/s | Actual DISK WRITE:       0.00 B/s
  TID  PRIO  USER    DISK READ>  DISK WRITE  SWAPIN      IO    COMMAND
12179 be/0 root        525.74 M      0.00 B  0.00 %  0.00 % [z_wr_int_1]
12199 be/0 root        525.20 M      0.00 B  0.00 %  0.00 % [z_wr_int_3]
12244 be/0 root        524.84 M      0.00 B  0.00 %  0.00 % [z_wr_int_7]
12228 be/0 root        524.45 M      0.00 B  0.00 %  0.00 % [z_wr_int_6]
12206 be/0 root        524.33 M      0.00 B  0.00 %  0.00 % [z_wr_int_4]
12231 be/0 root        524.10 M      0.00 B  0.00 %  0.00 % [z_wr_int_6]
12222 be/0 root        523.79 M      0.00 B  0.00 %  0.00 % [z_wr_int_5]
12229 be/0 root        522.82 M      0.00 B  0.00 %  0.00 % [z_wr_int_6]
12235 be/0 root        522.74 M      0.00 B  0.00 %  0.00 % [z_wr_int_6]
12250 be/0 root        520.51 M      0.00 B  0.00 %  0.00 % [z_wr_int_7]
12232 be/0 root        520.40 M      0.00 B  0.00 %  0.00 % [z_wr_int_6]
12177 be/0 root        520.27 M      0.00 B  0.00 %  0.00 % [z_wr_int_1]
12160 be/0 root        520.21 M      0.00 B  0.00 %  0.00 % [z_wr_int_0]
12163 be/0 root        519.41 M      0.00 B  0.00 %  0.00 % [z_wr_int_0]
12174 be/0 root        518.81 M      0.00 B  0.00 %  0.00 % [z_wr_int_1]
12208 be/0 root        518.59 M      0.00 B  0.00 %  0.00 % [z_wr_int_4]
12230 be/0 root        518.58 M      0.00 B  0.00 %  0.00 % [z_wr_int_6]
12212 be/0 root        518.17 M      0.00 B  0.00 %  0.00 % [z_wr_int_4]
12233 be/0 root        517.86 M      0.00 B  0.00 %  0.00 % [z_wr_int_6]
12236 be/0 root        517.75 M      0.00 B  0.00 %  0.00 % [z_wr_int_6]
12249 be/0 root        517.73 M      0.00 B  0.00 %  0.00 % [z_wr_int_7]
12185 be/0 root        517.36 M      0.00 B  0.00 %  0.00 % [z_wr_int_2]
12245 be/0 root        517.34 M      0.00 B  0.00 %  0.00 % [z_wr_int_7]
12192 be/0 root        516.77 M      0.00 B  0.00 %  0.00 % [z_wr_int_3]
12176 be/0 root        516.69 M      0.00 B  0.00 %  0.00 % [z_wr_int_1]
12190 be/0 root        516.57 M      0.00 B  0.00 %  0.00 % [z_wr_int_2]
12171 be/0 root        516.54 M      0.00 B  0.00 %  0.00 % [z_wr_int_1]
12198 be/0 root        516.13 M      0.00 B  0.00 %  0.00 % [z_wr_int_3]
12204 be/0 root        516.12 M      0.00 B  0.00 %  0.00 % [z_wr_int_4]
12169 be/0 root        516.04 M      0.00 B  0.00 %  0.00 % [z_wr_int_1]
12162 be/0 root        515.96 M      0.00 B  0.00 %  0.00 % [z_wr_int_0]
12225 be/0 root        515.91 M      0.00 B  0.00 %  0.00 % [z_wr_int_5]
12247 be/0 root        515.63 M      0.00 B  0.00 %  0.00 % [z_wr_int_7]
12167 be/0 root        515.02 M      0.00 B  0.00 %  0.00 % [z_wr_int_0]
12161 be/0 root        514.80 M      0.00 B  0.00 %  0.00 % [z_wr_int_0]
12202 be/0 root        514.51 M      0.00 B  0.00 %  0.00 % [z_wr_int_3]
12226 be/0 root        514.36 M      0.00 B  0.00 %  0.00 % [z_wr_int_5]
12240 be/0 root        514.15 M      0.00 B  0.00 %  0.00 % [z_wr_int_7]
12200 be/0 root        514.14 M      0.00 B  0.00 %  0.00 % [z_wr_int_3]
12215 be/0 root        514.05 M      0.00 B  0.00 %  0.00 % [z_wr_int_4]
12219 be/0 root        514.05 M      0.00 B  0.00 %  0.00 % [z_wr_int_5]
12164 be/0 root        513.73 M      0.00 B  0.00 %  0.00 % [z_wr_int_0]
12193 be/0 root        513.59 M      0.00 B  0.00 %  0.00 % [z_wr_int_3]
12248 be/0 root        513.55 M      0.00 B  0.00 %  0.00 % [z_wr_int_7]
12224 be/0 root        513.29 M      0.00 B  0.00 %  0.00 % [z_wr_int_5]
12182 be/0 root        513.02 M      0.00 B  0.00 %  0.00 % [z_wr_int_2]
12183 be/0 root        512.98 M      0.00 B  0.00 %  0.00 % [z_wr_int_2]
12217 be/0 root        512.96 M      0.00 B  0.00 %  0.00 % [z_wr_int_5]
richardelling commented 7 years ago

~212 KB/s isn't much of a read workload, especially compared to ~60 MB/s of writes.

Personally, I prefer to measure I/O using iostat -zx, which reports reads and writes at both the zvol and disk (sd/nvme) interfaces. You might give that a try.
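
For example (sysstat's iostat: -z omits devices with no activity during the interval, -x adds extended statistics; on ZFS on Linux, zvols show up as zd* block devices):

% iostat -zx 1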

poige commented 7 years ago

There's vmstat for your pleasure.

ronnyegner commented 7 years ago

You said you are using zvols. Maybe the access path is such that the kernel sees the writes as reads (a loopback or something similar; just a guess)...

Could you post "iostat -xm" or "iostat -zx" output as well, please? With that we should be able to see whether there is really read I/O going on or not.

From the zpool iostat output above I would say vmstat is wrong... there is no read I/O going on...

poige commented 7 years ago

"just a guess"

I'd prefer just a thought if you care enough to reply.

ronnyegner commented 7 years ago

Maybe I am getting you wrong (English is not my native language)... but are you really cursing at the people who are trying to help you? Because I understand your sentence as "if you're just guessing, you'd better not bother to answer".

poige commented 7 years ago

You would be surprised, but I did not ask you to help me. The issue is reported for the developers' attention. What you call "help" is self-flattery, alas.

ghost commented 7 years ago

There was once a freelancer who was having issues with his storage array. So he went and asked for help on the public bug tracker of the open source project he was using for his storage.

And the first member of the open source community came and said, "Do you need help? Try running this command," to which he replied, "No thanks, I have already run another one."

Then another member came, and he too tried to help the freelancer, but he said, "No thanks, the Developer will save me."

After a while he lost his storage array, and maybe also the job he had been contracted for, because even though he was profiled in the "top 3% overall" on Stack Exchange, he couldn't get the help of the open source community, because he was always rude and impolite.

Then, one day, he finally met the Developer. And he said, "Developer, why didn't you help me?" And the Developer said, "I sent you the open source community, fool!"

The End.

poige commented 7 years ago

Oh, naive people are quite often encountered throughout OSS, it seems.

You can modulate your tone, your tongue, whatever; it doesn't help with getting rid of "duplicate long long" improvers. ;) Ciao.

poige commented 7 years ago

And if it's not a bug but expected behaviour, wow, OK. Good!!!11