zielony360 closed this issue 5 years ago.
Related to https://github.com/zfsonlinux/zfs/issues/3871 ("Highly uneven IO size during write on Zvol").
CC: @ryao
@zielony360 are you observing this behavior right from the start, upon having booted up the computer/server?
any special configuration underneath the zvols or zpools ?
cryptsetup ?
@kernelOfTruth This doesn't seem to be related to #3871, because the workload is distributed quite equally across the physical disks (I attached iostat with the dm- devices). Regarding #3871 - all are set as rotational.
uptime:
# uptime
14:13:16 up 53 days, 20 min, 1 user, load average: 5,31, 6,79, 6,90
So the problem isn't connected to prefetching metadata into the ARC after a reboot.
Pool configuration: built on SAS disks in 2 JBODs, using multipath round-robin to the physical disks. lz4 enabled. Changed zfs parameters:
options zfs zfs_vdev_async_write_active_min_dirty_percent=15
options zfs zfs_vdev_async_write_active_max_dirty_percent=40
options zfs zfs_delay_min_dirty_percent=40
options zfs zfs_metaslab_fragmentation_threshold=80
options zfs zfs_mg_fragmentation_threshold=90
options zfs zfs_delay_scale=4000000
options zfs zio_delay_max=60000
options zfs zfs_top_maxinflight=128
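For context, these "options zfs ..." lines live in a modprobe.d file, and the same tunables can be inspected or changed at runtime through sysfs - a small sketch (the file path is just where I keep them):
# /etc/modprobe.d/zfs.conf contains the option lines above (applied at module load)
# runtime view / change of the same tunables:
cat /sys/module/zfs/parameters/zfs_vdev_async_write_active_min_dirty_percent
echo 40 > /sys/module/zfs/parameters/zfs_vdev_async_write_active_max_dirty_percent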
I was observing a very high pm% in arcstat, so I have now disabled prefetching, but it didn't help.
All zvols have a 128k volblocksize, because the operations generally use large blocks (as in this case). Compression ratio from 1.5x to 2.0x. All zvols are sparse.
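A sketch of how such a zvol would be created (the pool/zvol name is made up; -s makes it sparse):
# sparse 128k-volblocksize zvol with lz4 compression
zfs create -s -V 4T -o volblocksize=128K -o compression=lz4 backup/veeam-lun1
zfs get volblocksize,compressratio backup/veeam-lun1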
I also set zfs_vdev_async_read_max_active to 10 - no change.
cryptsetup - no.
There is no problem in writes, which go up to 1 GB/s.
Best regards, Tomasz Charoński
which generally was much greater, but now, while copying a file from one LUN to another, there is high zvol utilization causing low read throughput:
in reference to what state ?
what did change ?
I am using zfs 0.6.4.2 on kernel 4.0.4
was that supposed to mean
I am using zfs 0.6.5.2 on kernel 4.0.4
or are you really running 0.6.4.2 ?
0.6.5.2 has major improvements related to zvols: https://github.com/zfsonlinux/zfs/releases
Improved zvol performance on all kernels (>50% higher throughput, >20% lower latency)
which could help with your problem
@kernelOfTruth
in reference to what state ?
When I was testing with dd with a big block size, before the production period, it was hundreds of GBs.
what did change ?
If you mean the configuration, there has been no change.
or are you really running 0.6.4.2 ?
Yes, I am using 0.6.4.2.
0.6.5.2 has major improvements related to zvols: https://github.com/zfsonlinux/zfs/releases
I know, but I have seen many kernel panic reports here in the Issues, even for 0.6.5.2, so I am waiting for bugfixes. :] This is a production backup system, so I cannot afford crashes.
@zielony360 alright :)
Could you please post output of
cat /proc/meminfo
and
cat /proc/spl/kstat/zfs/arcstats
that perhaps might give some more diagnostic information
If the others don't have any quick ideas on how to improve this, the IRC channel might be your best bet - perhaps someone there has encountered similar behavior.
Thanks
# cat /proc/meminfo
MemTotal: 132005272 kB
MemFree: 53571392 kB
MemAvailable: 54253168 kB
Buffers: 314672 kB
Cached: 2605332 kB
SwapCached: 0 kB
Active: 2041444 kB
Inactive: 1037044 kB
Active(anon): 1494084 kB
Inactive(anon): 933092 kB
Active(file): 547360 kB
Inactive(file): 103952 kB
Unevictable: 17244 kB
Mlocked: 17244 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 104 kB
Writeback: 0 kB
AnonPages: 175736 kB
Mapped: 99132 kB
Shmem: 2265020 kB
Slab: 32763416 kB
SReclaimable: 368336 kB
SUnreclaim: 32395080 kB
KernelStack: 20256 kB
PageTables: 6088 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 66002636 kB
Committed_AS: 3810368 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 42517616 kB
VmallocChunk: 34221413700 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 347452 kB
DirectMap2M: 8007680 kB
DirectMap1G: 127926272 kB
# cat /proc/spl/kstat/zfs/arcstats
5 1 0x01 86 4128 14764790262 4583395777289310
name type data
hits 4 2061837747
misses 4 931645936
demand_data_hits 4 1639776797
demand_data_misses 4 658114871
demand_metadata_hits 4 354773563
demand_metadata_misses 4 21667444
prefetch_data_hits 4 58288956
prefetch_data_misses 4 248236916
prefetch_metadata_hits 4 8998431
prefetch_metadata_misses 4 3626705
mru_hits 4 963264829
mru_ghost_hits 4 3971979
mfu_hits 4 1033863118
mfu_ghost_hits 4 59056799
deleted 4 1612576134
recycle_miss 4 6110122
mutex_miss 4 329739
evict_skip 4 72342052
evict_l2_cached 4 0
evict_l2_eligible 4 193158135966208
evict_l2_ineligible 4 24420786919424
hash_elements 4 4982568
hash_elements_max 4 4984279
hash_collisions 4 470209837
hash_chains 4 608747
hash_chain_max 4 7
p 4 63335796736
c 4 67586699264
c_min 4 4194304
c_max 4 67586699264
size 4 67584032792
hdr_size 4 1811491320
data_size 4 35206856704
meta_size 4 29807262208
other_size 4 758422560
anon_size 4 114216960
anon_evict_data 4 0
anon_evict_metadata 4 0
mru_size 4 63219375104
mru_evict_data 4 35090857984
mru_evict_metadata 4 27720315904
mru_ghost_size 4 4382785536
mru_ghost_evict_data 4 4382785536
mru_ghost_evict_metadata 4 0
mfu_size 4 1680526848
mfu_evict_data 4 0
mfu_evict_metadata 4 1478018560
mfu_ghost_size 4 63235949568
mfu_ghost_evict_data 4 33753268224
mfu_ghost_evict_metadata 4 29482681344
l2_hits 4 0
l2_misses 4 0
l2_feeds 4 0
l2_rw_clash 4 0
l2_read_bytes 4 0
l2_write_bytes 4 0
l2_writes_sent 4 0
l2_writes_done 4 0
l2_writes_error 4 0
l2_writes_hdr_miss 4 0
l2_evict_lock_retry 4 0
l2_evict_reading 4 0
l2_free_on_write 4 0
l2_cdata_free_on_write 4 0
l2_abort_lowmem 4 0
l2_cksum_bad 4 0
l2_io_error 4 0
l2_size 4 0
l2_asize 4 0
l2_hdr_size 4 0
l2_compress_successes 4 0
l2_compress_zeros 4 0
l2_compress_failures 4 0
memory_throttle_count 4 0
duplicate_buffers 4 0
duplicate_buffers_size 4 0
duplicate_reads 4 0
memory_direct_count 4 0
memory_indirect_count 4 0
arc_no_grow 4 0
arc_tempreserve 4 1572864
arc_loaned_bytes 4 0
arc_prune 4 0
arc_meta_used 4 32377176088
arc_meta_limit 4 50690024448
arc_meta_max 4 36014735824
@kernelOfTruth Thank you for being involved. ;-)
@zielony360 sure :)
the next step I would test would be to set `primarycache=metadata`; whether that would improve performance is another question ;)
if there's lots of copying (+/- rsync) involved, disabling it isn't recommended for most cases according to several sources I have read (only in special cases could it improve performance or reduce the amount of data that needs to be read)
but your box is clearly better outfitted (132GB ECC RAM) than mine (32 GB) and your i/o is also much higher - so other approaches would be needed ...
If a lot of data is transferred in a "one-shot" way, then there would be two potential "helpers":
https://github.com/Feh/nocache
and
POSIX_FADV_DONTNEED
http://insights.oetiker.ch/linux/fadvise/ , https://tobi.oetiker.ch/patches/ (rsync with the --drop-cache option)
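Example invocations, as a sketch - nocache is the LD_PRELOAD wrapper from the repo above, and --drop-cache only exists in the patched rsync from Oetiker's page:
# copy a big file without polluting the page cache
nocache cp /mnt/src/bigfile /mnt/dst/
# patched rsync dropping cached data as it goes
rsync -a --drop-cache /mnt/src/ /mnt/dst/
# GNU dd can give the same fadvise hints itself
dd if=/mnt/src/bigfile of=/mnt/dst/bigfile bs=1M iflag=nocache oflag=nocache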
after that I'm out of ideas :P (well at least for now)
@kernelOfTruth
the next step I would test would be to set `primarycache=metadata`; whether that would improve performance is another question ;)
Are you sure caching only metadata would be good, looking at the ARC statistics? We can see there are a lot of data hits, more than metadata hits. We would lose data caching (we don't have an L2ARC).
Unfortunately, we can't use those solutions, because the zvols are used by Windows machines. Normally Veeam runs there, but now we are migrating some backups from one LUN to another at the NTFS level (basic copy/paste). Veeam also has bad read performance while restoring VMs, so it may be connected.
@zielony360
Are you sure caching only metadata would be good, looking at the ARC statistics? We can see there are a lot of data hits, more than metadata hits. We would lose data caching (we don't have an L2ARC).
whether that would improve performance is another question ;)
no - that's just what I would do to gain experience with how ZFS behaves (obviously I don't have that much experience with its operation and internals yet; also, this is a home desktop / power-user workstation, so the after-effects would be negligible - servers of course have much higher requirements for reliability and consistency in operation), hence the pointer to IRC
my chain of thought was that the issues could be related to zvols (-> 0.6.5.2? pre-0.6.5.y?), metadata (caching? rsync? ARC fighting with the pagecache?) and potentially other things (now that you're mentioning it again: NTFS?)
If you're using NTFS and the ntfs-3g driver (fuse), make sure to look into the big_writes, max_readahead and other options
sorry if I'm potentially pointing you in the wrong direction
You are probably seeing read-modify-write overhead on zvols where NTFS uses a 4KB default cluster size while ZFS zvols default to volblocksize=8K. Either use zvols created with volblocksize=4K or format NTFS to use a cluster size of 8KB and performance should improve. The volblocksize=8K default is an artifact from SPARC, where the memory page is 8KB.
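In other words, roughly (pool/zvol names are placeholders):
# create the zvol with a 4K block to match NTFS's default 4K cluster size
zfs create -V 2T -o volblocksize=4K backup/ntfs-lun
# ...or keep the 8K default and reformat NTFS with an 8K cluster instead,
# e.g. on the Windows side: format E: /FS:NTFS /A:8192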
@ryao
You are probably seeing read-modify-write overhead on zvols where NTFS uses a 4KB default cluster size while ZFS zvols default to volblocksize=8K. Use zvols created with volblocksize=4K and performance should improve.
No. Look at the iostat in my first post. One LUN is only reading and the other is only writing - a simple copy/paste. It uses a 512 kB IO block (avgrq-sz / 2). So block size is not the issue here.
@kernelOfTruth
and potentially other things (now that you're mentioning it again: NTFS?) If you're using NTFS and the ntfs-3g driver (fuse), make sure to look into the big_writes, max_readahead and other options
Yes, NTFS on Windows 2012 through an FC target (I mentioned it already), not NTFS on Linux.
There is no rsync anywhere. Simple Windows copy/paste in this case.
@kernelOfTruth, @ryao The issue is, IMO, in the ZFS stack, because zvol utilization is almost 100%, but disk utilization is about 50%. CPU is also not the issue - 2 x 10-core E5-2690 v2 @ 3.00GHz, so 20 cores, 40 threads. Disks are split across 2 JBODs (each with 2 SAS expanders inside), 2 HBAs, 4 x 6Gb/s SAS cables.
@zielony360 I suggest running `perf top` on this system (the right way to do what I have in mind is to run `perf record -F 997 -ag` and generate a flame graph, but `perf top` should be sufficient). You should see that the majority of the time is being spent in spinlocks - specifically on `->zv_lock`, although `perf top` won't show that much detail. The time spent in spinlocks is due to zvols being implemented in older versions of ZoL with a spinlock-protected request queue, like physical hardware uses. This hurts both latency and throughput, and the more concurrent a system is, the worse the effect becomes. It should be far worse on a 40-thread system than it was on my 8-thread development system.
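Something along these lines should work (sketch; stackcollapse-perf.pl and flamegraph.pl come from https://github.com/brendangregg/FlameGraph):
# sample all CPUs with call graphs for 60 seconds while the copy is running
perf record -F 997 -ag -- sleep 60
# fold the stacks and render the flame graph
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > zvol-read.svg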
You could try backporting the appropriate commits to 0.6.4.2 to deal with this:
https://github.com/zfsonlinux/zfs/commit/37f9dac592bf5889c3efb305c48ac39b4c7dd140
https://github.com/zfsonlinux/zfs/commit/fa56567630cfad95b77dee507595d77f24e99cb9
https://github.com/zfsonlinux/zfs/commit/8198d18ca7e05ec30139e10d658a8b06c0ec4c55
https://github.com/zfsonlinux/zfs/commit/f52ebcb3eb1630694648237ad6d439c226c3dd99 (very important)
The first does not apply cleanly to 0.6.4, but the conflicts are trivial to handle. I took the liberty of doing that here:
@zielony360 Did you try the backports?
@ryao @kernelOfTruth Thank you for your effort. A few days ago I upgraded to 0.6.5.7. The latency on the zvols is similar to before, though overall performance is better. But read latency is still about 50-100 ms during operations on each zvol. I believe it can be related to the high metadata miss rate:
hits 4 135093787
misses 4 68967196
demand_data_hits 4 130136993
demand_data_misses 4 60022622
demand_metadata_hits 4 4956794
demand_metadata_misses 4 8944561
So the demand metadata hit rate is only about 36% (4,956,794 hits vs. 8,944,561 misses). This is not a problem of lacking memory:
arc_no_grow 4 0
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_prune 4 0
arc_meta_used 4 1239474008
arc_meta_limit 4 50690024448
arc_meta_max 4 2451519088
arc_meta_min 4 16777216
arc_need_free 4 0
arc_sys_free 4 2112081920
I think metadata is being evicted too aggressively (zfs_arc_meta_strategy is 1). Is there a way to tune this so that I end up with a better metadata hit rate? primarycache=metadata is the last resort for me, because as you can see above, I get a 68% data hit rate from the ARC. I do not have an L2ARC or a SLOG.
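The only knobs I am aware of for this are the ARC metadata module parameters - a sketch, the values are just examples and I have not verified they help:
# reserve a floor for metadata in the ARC (bytes)
echo $((8 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_arc_meta_min
# meta reclaim strategy: 0 = meta only, 1 = balanced (current setting)
cat /sys/module/zfs/parameters/zfs_arc_meta_strategy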
Currently tuned ZFS parameters:
options zfs zfs_vdev_async_write_active_min_dirty_percent=15
options zfs zfs_vdev_async_write_active_max_dirty_percent=40
options zfs zfs_delay_min_dirty_percent=40
options zfs zfs_metaslab_fragmentation_threshold=80
options zfs zfs_mg_fragmentation_threshold=90
options zfs zfs_delay_scale=4000000
options zfs zio_delay_max=60000
options zfs zfs_prefetch_disable=1
# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
backup 136T 69,6T 66,6T - 35% 51% 1.00x ONLINE -
Compression is lz4 with a 1,5x ratio on the pool. We are using only zvols, through an FC target.
@zielony360 Looking at your initial iostat output, I'm curious how you managed to get an avgrq-sz of 1024 (512 KiB). Normally it will be capped at the block device's maximum readahead size, which defaults to 128 KiB. Did you run `blockdev --setra` on your zvols? What do you see in, for example, /sys/block/zd16/queue/read_ahead_kb? I'm asking because I wonder whether the large request size could be contributing to the latency.
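For reference, checking or changing it would look roughly like this:
blockdev --getra /dev/zd16                # readahead in 512-byte sectors
cat /sys/block/zd16/queue/read_ahead_kb   # the same value in KiB
blockdev --setra 1024 /dev/zd16           # e.g. raise it to 512 KiB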
@dweeezil I haven't changed the readahead. But is read_ahead_kb the parameter that determines the maximum IO size? I didn't think that a large request size, which gets split up later, would hurt performance.
:/sys/block/zd32/queue# for i in *; do echo -n "$i: "; cat $i; done
add_random: 0
discard_granularity: 131072
discard_max_bytes: 2147483648
discard_zeroes_data: 0
hw_sector_size: 512
iostats: 0
logical_block_size: 512
max_hw_sectors_kb: 16384
max_integrity_segments: 0
max_sectors_kb: 16384
max_segments: 65535
max_segment_size: 4294967295
minimum_io_size: 131072
nomerges: 0
nr_requests: 128
optimal_io_size: 131072
physical_block_size: 131072
read_ahead_kb: 128
rotational: 0
rq_affinity: 0
scheduler: none
write_same_max_bytes: 0
targetcli attributes of all zvols:
block_size 512
emulate_3pc yes
emulate_caw yes
emulate_dpo no
emulate_fua_read no
emulate_fua_write yes
emulate_model_alias no
emulate_rest_reord no
emulate_tas yes
emulate_tpu yes
emulate_tpws no
emulate_ua_intlck_ctrl no
emulate_write_cache no
enforce_pr_isids yes
fabric_max_sectors 8192
force_pr_aptpl 0
is_nonrot yes
max_unmap_block_desc_count 1
max_unmap_lba_count 65535
max_write_same_len 65535
optimal_sectors 20480
pi_prot_format 0
pi_prot_type 0
queue_depth 16
unmap_granularity 256
unmap_granularity_alignment 0
@zielony360 What's your storage stack like? In particular, what's actually using the zvols? You mentioned NTFS. Is it ntfs-3g? Or something else?
@zielony360 Oh, I think I see. You're exporting the zvols as iSCSI targets. Is the NTFS access from regular Windows clients?
@dweeezil No, we are exporting zvols through Fibre Channel to Windows, which has its NTFS on them. It is used for Veeam backups. So generally large (about 512 kB) IOs, 70% async writes, 30% reads.
@zielony360 Got it. Do you see the high read latency for read-only workloads? Or does it require a mix of reads and writes?
@dweeezil Currently we are copying a file on NTFS from one LUN to another. We will then use this for a read latency test, but look at it now:
avg-cpu: %user %nice %system %iowait %steal %idle
0,18 0,00 1,39 3,31 0,00 95,12
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
zd16 0,00 0,00 112,28 0,00 56,03 0,00 1022,02 4,94 43,95 43,95 0,00 8,36 93,89
zd32 0,00 0,00 0,00 111,30 0,00 55,24 1016,42 0,26 2,30 0,00 2,30 0,95 10,55
Those are the only IOs at the moment on pool. High latency on reads, low latency on writes.
arcstat:
time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
13:51:27 1.0K 648 63 648 63 0 0 70 33 63G 62G
14:01:27 1.1K 667 60 667 60 0 0 75 35 62G 62G
14:11:27 1.1K 775 72 775 72 0 0 42 34 62G 62G
14:21:28 862 660 76 660 76 0 0 12 39 62G 62G
14:31:28 600 475 79 475 79 0 0 9 38 62G 62G
14:41:28 611 485 79 485 79 0 0 9 37 63G 62G
14:51:28 646 455 70 455 70 0 0 22 50 62G 62G
14:57:20 346 241 69 241 69 0 0 14 55 62G 62G
14:57:20 0 0 100 0 100 0 0 0 0 62G 62G
15:02:42 295 235 79 235 79 0 0 9 52 63G 62G
15:02:42 0 0 100 0 100 0 0 0 0 63G 62G
You can see metadata miss is generally significant.
arcstats fragment:
arc_prune 4 0
arc_meta_used 4 848598744
arc_meta_limit 4 50690024448
arc_meta_max 4 2752493288
arc_meta_min 4 16777216
arc_need_free 4 0
arc_sys_free 4 2112081920
@zielony360 Ugh, the perils of sticking my nose into a bunch of similar issues at once: I had totally overlooked the fact that your iostats were on the zvol devices and not on the underlying vdev disks. Your earlier arcstats showed a whole lot more metadata being cached:
data_size 4 35206856704
meta_size 4 29807262208
other_size 4 758422560
I presume the last bit of output (the one with arc_meta_used of 848598744) you posted is from shortly after a reboot? Do you know if the balance between data_size and meta_size was different under 0.6.4.2? Possibly caching a lot more metadata?
@dweeezil Yeah, the iostats with the zd* devices are the zvols. The thing is that the zvols show 100% utilization during reads, but the physical disks do not. That is why I think metadata misses may be the reason.
"meta_size 4 29807262208" is from 10 months ago, when we had 0.6.4.2. Last arcstats, from this month, are after upgrading to 0.6.5.7 and yes, there was a reboot during this, but it was over a week ago. From this time there was a plenty of daily backup cycles, so metadata should already resist in ARC, but it isn't (hit rate below 50%). In general I can say that 0.6.4.2 had better metadata feeding/not evicting policy than 0.6.5.7. As you can see in earlier posts, that was about 90% metadata hit rate comparing to today below 50%.
Overall performance is better on 0.6.5.7, but I believe that if we had a metadata hit rate similar to 0.6.4.2, it would be much better still.
One more thing. Please remember I disabled prefetch shortly before the upgrade. The metadata hit rate coming from prefetch was good, but the data one was not. On 0.6.4.2 disabling it improved performance a bit. Should I try to enable it now on 0.6.5.7?
I also have experience with Nexenta ZFS, and it tells me it would be great to split the prefetch_disable parameter into data_prefetch_disable and metadata_prefetch_disable. I would leave metadata prefetch enabled and disable data prefetch.
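As far as I know there is only the single knob in this version, which can at least be flipped at runtime - e.g.:
cat /sys/module/zfs/parameters/zfs_prefetch_disable        # 1 = prefetch off
echo 0 > /sys/module/zfs/parameters/zfs_prefetch_disable   # re-enable prefetch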
@dweeezil @kernelOfTruth @ryao I was testing whether metadata misses can cause the high read latency. I added one 200 GB SSD to the pool as a cache device and set primarycache=metadata. As expected, the metadata miss rate dropped from about 60% to <10%, but this setting made things worse: higher latencies. So I set primarycache back to "all" and performance came back to previous levels. The L2ARC hit rate was and is very low: <10%.
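For completeness, the test was essentially this (the device path is a placeholder):
zpool add backup cache /dev/disk/by-id/<ssd>
zfs set primarycache=metadata backup    # metadata-only ARC
zfs set primarycache=all backup         # reverted afterwards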
What is strange is that the 3 TB 7200 RPM disks are still not heavily utilized, while the zvols are 100% busy (iostat). Example output for the HDDs:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
dm-0 3,30 0,00 31,10 72,60 0,16 1,59 34,41 0,79 7,60 12,77 5,39 2,18 22,56
dm-1 3,70 0,00 25,50 56,00 0,12 1,52 41,33 0,74 9,08 12,36 7,59 2,59 21,08
dm-10 1,50 0,00 30,10 74,40 0,15 1,58 33,94 0,85 8,10 13,55 5,90 2,45 25,56
dm-11 3,80 0,00 29,80 217,30 0,15 1,47 13,44 3,53 13,12 46,81 8,50 2,48 61,28
dm-13 3,50 0,00 24,80 58,90 0,11 1,52 40,08 0,79 9,41 13,06 7,88 2,68 22,44
dm-14 3,60 0,00 24,70 53,10 0,12 1,52 43,16 0,65 8,38 11,68 6,85 2,50 19,48
dm-15 1,10 0,00 24,60 61,00 0,10 1,45 37,04 1,18 13,82 16,50 12,73 3,60 30,84
dm-16 1,90 0,00 30,60 76,70 0,15 1,58 33,17 0,79 7,38 13,52 4,93 2,11 22,64
dm-17 1,80 0,00 28,30 60,00 0,13 1,48 37,31 0,79 8,96 8,35 9,25 2,51 22,20
dm-18 1,70 0,00 27,50 60,20 0,13 1,48 37,73 0,86 9,82 8,39 10,48 2,62 23,00
dm-19 1,30 0,00 26,60 58,60 0,12 1,48 38,52 0,82 9,69 8,86 10,06 2,73 23,24
As a reminder, I have 5 x raidz2 (10 disks each) in the pool. I think read latencies could be lower, given that the HDDs are not fully utilized. Also look above at avgrq-sz, which with a 128k volblocksize on each zvol could be greater (?).
# cat /proc/spl/kstat/zfs/arcstats
6 1 0x01 91 4368 14750376509 1720663407183591
name type data
hits 4 1295222328
misses 4 700709334
demand_data_hits 4 1223555127
demand_data_misses 4 614111080
demand_metadata_hits 4 71667201
demand_metadata_misses 4 86598241
prefetch_data_hits 4 0
prefetch_data_misses 4 0
prefetch_metadata_hits 4 0
prefetch_metadata_misses 4 13
mru_hits 4 638690873
mru_ghost_hits 4 44559459
mfu_hits 4 656531455
mfu_ghost_hits 4 12586217
deleted 4 1190911893
mutex_miss 4 5322
evict_skip 4 527902
evict_not_enough 4 5276
evict_l2_cached 4 3576559772160
evict_l2_eligible 4 149751922885120
evict_l2_ineligible 4 8192
evict_l2_skip 4 0
hash_elements 4 4283461
hash_elements_max 4 4557941
hash_collisions 4 193735520
hash_chains 4 462626
hash_chain_max 4 7
p 4 63362530560
c 4 67586699264
c_min 4 33554432
c_max 4 67586699264
size 4 67610404512
hdr_size 4 621472624
data_size 4 65950842880
metadata_size 4 614019072
other_size 4 159147720
anon_size 4 47857664
anon_evictable_data 4 0
anon_evictable_metadata 4 0
mru_size 4 61678938112
mru_evictable_data 4 61144170496
mru_evictable_metadata 4 942080
mru_ghost_size 4 5918584832
mru_ghost_evictable_data 4 106561536
mru_ghost_evictable_metadata 4 5812023296
mfu_size 4 4838066176
mfu_evictable_data 4 4758568960
mfu_evictable_metadata 4 32768
mfu_ghost_size 4 61667744256
mfu_ghost_evictable_data 4 55804297216
mfu_ghost_evictable_metadata 4 5863447040
l2_hits 4 2626898
l2_misses 4 113475855
l2_feeds 4 367717
l2_rw_clash 4 0
l2_read_bytes 4 88977219584
l2_write_bytes 4 2252755720192
l2_writes_sent 4 340705
l2_writes_done 4 340705
l2_writes_error 4 0
l2_writes_lock_retry 4 91
l2_evict_lock_retry 4 39
l2_evict_reading 4 0
l2_evict_l1cached 4 1206440
l2_free_on_write 4 9611
l2_cdata_free_on_write 4 5277
l2_abort_lowmem 4 0
l2_cksum_bad 4 0
l2_io_error 4 0
l2_size 4 386570286080
l2_asize 4 175942882816
l2_hdr_size 4 264922216
l2_compress_successes 4 29507528
l2_compress_zeros 4 0
l2_compress_failures 4 9114103
memory_throttle_count 4 0
duplicate_buffers 4 0
duplicate_buffers_size 4 0
duplicate_reads 4 0
memory_direct_count 4 0
memory_indirect_count 4 0
arc_no_grow 4 0
arc_tempreserve 4 835584
arc_loaned_bytes 4 0
arc_prune 4 0
arc_meta_used 4 1659561632
arc_meta_limit 4 50690024448
arc_meta_max 4 18603087336
arc_meta_min 4 16777216
arc_need_free 4 0
arc_sys_free 4 2112081920
@dweeezil We also ran the test you suggested: 100% reads. Latencies were high even with that IO profile. No more than 40 MB/s for random reads and no more than 150 MB/s for sequential reads.
I have been on 0.6.5.7 the whole time now.
I found out why reads are needed during a backup. Veeam uses a reverse incremental backup method, so data is first written to a separate file (at the NTFS level) and then copied into another, bigger file. Unfortunately, ZFS zvols do not implement XCOPY (#4042), so the data has to be copied physically.
An ideal workaround for this, until XCOPY is implemented, would be to cache the current writes in the ARC/L2ARC. Do you have an idea whether this can be tuned for that purpose?
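The only knobs I can think of for that are the L2ARC feed parameters - a sketch, the values are examples and I have no evidence they help with this workload:
# let the L2ARC feed thread write more per interval (bytes)
echo $((64 * 1024 * 1024)) > /sys/module/zfs/parameters/l2arc_write_max
echo $((128 * 1024 * 1024)) > /sys/module/zfs/parameters/l2arc_write_boost
# by default streaming/prefetch buffers are skipped; 0 admits them into L2ARC
echo 0 > /sys/module/zfs/parameters/l2arc_noprefetch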
Hello,
I am using zfs 0.6.4.2 on kernel 4.0.4. I share zvols through an FC target. The pool: 5 x raidz2 (10 disks each). Pool space usage: 29%.
The problem is with read performance, which generally was much greater, but now, while copying a file from one LUN to another, there is high zvol utilization causing low read throughput:
The physical disks (3 TB 7200 RPM) are not overloaded:
arcstat:
Writes are flushed by NTFS, which uses the zvols, every 6 seconds.
Can you suggest some parameter tuning, or does the problem lie deeper?
Best regards, Tomasz Charoński