davidcpgray opened 6 months ago
My guess, without more performance data, would be that it's doing some workload while the dataset(s) are unlocked (like quota metadata recalculation) that's drowning the pool in IOs, and since scrub is the lowest priority compared to anything else, it's going to keep getting bottom-fed if something is doing a bunch of IO.
But it can't do that recalculation while the dataset is locked, since some of the data it needs is encrypted, I believe.
Depending on what IO it's mostly doing, that would be my guess, absent more data.
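If you want to gather that data, one quick check (a sketch, assuming a hypothetical pool name tank) is the per-queue view of zpool iostat, which breaks scrub reads out from the sync/async IO classes:

# Sample queue activity every 5 seconds; the scrubq_read columns show
# whether scrub IO is actually running or being starved by other classes.
zpool iostat -q tank 5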
(My other guess would be that your system config is one where the SIMD acceleration of various stuff isn't working for some or all of it... you could go check /sys/module/icp/parameters/icp_aes_impl, /sys/module/icp/parameters/icp_gcm_impl, /proc/spl/kstat/zfs/fletcher_4_bench, and /proc/spl/kstat/zfs/vdev_raidz_bench for more data.)
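A minimal sketch of those checks (just reading the paths above on a Linux box with the ZFS modules loaded):

# The bracketed entry is the implementation currently in use.
cat /sys/module/icp/parameters/icp_aes_impl
cat /sys/module/icp/parameters/icp_gcm_impl
# Benchmark tables; the 'fastest' row shows what was auto-selected.
cat /proc/spl/kstat/zfs/fletcher_4_bench
cat /proc/spl/kstat/zfs/vdev_raidz_bench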
Hi,
Thanks for having a look at this...
This system is very lightly loaded most of the time, including when the scrub operations run, so the only way the pool is 'drowning in IOs' is if the scrub operation itself is generating that IO. And if the scrub were generating enough IO to kill performance, then surely this would also happen on regular non-encrypted systems, so I don't believe this to be the case.
Very happy to generate more performance data, but would need some guidance on what might be required/useful here.
/sys/module/icp/parameters/icp_aes_impl
cycle [fastest] generic x86_64 aesni
/sys/module/icp/parameters/icp_gcm_impl
cycle [fastest] avx generic pclmulqdq
/proc/spl/kstat/zfs/fletcher_4_bench
0 0 0x01 -1 0 2836382540 298647963538696
implementation native      byteswap
scalar         8617482592  6724286558
superscalar    7476357776  8467565223
superscalar4   8298708095  7951912804
sse2           17322514911 10947136176
ssse3          18113077177 14824031285
avx2           30054657319 23233022400
fastest        avx2        avx2
/proc/spl/kstat/zfs/vdev_raidz_bench
18 0 0x01 -1 0 3138481885 298647964084633
implementation gen_p      gen_pq     gen_pqr    rec_p      rec_q      rec_r      rec_pq     rec_pr     rec_qr     rec_pqr
original       644998740  372316431  150949561  1727026378 332465367  53011442   148503162  30191008   30194757   20960306
scalar         1955625227 545753670  237305602  1944392248 660812362  486452663  349016606  263140101  179607347  138426858
sse2           3119871353 1435585630 754741384  3370304972 1197325623 977363501  603857033  552382761  335089470  148852669
ssse3          3225892195 1435188628 755202070  3362852444 1912160887 1467897009 1125390665 948306439  676437394  530878920
avx2           5878558585 2449914268 1342233423 5653674308 3674654552 2961701604 1962847190 1709563836 1266078086 987526682
fastest        avx2       avx2       avx2       avx2       avx2       avx2       avx2       avx2       avx2       avx2
Well, my general suggestion then would be to look at the output of, say, mpstat -P ALL 1 for 30-60s while it's scrubbing in poor performance mode, and the same when it's not running poorly, and see if it's spending most of its time in %sys or %iowait when it's running poorly, compared to baseline.
If it's all in %sys, then go do something like look at perf top or generate a FlameGraph to see where it's spending that time. If it's not, that's a slightly more complicated question.
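A minimal sketch of that comparison (the 30-second sample length and output file names are arbitrary choices):

# Per-CPU utilization, one sample per second for 30s, during the slow scrub...
mpstat -P ALL 1 30 > mpstat-slow.txt
# ...and again as a baseline while the scrub is running at normal speed.
mpstat -P ALL 1 30 > mpstat-baseline.txt
# If %sys dominates in the slow case, sample kernel stacks to see where:
perf top -g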
System information
Describe the problem you're observing
zpool scrub operation on a zpool with mounted encrypted filesystems exhibits extremely high I/O load on the drives but a very poor read data rate and negligible progress through the scrub operation.
Describe how to reproduce the problem
zpool scrub poolname
The system has a 4-disk raidz1 pool of 16TB Western Digital Red Pro HDDs. All filesystems in the pool were created with native ZFS encryption.
Performance of scrub operations initiated on the pool with the encryption key loaded and filesystems mounted is extremely poor. 'iostat -xmc 2' reports ~300-350 r/s, 3-5 rMB/s, and 100 %util per drive.
I have left the scrub operation running for 12 hours or more and it makes hardly any progress, with 'zpool status' reporting 'no estimated completion time'.
If, however, the pool is exported, then immediately re-imported without loading the encryption key, and the scrub is re-run, then 'expected' performance is observed: 250-500 r/s, 170-240 rMB/s, 95-100 %util per drive.
The key metric here is rMB/s, which is effectively at or near the maximum sequential data rate for these drives.
Once the scrub is running with the expected 'good' performance characteristics, the encryption key can be re-loaded and the filesystems re-mounted, and the scrub continues with good performance, eventually completing in ~20 hours, which is broadly as expected for this system and these drives.
Both 'good' and 'poor' performance modes are readily reproducible by exporting and re-importing the pool; no system reboots are required.
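A minimal sketch of the two sequences (assuming a hypothetical pool name tank; key loading shown with zfs load-key, matching a passphrase or raw-key setup):

# Poor mode: scrub with the encryption key loaded and filesystems mounted.
zpool import tank
zfs load-key -a
zfs mount -a
zpool scrub tank
iostat -xmc 2    # ~300-350 r/s, 3-5 rMB/s, 100 %util per drive

# Good mode: export, re-import without loading the key, then scrub.
zpool export tank
zpool import tank
zpool scrub tank
iostat -xmc 2    # 250-500 r/s, 170-240 rMB/s, 95-100 %util per drive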
Include any warning/errors/backtraces from the system logs
Nothing reported in system logs / dmesg.
This behaviour has been observed with multiple ZFS versions since around 2.0.3, but I only recently discovered the correlation between the encrypted filesystems being mounted or unmounted and the scrub performance.
Happy to provide additional information on request.