openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

ARC target size varying wildly #7820

Open rincebrain opened 6 years ago

rincebrain commented 6 years ago

System information

Type Version/Name
Distribution Name Debian
Distribution Version 9.5
Linux Kernel 4.9.0-7-amd64
Architecture amd64
ZFS Version 0.7.9
SPL Version 0.7.9

Describe the problem you're observing

When looking at some stat graphs of e.g. the ARC, I noticed that the target size for the ARC appears to be varying incredibly wildly and rapidly.

https://i.imgur.com/1VHy1Hv.png is a graph of min/max/target/current over a 10m interval. The bottom orange line that's fluctuating wildly is the target size value (the legend to the right says red, but the translucent red over the green makes it look orange).

In addition, looking at this graph (https://i.imgur.com/GAVFb9j.png) of the MRU/MFU ratio makes me think either something's evicting all of the MRU at regular intervals or somehow ~100% of the MRU entries are getting onto the MFU list; the timing of the ARC target size drops makes me think it's unlikely to be the latter.

It's especially confusing because, since the system is idle, there's not really anything external that should be exerting memory pressure to trigger drops like that.

Describe how to reproduce the problem

Unknown; the system is presently fairly idle, taking hourly snapshots of the datasets but otherwise not doing much of anything.

Include any warning/errors/backtraces from the system logs

Nothing interesting in dmesg.

richardelling commented 6 years ago

I don't see a bug here; perhaps this is better discussed on the mailing list?

What you're calling target doesn't seem correct to me, because size should follow target, and that isn't what's shown here.

richardelling commented 6 years ago

I am seeing strangeness with respect to an unexpected target size drop. @rincebrain can you dig up the no_grow, memory_indirect_count, and memory_direct_count values for the time in question? Something like the attached:

image

rincebrain commented 6 years ago

@richardelling Sadly, I don't have that particular stat recorded; I can add it, but it's obviously not retroactive.
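
(For the record, those counters can be pulled straight from the kstat file on Linux; a minimal shell sketch, using the /proc/spl/kstat/zfs/arcstats path and the arc_no_grow spelling that file uses:)

    # One-shot dump of the counters in question; wrap it in cron or a loop to
    # record them over time. Note the kstat spells the first one "arc_no_grow".
    grep -E '^(arc_no_grow|memory_direct_count|memory_indirect_count) ' \
        /proc/spl/kstat/zfs/arcstats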

https://openzfs.topicbox.com/groups/developer/T10a105c53bcce15c-M53cff3b459ecfa98e446241c sounds remarkably like it could do something similar to what I reported.

richardelling commented 6 years ago

Agree, at the least this is an unintentional bug from the aggsum checkin. I'll be testing a trivial fix for this in the next day or so to see if it prevents the target reduction.

That said, I don't understand why your size didn't shrink along with the target.

rincebrain commented 6 years ago

May I suggest/request this come in 0.7.10 (#7792) as well, unless it turns out the trivial fix sets something on fire, so we don't have to wait for 0.8 for the MFU/MRU to stop see-sawing? :)

rincebrain commented 6 years ago

@richardelling I think what happened is that, since the eviction bug made it always try to evict, it kept reducing the target it was trying to hit long after it had run out of things to evict. Once it ran out of things to drop from the MRU, it stopped successfully dropping anything, until the target cratered out and the other codepaths let it normalize... until the MRU again became a large enough portion of the ARC, and the cycle repeated. (That would explain why the target craters, slowly grows, then almost exactly tracks arcsz until it hits a limit and reaps. I don't know where the 50% comes from, though, since zfs_arc_meta_limit_percent is 75 and none of the other math seems to work out to that value when composed.)

My current values for those two counters on that machine are:

memory_direct_count             4    856767
memory_indirect_count           4    8464100

richardelling commented 6 years ago

methinks there is more than one issue at play. The trivial change to arc_adjust() is correct, but doesn’t solve the target reduction I see in automated testing.

behlendorf commented 6 years ago

We definitely need to get the trivial aggsum fix reviewed and into master to address that issue. But there's something else going on which needs to be understood, since none of that code is part of the 0.7.x series.

@richardelling's graphs from https://github.com/zfsonlinux/zfs/issues/7820#issuecomment-417051916 are pretty interesting. The memory_*direct_count counters get bumped once each time the kernel, not ZFS, determines the system is low on memory. The graphs show that the kernel, for some reason, starts consistently asking ZFS to release some memory, which ZFS does, then adjusts the new target downward accordingly. The expectation is that at some point the kernel will stop asking and we'll settle into a new steady state, but for some reason that's not happening.

This might be the result of a kernel change in memory management behavior, but I haven't gone looking for a specific commit yet. So it's possible this only affects newer kernels.

If either of you are game, could you try commit f7312913c9c7c6558121bc1ee2f9296b5d037c1d on one of your test systems to confirm this is what's going on?

After patching the source you'll need to set zfs_arc_shrinker_enabled=0 to test the change. The idea here is to let the arc_reclaim thread solely manage indirect memory reclaim from the ARC. Only direct memory reclaim requests by the kernel will result in memory being freed; these should happen when the kernel needs memory right now, likely because of a rapid change in memory requirements. This should prevent the collapse.
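
(A minimal sketch of running that test, assuming the patched build exposes zfs_arc_shrinker_enabled like any other zfs module parameter; the paths below are the usual locations, not something spelled out in the commit:)

    # At runtime, if the patched parameter turns out to be writable after load:
    echo 0 > /sys/module/zfs/parameters/zfs_arc_shrinker_enabled

    # Or at module load time, via /etc/modprobe.d/zfs.conf:
    #   options zfs zfs_arc_shrinker_enabled=0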

rincebrain commented 6 years ago

I'm game; sorry I hadn't already noticed that the arc_adjust code refactors postdate the 0.7 release. I'm building a 0.7 spin with f731291 now to give it a whirl...

richardelling commented 6 years ago

@behlendorf you're a step ahead of me, I was just fixin' to code that almost exactly the same way. I'll get it into test later today.

Meanwhile, the other fix is trivial but AFAIK only applies to master; AFAIK, aggsums weren't backported to 0.7.

richardelling commented 6 years ago

dammitalltohell... hit the wrong button

behlendorf commented 6 years ago

@richardelling PR's with the trivial fix against master are welcome!

richardelling commented 6 years ago

There is still a case where the target falls as memory_direct_count starts climbing.

image

rincebrain commented 6 years ago

Out of curiosity, what's recording that data and spitting those graphs out? (My above ones are from netdata, but it makes it more difficult than I'd like to add new graphs.)

behlendorf commented 6 years ago

@richardelling is it possible that with your workload something on the system is legitimately using that memory? Direct reclaim shouldn't be invoked unless the memory is needed elsewhere. Which kernel version are you testing with?

richardelling commented 6 years ago

Thanks for being the straight guy :-) I'll present all of that at the OpenZFS 2018 conference soon. The short answer is you can use some combination of: telegraf or node_exporter to collect all of the stats, influxdb or prometheus to store and do math, then grafana to make pretty pictures.
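
(If a full metrics stack is overkill, a rough shell sketch of a lighter-weight alternative is below: it just appends a few arcstats fields to a CSV that any plotting tool can consume later; the field names are as they appear in /proc/spl/kstat/zfs/arcstats, and the interval and output path are arbitrary.)

    # Append timestamped c/c_min/c_max/size/p samples to a CSV every 10 seconds.
    while sleep 10; do
        awk -v ts="$(date +%s)" \
            '$1 ~ /^(c|c_min|c_max|size|p)$/ {printf "%s,%s,%s\n", ts, $1, $3}' \
            /proc/spl/kstat/zfs/arcstats
    done >> /var/tmp/arcstats.csv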

richardelling commented 6 years ago

@behlendorf though I rarely test on generic distros, this is an exception: RHEL 7.4, kernel 3.10.0-693.21.1.el7.x86_64.

System has 128 GiB and ZFS is the only workload under test. That said, why the kernel wants reclaim isn't well instrumented, so getting to the bottom of it is hard. I'm tempted to expose last_free_memory and last_free_reason as kstats to get better insight.

rincebrain commented 6 years ago

Yup, it's definitely still happening, though setting zfs_arc_shrinker_enabled=0 seems to cause arcsz+target to sometimes vary even higher?

arc size collapse 1

I think you can guess where I set zfs_arc_shrinker_enabled=0 from my statement above, without me marking it.

arc size collapse 2

Ratio of MFU/MRU. Nothing interesting seems to have changed when I changed that tunable.

arc size collapse 3

Lots of direct memory throttles once I turned shrinker off, bursts of indirect throttles with it on, and the number of ARC elements goes from a steady state to tiny.

(I reproduced this by doing a zfs send -Rec on a 7TB dataset to /dev/null; it happened almost immediately. Confusingly, I couldn't use this to reproduce it right after rebooting into the version with shrinker_enabled in the module...)

richardelling commented 6 years ago

This seems to match my experience as well. There are cases where memory_direct_count increases even though we have tens of GB available. I'll try to dig into it more next week.

behlendorf commented 6 years ago

@rincebrain if possible would you mind cherry-picking this https://github.com/zfsonlinux/zfs/pull/8000/commits/4c4fb6281c014633131733557144e0a0eb0bcec5 fix and seeing if it resolves the issue? Unfortunately I wasn't able to reproduce the issue as easily as you seem to be able to. If you are still able to reproduce it, the full arcstats output from the bad state would be helpful; I didn't see it above.

rincebrain commented 6 years ago

Cherrypicked it onto 0.7.11; at first glance, it seems to do something useful.

(I made these two graphs by setting arc_min=arc_max, piping a large send into /dev/null, then setting arc_min to 2GB once arcsz had sat at arc_max for a bit.)
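
(Spelled out, that procedure is roughly the shell sketch below; the 48 GiB figure, the pool/dataset names, and the parameter paths are illustrative assumptions rather than the exact values used.)

    # Pin the target by making the floor equal the ceiling (48 GiB as an example).
    echo $((48 * 1024**3)) > /sys/module/zfs/parameters/zfs_arc_max
    echo $((48 * 1024**3)) > /sys/module/zfs/parameters/zfs_arc_min

    # Fill the ARC by discarding a large recursive send (placeholder names).
    zfs send -Rec tank/bigdata@snap > /dev/null

    # Once arcsz has sat at arc_max for a while, lower arc_min and watch c/size.
    echo $((2 * 1024**3)) > /sys/module/zfs/parameters/zfs_arc_min
    grep -E '^(c|c_min|size) ' /proc/spl/kstat/zfs/arcstats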

Before 4c4fb62: before 7820 cropped

After: after 7820 cropped

It also now steady-states around arc_max even after the send is over, which (given the absence of memory pressure on the system) seems like the correct behavior.

behlendorf commented 6 years ago

Closing. This should be resolved by 4c4fb62 which will be in 0.7.12.

rincebrain commented 5 years ago

Huh, I thought I commented in here that this appears to only be a partial fix, but apparently not.

Well, as the first line says, it certainly helps and reduces the frequency, but the ARC still ends up periodically dumping to arc_min without any obvious correlated external memory pressure.

edit: Amusingly, it took about 2.5 minutes to reproduce when I dropped arc_min from arc_max (48 GiB) to 32 GiB.

newgraph1 newgraph2 newgraph3

The system was not, AFAICT, under any memory pressure when this occurred - available memory shot up, and has not gone down again since.

It just completely evicted the recent queue. I might be misunderstanding the ARC's expected behavior, but I wouldn't expect it to drop 100% of the MRU as soon as it tries allocating while at arc_max. I suppose that could be reasonable if the entire content of the MRU is old enough that it would have been evicted without the arc_min setting, but zooming out, it does seem like the MRU elements were being hit prior to the mass eviction. It still seems strange that it would evict 100% of one list and not the other, though.

jwittlincohen commented 5 years ago

Type Version/Name
Distribution Name Debian
Distribution Version 9.6
Linux Kernel 4.9.0-8-amd64 (4.9.130-2 (2018-10-27))
Architecture amd64
ZFS Version 0.7.12
SPL Version 0.7.12

I may have a related issue. My ARC sits at zfs_arc_min despite no apparent memory pressure (30+ GB free memory) and my MRU is essentially pushed to near zero (0.2% of ARC).

Memory Usage

ARC History

              total        used        free      shared  buff/cache   available
Mem:         128908       88659       38708          51        1540       39316
Swap:             0           0           0
ZFS Subsystem Report                            Sun Jan 13 20:36:21 2019
ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                4.25M
        Mutex Misses:                           1.21k
        Evict Skips:                            7.85M

ARC Size:                               80.00%  80.00   GiB
        Target Size: (Adaptive)         80.01%  80.01   GiB
        Min Size (Hard Limit):          80.00%  80.00   GiB
        Max Size (High Water):          1:1     100.00  GiB

ARC Size Breakdown:
        Recently Used Cache Size:       0.21%   172.15  MiB
        Frequently Used Cache Size:     99.79%  79.35   GiB

ARC Hash Breakdown:
        Elements Max:                           1.44M
        Elements Current:               65.00%  936.15k
        Collisions:                             609.45k
        Chain Max:                              5
        Chains:                                 24.35k

ARC Total accesses:                                     641.02M
        Cache Hit Ratio:                96.69%  619.79M
        Cache Miss Ratio:               3.31%   21.22M
        Actual Hit Ratio:               96.59%  619.16M

        Data Demand Efficiency:         98.82%  143.29M
        Data Prefetch Efficiency:       17.48%  4.59M

        CACHE HITS BY CACHE LIST:
          Most Recently Used:           12.78%  79.19M
          Most Frequently Used:         87.12%  539.97M
          Most Recently Used Ghost:     0.20%   1.23M
          Most Frequently Used Ghost:   0.00%   1.34k

        CACHE HITS BY DATA TYPE:
          Demand Data:                  22.85%  141.60M
          Prefetch Data:                0.13%   801.63k
          Demand Metadata:              76.95%  476.94M
          Prefetch Metadata:            0.07%   450.29k

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  7.97%   1.69M
          Prefetch Data:                17.83%  3.78M
          Demand Metadata:              73.39%  15.58M
          Prefetch Metadata:            0.82%   173.67k

L2 ARC Summary: (HEALTHY)
        Low Memory Aborts:                      6
        Free on Write:                          0
        R/W Clashes:                            0
        Bad Checksums:                          0
        IO Errors:                              0

L2 ARC Size: (Adaptive)                         2.23    GiB
        Compressed:                     28.65%  654.65  MiB
        Header Size:                    0.00%   0       Bytes

L2 ARC Breakdown:                               21.22M
        Hit Ratio:                      0.04%   9.53k
        Miss Ratio:                     99.96%  21.21M
        Feeds:                                  123.95k

L2 ARC Writes:
        Writes Sent:                    100.00% 36.14k

DMU Prefetch Efficiency:                                        207.78M
        Hit Ratio:                      1.49%   3.10M
        Miss Ratio:                     98.51%  204.68M

ZFS Tunables:
        dbuf_cache_hiwater_pct                            10
        dbuf_cache_lowater_pct                            10
        dbuf_cache_max_bytes                              104857600
        dbuf_cache_max_shift                              5
        dmu_object_alloc_chunk_shift                      7
        ignore_hole_birth                                 1
        l2arc_feed_again                                  1
        l2arc_feed_min_ms                                 200
        l2arc_feed_secs                                   1
        l2arc_headroom                                    2
        l2arc_headroom_boost                              200
        l2arc_noprefetch                                  1
        l2arc_norw                                        0
        l2arc_write_boost                                 8388608
        l2arc_write_max                                   8388608
        metaslab_aliquot                                  524288
        metaslab_bias_enabled                             1
        metaslab_debug_load                               0
        metaslab_debug_unload                             0
        metaslab_fragmentation_factor_enabled             1
        metaslab_lba_weighting_enabled                    1
        metaslab_preload_enabled                          1
        metaslabs_per_vdev                                200
        send_holes_without_birth_time                     1
        spa_asize_inflation                               24
        spa_config_path                                   /etc/zfs/zpool.cache
        spa_load_verify_data                              1
        spa_load_verify_maxinflight                       10000
        spa_load_verify_metadata                          1
        spa_slop_shift                                    5
        zfetch_array_rd_sz                                1048576
        zfetch_max_distance                               8388608
        zfetch_max_streams                                8
        zfetch_min_sec_reap                               2
        zfs_abd_scatter_enabled                           1
        zfs_abd_scatter_max_order                         10
        zfs_admin_snapshot                                1
        zfs_arc_average_blocksize                         8192
        zfs_arc_dnode_limit                               0
        zfs_arc_dnode_limit_percent                       10
        zfs_arc_dnode_reduce_percent                      10
        zfs_arc_grow_retry                                0
        zfs_arc_lotsfree_percent                          10
        zfs_arc_max                                       107374182400
        zfs_arc_meta_adjust_restarts                      4096
        zfs_arc_meta_limit                                0
        zfs_arc_meta_limit_percent                        75
        zfs_arc_meta_min                                  10737418240
        zfs_arc_meta_prune                                10000
        zfs_arc_meta_strategy                             1
        zfs_arc_min                                       85899345920
        zfs_arc_min_prefetch_lifespan                     0
        zfs_arc_p_dampener_disable                        1
        zfs_arc_p_min_shift                               0
        zfs_arc_pc_percent                                0
        zfs_arc_shrink_shift                              0
        zfs_arc_sys_free                                  0
        zfs_autoimport_disable                            1
        zfs_checksums_per_second                          20
        zfs_compressed_arc_enabled                        1
        zfs_dbgmsg_enable                                 0
        zfs_dbgmsg_maxsize                                4194304
        zfs_dbuf_state_index                              0
        zfs_deadman_checktime_ms                          5000
        zfs_deadman_enabled                               1
        zfs_deadman_synctime_ms                           1000000
        zfs_dedup_prefetch                                0
        zfs_delay_min_dirty_percent                       60
        zfs_delay_scale                                   500000
        zfs_delays_per_second                             20
        zfs_delete_blocks                                 20480
        zfs_dirty_data_max                                4294967296
        zfs_dirty_data_max_max                            4294967296
        zfs_dirty_data_max_max_percent                    25
        zfs_dirty_data_max_percent                        10
        zfs_dirty_data_sync                               67108864
        zfs_dmu_offset_next_sync                          0
        zfs_expire_snapshot                               300
        zfs_flags                                         0
        zfs_free_bpobj_enabled                            1
        zfs_free_leak_on_eio                              0
        zfs_free_max_blocks                               100000
        zfs_free_min_time_ms                              1000
        zfs_immediate_write_sz                            32768
        zfs_max_recordsize                                1048576
        zfs_mdcomp_disable                                0
        zfs_metaslab_fragmentation_threshold              70
        zfs_metaslab_segment_weight_enabled               1
        zfs_metaslab_switch_threshold                     2
        zfs_mg_fragmentation_threshold                    85
        zfs_mg_noalloc_threshold                          0
        zfs_multihost_fail_intervals                      5
        zfs_multihost_history                             0
        zfs_multihost_import_intervals                    10
        zfs_multihost_interval                            1000
        zfs_multilist_num_sublists                        0
        zfs_no_scrub_io                                   0
        zfs_no_scrub_prefetch                             0
        zfs_nocacheflush                                  0
        zfs_nopwrite_enabled                              1
        zfs_object_mutex_size                             64
        zfs_pd_bytes_max                                  52428800
        zfs_per_txg_dirty_frees_percent                   30
        zfs_prefetch_disable                              0
        zfs_read_chunk_size                               1048576
        zfs_read_history                                  0
        zfs_read_history_hits                             0
        zfs_recover                                       0
        zfs_recv_queue_length                             16777216
        zfs_resilver_delay                                2
        zfs_resilver_min_time_ms                          3000
        zfs_scan_idle                                     50
        zfs_scan_ignore_errors                            0
        zfs_scan_min_time_ms                              1000
        zfs_scrub_delay                                   4
        zfs_send_corrupt_data                             0
        zfs_send_queue_length                             16777216
        zfs_sync_pass_deferred_free                       2
        zfs_sync_pass_dont_compress                       5
        zfs_sync_pass_rewrite                             2
        zfs_sync_taskq_batch_pct                          75
        zfs_top_maxinflight                               32
        zfs_txg_history                                   0
        zfs_txg_timeout                                   5
        zfs_vdev_aggregation_limit                        131072
        zfs_vdev_async_read_max_active                    3
        zfs_vdev_async_read_min_active                    1
        zfs_vdev_async_write_active_max_dirty_percent     60
        zfs_vdev_async_write_active_min_dirty_percent     30
        zfs_vdev_async_write_max_active                   10
        zfs_vdev_async_write_min_active                   2
        zfs_vdev_cache_bshift                             16
        zfs_vdev_cache_max                                16384
        zfs_vdev_cache_size                               0
        zfs_vdev_max_active                               1000
        zfs_vdev_mirror_non_rotating_inc                  0
        zfs_vdev_mirror_non_rotating_seek_inc             1
        zfs_vdev_mirror_rotating_inc                      0
        zfs_vdev_mirror_rotating_seek_inc                 5
        zfs_vdev_mirror_rotating_seek_offset              1048576
        zfs_vdev_queue_depth_pct                          1000
        zfs_vdev_raidz_impl                               [fastest] original scalar sse2 ssse3
        zfs_vdev_read_gap_limit                           32768
        zfs_vdev_scheduler                                noop
        zfs_vdev_scrub_max_active                         2
        zfs_vdev_scrub_min_active                         1
        zfs_vdev_sync_read_max_active                     10
        zfs_vdev_sync_read_min_active                     10
        zfs_vdev_sync_write_max_active                    10
        zfs_vdev_sync_write_min_active                    10
        zfs_vdev_write_gap_limit                          4096
        zfs_zevent_cols                                   80
        zfs_zevent_console                                0
        zfs_zevent_len_max                                640
        zil_replay_disable                                0
        zil_slog_bulk                                     786432
        zio_delay_max                                     30000
        zio_dva_throttle_enabled                          1
        zio_requeue_io_start_cut_in_line                  1
        zio_taskq_batch_pct                               75
        zvol_inhibit_dev                                  0
        zvol_major                                        230
        zvol_max_discard_blocks                           16384
        zvol_prefetch_bytes                               131072
        zvol_request_sync                                 0
        zvol_threads                                      32
        zvol_volmode                                      1
root@storage-server:/etc/openvpn# cat /proc/spl/kstat/zfs/arcstats
13 1 0x01 96 26112 34656842033 127935894113184
name                            type data
hits                            4    622723998
misses                          4    21332134
demand_data_hits                4    142369649
demand_data_misses              4    1700775
demand_metadata_hits            4    479096386
demand_metadata_misses          4    15655646
prefetch_data_hits              4    805936
prefetch_data_misses            4    3801181
prefetch_metadata_hits          4    452027
prefetch_metadata_misses        4    174532
mru_hits                        4    79310752
mru_ghost_hits                  4    1232252
mfu_hits                        4    542775912
mfu_ghost_hits                  4    1340
deleted                         4    4257143
mutex_miss                      4    1215
access_skip                     4    490
evict_skip                      4    7881662
evict_not_enough                4    295699
evict_l2_cached                 4    1304838656
evict_l2_eligible               4    179141345792
evict_l2_ineligible             4    1091968961536
evict_l2_skip                   4    0
hash_elements                   4    941329
hash_elements_max               4    1440242
hash_collisions                 4    612202
hash_chains                     4    24534
hash_chain_max                  4    5
p                               4    480386928
c                               4    86023009400
c_min                           4    85899345920
c_max                           4    107374182400
size                            4    85741000384
compressed_size                 4    84978538496
uncompressed_size               4    87820712960
overhead_size                   4    257947648
hdr_size                        4    317168928
data_size                       4    84086763008
metadata_size                   4    1149723136
dbuf_size                       4    39354120
dnode_size                      4    110780952
bonus_size                      4    37210240
anon_size                       4    7573504
anon_evictable_data             4    0
anon_evictable_metadata         4    0
mru_size                        4    472462848
mru_evictable_data              4    295796736
mru_evictable_metadata          4    1918464
mru_ghost_size                  4    85465085952
mru_ghost_evictable_data        4    80576998912
mru_ghost_evictable_metadata    4    4888087040
mfu_size                        4    84756449792
mfu_evictable_data              4    83609634304
mfu_evictable_metadata          4    940071424
mfu_ghost_size                  4    201733120
mfu_ghost_evictable_data        4    201733120
mfu_ghost_evictable_metadata    4    0
l2_hits                         4    9540
l2_misses                       4    21315322
l2_feeds                        4    124888
l2_rw_clash                     4    0
l2_read_bytes                   4    47256064
l2_write_bytes                  4    756345856
l2_writes_sent                  4    36177
l2_writes_done                  4    36177
l2_writes_error                 4    0
l2_writes_lock_retry            4    0
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_evict_l1cached               4    0
l2_free_on_write                4    0
l2_abort_lowmem                 4    6
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    2396663296
l2_asize                        4    686565888
l2_hdr_size                     4    0
memory_throttle_count           4    0
memory_direct_count             4    711073
memory_indirect_count           4    170203
memory_all_bytes                4    135170146304
memory_free_bytes               4    41371328512
memory_available_bytes          3    39259295744
arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_prune                       4    0
arc_meta_used                   4    1654237376
arc_meta_limit                  4    80530636800
arc_dnode_limit                 4    10737418240
arc_meta_max                    4    4702071400
arc_meta_min                    4    10737418240
sync_wait_for_async             4    23803
demand_hit_predictive_prefetch  4    1669466
arc_need_free                   4    0
arc_sys_free                    4    2112033536
jwittlincohen commented 5 years ago

I also made another observation about my ARC behavior. The MRU cache tends toward zero, usually stabilizing at 100-200 MiB. As noted above, the ARC tends to stay at zfs_arc_min even when the system has tens of GB of free/available memory.

Here are updated weekly graphs of memory and ARC usage as well as output from arc_summary and kstats. The system has been up around 9 1/2 days, so it should be at steady-state for ARC.

Memory Usage

ARC History

------------------------------------------------------------------------
ZFS Subsystem Report                            Mon Jan 21 16:52:14 2019
ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                29.87M
        Mutex Misses:                           8.75k
        Evict Skips:                            64.61M

ARC Size:                               69.30%  69.30   GiB
        Target Size: (Adaptive)         90.07%  90.07   GiB
        Min Size (Hard Limit):          90.00%  90.00   GiB
        Max Size (High Water):          1:1     100.00  GiB

ARC Size Breakdown:
        Recently Used Cache Size:       2.17%   1.47    GiB
        Frequently Used Cache Size:     97.83%  66.20   GiB

ARC Hash Breakdown:
        Elements Max:                           4.67M
        Elements Current:               86.12%  4.02M
        Collisions:                             8.53M
        Chain Max:                              6
        Chains:                                 407.29k

ARC Total accesses:                                     3.35G
        Cache Hit Ratio:                95.80%  3.21G
        Cache Miss Ratio:               4.20%   140.57M
        Actual Hit Ratio:               95.61%  3.20G

        Data Demand Efficiency:         98.52%  747.94M
        Data Prefetch Efficiency:       16.73%  28.35M

        CACHE HITS BY CACHE LIST:
          Most Recently Used:           9.37%   300.30M
          Most Frequently Used:         90.43%  2.90G
          Most Recently Used Ghost:     0.35%   11.25M
          Most Frequently Used Ghost:   0.00%   21.08k

        CACHE HITS BY DATA TYPE:
          Demand Data:                  22.99%  736.89M
          Prefetch Data:                0.15%   4.74M
          Demand Metadata:              76.46%  2.45G
          Prefetch Metadata:            0.41%   13.14M

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  7.86%   11.04M
          Prefetch Data:                16.80%  23.61M
          Demand Metadata:              71.83%  100.98M
          Prefetch Metadata:            3.51%   4.94M

DMU Prefetch Efficiency:                                        1.19G
        Hit Ratio:                      1.20%   14.22M
        Miss Ratio:                     98.80%  1.17G

ZFS Tunables:
        dbuf_cache_hiwater_pct                            10
        dbuf_cache_lowater_pct                            10
        dbuf_cache_max_bytes                              104857600
        dbuf_cache_max_shift                              5
        dmu_object_alloc_chunk_shift                      7
        ignore_hole_birth                                 1
        l2arc_feed_again                                  1
        l2arc_feed_min_ms                                 200
        l2arc_feed_secs                                   1
        l2arc_headroom                                    2
        l2arc_headroom_boost                              200
        l2arc_noprefetch                                  1
        l2arc_norw                                        0
        l2arc_write_boost                                 8388608
        l2arc_write_max                                   8388608
        metaslab_aliquot                                  524288
        metaslab_bias_enabled                             1
        metaslab_debug_load                               0
        metaslab_debug_unload                             0
        metaslab_fragmentation_factor_enabled             1
        metaslab_lba_weighting_enabled                    1
        metaslab_preload_enabled                          1
        metaslabs_per_vdev                                200
        send_holes_without_birth_time                     1
        spa_asize_inflation                               24
        spa_config_path                                   /etc/zfs/zpool.cache
        spa_load_verify_data                              1
        spa_load_verify_maxinflight                       10000
        spa_load_verify_metadata                          1
        spa_slop_shift                                    5
        zfetch_array_rd_sz                                1048576
        zfetch_max_distance                               8388608
        zfetch_max_streams                                8
        zfetch_min_sec_reap                               2
        zfs_abd_scatter_enabled                           1
        zfs_abd_scatter_max_order                         10
        zfs_admin_snapshot                                1
        zfs_arc_average_blocksize                         8192
        zfs_arc_dnode_limit                               0
        zfs_arc_dnode_limit_percent                       10
        zfs_arc_dnode_reduce_percent                      10
        zfs_arc_grow_retry                                0
        zfs_arc_lotsfree_percent                          10
        zfs_arc_max                                       107374182400
        zfs_arc_meta_adjust_restarts                      4096
        zfs_arc_meta_limit                                0
        zfs_arc_meta_limit_percent                        75
        zfs_arc_meta_min                                  10737418240
        zfs_arc_meta_prune                                10000
        zfs_arc_meta_strategy                             1
        zfs_arc_min                                       96636764160
        zfs_arc_min_prefetch_lifespan                     0
        zfs_arc_p_dampener_disable                        1
        zfs_arc_p_min_shift                               0
        zfs_arc_pc_percent                                0
        zfs_arc_shrink_shift                              0
        zfs_arc_sys_free                                  0
        zfs_autoimport_disable                            1
        zfs_checksums_per_second                          20
        zfs_compressed_arc_enabled                        1
        zfs_dbgmsg_enable                                 0
        zfs_dbgmsg_maxsize                                4194304
        zfs_dbuf_state_index                              0
        zfs_deadman_checktime_ms                          5000
        zfs_deadman_enabled                               1
        zfs_deadman_synctime_ms                           1000000
        zfs_dedup_prefetch                                0
        zfs_delay_min_dirty_percent                       60
        zfs_delay_scale                                   500000
        zfs_delays_per_second                             20
        zfs_delete_blocks                                 20480
        zfs_dirty_data_max                                4294967296
        zfs_dirty_data_max_max                            4294967296
        zfs_dirty_data_max_max_percent                    25
        zfs_dirty_data_max_percent                        10
        zfs_dirty_data_sync                               67108864
        zfs_dmu_offset_next_sync                          0
        zfs_expire_snapshot                               300
        zfs_flags                                         0
        zfs_free_bpobj_enabled                            1
        zfs_free_leak_on_eio                              0
        zfs_free_max_blocks                               100000
        zfs_free_min_time_ms                              1000
        zfs_immediate_write_sz                            32768
        zfs_max_recordsize                                1048576
        zfs_mdcomp_disable                                0
        zfs_metaslab_fragmentation_threshold              70
        zfs_metaslab_segment_weight_enabled               1
        zfs_metaslab_switch_threshold                     2
        zfs_mg_fragmentation_threshold                    85
        zfs_mg_noalloc_threshold                          0
        zfs_multihost_fail_intervals                      5
        zfs_multihost_history                             0
        zfs_multihost_import_intervals                    10
        zfs_multihost_interval                            1000
        zfs_multilist_num_sublists                        0
        zfs_no_scrub_io                                   0
        zfs_no_scrub_prefetch                             0
        zfs_nocacheflush                                  0
        zfs_nopwrite_enabled                              1
        zfs_object_mutex_size                             64
        zfs_pd_bytes_max                                  52428800
        zfs_per_txg_dirty_frees_percent                   30
        zfs_prefetch_disable                              0
        zfs_read_chunk_size                               1048576
        zfs_read_history                                  0
        zfs_read_history_hits                             0
        zfs_recover                                       0
        zfs_recv_queue_length                             16777216
        zfs_resilver_delay                                2
        zfs_resilver_min_time_ms                          3000
        zfs_scan_idle                                     50
        zfs_scan_ignore_errors                            0
        zfs_scan_min_time_ms                              1000
        zfs_scrub_delay                                   4
        zfs_send_corrupt_data                             0
        zfs_send_queue_length                             16777216
        zfs_sync_pass_deferred_free                       2
        zfs_sync_pass_dont_compress                       5
        zfs_sync_pass_rewrite                             2
        zfs_sync_taskq_batch_pct                          75
        zfs_top_maxinflight                               32
        zfs_txg_history                                   0
        zfs_txg_timeout                                   5
        zfs_vdev_aggregation_limit                        131072
        zfs_vdev_async_read_max_active                    3
        zfs_vdev_async_read_min_active                    1
        zfs_vdev_async_write_active_max_dirty_percent     60
        zfs_vdev_async_write_active_min_dirty_percent     30
        zfs_vdev_async_write_max_active                   10
        zfs_vdev_async_write_min_active                   2
        zfs_vdev_cache_bshift                             16
        zfs_vdev_cache_max                                16384
        zfs_vdev_cache_size                               0
        zfs_vdev_max_active                               1000
        zfs_vdev_mirror_non_rotating_inc                  0
        zfs_vdev_mirror_non_rotating_seek_inc             1
        zfs_vdev_mirror_rotating_inc                      0
        zfs_vdev_mirror_rotating_seek_inc                 5
        zfs_vdev_mirror_rotating_seek_offset              1048576
        zfs_vdev_queue_depth_pct                          1000
        zfs_vdev_raidz_impl                               [fastest] original scalar sse2 ssse3
        zfs_vdev_read_gap_limit                           32768
        zfs_vdev_scheduler                                noop
        zfs_vdev_scrub_max_active                         2
        zfs_vdev_scrub_min_active                         1
        zfs_vdev_sync_read_max_active                     10
        zfs_vdev_sync_read_min_active                     10
        zfs_vdev_sync_write_max_active                    10
        zfs_vdev_sync_write_min_active                    10
        zfs_vdev_write_gap_limit                          4096
        zfs_zevent_cols                                   80
        zfs_zevent_console                                0
        zfs_zevent_len_max                                640
        zil_replay_disable                                0
        zil_slog_bulk                                     786432
        zio_delay_max                                     30000
        zio_dva_throttle_enabled                          1
        zio_requeue_io_start_cut_in_line                  1
        zio_taskq_batch_pct                               75
        zvol_inhibit_dev                                  0
        zvol_major                                        230
        zvol_max_discard_blocks                           16384
        zvol_prefetch_bytes                               131072
        zvol_request_sync                                 0
        zvol_threads                                      32
        zvol_volmode                                      1
13 1 0x01 96 26112 34656842033 804777653833568
name                            type data
hits                            4    3205852475
misses                          4    140567980
demand_data_hits                4    736894151
demand_data_misses              4    11042858
demand_metadata_hits            4    2451071653
demand_metadata_misses          4    100976111
prefetch_data_hits              4    4743754
prefetch_data_misses            4    23608835
prefetch_metadata_hits          4    13142917
prefetch_metadata_misses        4    4940176
mru_hits                        4    300295430
mru_ghost_hits                  4    11254932
mfu_hits                        4    2899165514
mfu_ghost_hits                  4    21080
deleted                         4    29873669
mutex_miss                      4    8754
access_skip                     4    1977
evict_skip                      4    64612791
evict_not_enough                4    1469200
evict_l2_cached                 4    216456410112
evict_l2_eligible               4    2796371940864
evict_l2_ineligible             4    4620242411520
evict_l2_skip                   4    0
hash_elements                   4    4024597
hash_elements_max               4    4673090
hash_collisions                 4    8526637
hash_chains                     4    407278
hash_chain_max                  4    6
p                               4    1304205685
c                               4    96710664648
c_min                           4    96636764160
c_max                           4    107374182400
size                            4    74415300672
compressed_size                 4    72316176896
uncompressed_size               4    90511407616
overhead_size                   4    351967744
hdr_size                        4    1353753080
data_size                       4    65247920640
metadata_size                   4    7420224000
dbuf_size                       4    84170424
dnode_size                      4    228276368
bonus_size                      4    80956160
anon_size                       4    4094464
anon_evictable_data             4    0
anon_evictable_metadata         4    0
mru_size                        4    1579768832
mru_evictable_data              4    1520373248
mru_evictable_metadata          4    10283008
mru_ghost_size                  4    94362664448
mru_ghost_evictable_data        4    52083600384
mru_ghost_evictable_metadata    4    42279064064
mfu_size                        4    71084281344
mfu_evictable_data              4    63615495168
mfu_evictable_metadata          4    7061716480
mfu_ghost_size                  4    1458978816
mfu_ghost_evictable_data        4    1458978816
mfu_ghost_evictable_metadata    4    0
l2_hits                         4    117318
l2_misses                       4    70174985
l2_feeds                        4    401206
l2_rw_clash                     4    0
l2_read_bytes                   4    10449594368
l2_write_bytes                  4    201448042496
l2_writes_sent                  4    134405
l2_writes_done                  4    134405
l2_writes_error                 4    0
l2_writes_lock_retry            4    2
l2_evict_lock_retry             4    1
l2_evict_reading                4    0
l2_evict_l1cached               4    235879
l2_free_on_write                4    34
l2_abort_lowmem                 4    38
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    0
l2_asize                        4    0
l2_hdr_size                     4    0
memory_throttle_count           4    0
memory_direct_count             4    3432416
memory_indirect_count           4    1689117
memory_all_bytes                4    135170146304
memory_free_bytes               4    49069887488
memory_available_bytes          3    46957854720
arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_prune                       4    0
arc_meta_used                   4    9167380032
arc_meta_limit                  4    80530636800
arc_dnode_limit                 4    10737418240
arc_meta_max                    4    17917926576
arc_meta_min                    4    10737418240
sync_wait_for_async             4    215029
demand_hit_predictive_prefetch  4    7582640
arc_need_free                   4    0
arc_sys_free                    4    2112033536
bunder2015 commented 5 years ago

Reopening by request

jwittlincohen commented 5 years ago

I've noticed that when zfs_arc_min equals zfs_arc_max, the MRU doesn't trend toward zero like it did when I had zfs_arc_min set to a smaller value. Specifically, I had zfs_arc_min set to 90 GiB and zfs_arc_max set to 100 GiB. Since it consistently sat at 90 GiB, I just set zfs_arc_min to match zfs_arc_max. Whereas my MRU was previously only a few hundred MB in size, it's now much larger, over 10 GiB.
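
(As a runtime sketch, that workaround boils down to something like the following, assuming the standard module parameter location:)

    # Pin zfs_arc_min to whatever zfs_arc_max currently is.
    echo "$(cat /sys/module/zfs/parameters/zfs_arc_max)" \
        > /sys/module/zfs/parameters/zfs_arc_min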

ARC

joydashy commented 5 years ago

I am running 0.7.13 and have recently also noticed wildly fluctuating ARC target size:

image

bgly commented 4 years ago

I would have to agree this exists in 0.8.1 and 0.8.2.

softminus commented 4 years ago

i believe that #9548 explains (at least partly) this behavior. Furthermore, i have written some noninvasive tracing code that verifies that arc_adjust_cb's mishandling of arc_need_free as described in #9548 can indeed occur and will leave arc_need_free in an inappropriate state. i've also written a patch for arc_adjust_cb that adjusts arc_need_free in a way i believe is appropriate.

As far as i understand (and i would appreciate corrections), the relevant moving parts are:

  1. The kernel calls arc_shrinker_func with a struct shrink_control *sc, which can increment arcstat_need_free by ptob(nr_to_scan) -- how many bytes the kernel wants us to free.

  2. arc_need_free and arcstat_need_free are two names for the same uint64_t (from the #define and ARCSTAT macro in arc_impl.h):

#define    arc_need_free    ARCSTAT(arcstat_need_free) /* bytes to be freed */
  3. In arc.c, arc_reap_cb calls arc_reduce_target_size with the MAX of (arc_c >> arc_shrink_shift) - arc_available_memory() and arc_need_free:

    free_memory = arc_available_memory();
    
    int64_t to_free =
        (arc_c >> arc_shrink_shift) - free_memory;
    if (to_free > 0) {
    #ifdef _KERNEL
        to_free = MAX(to_free, arc_need_free);
    #endif
        arc_reduce_target_size(to_free);
  4. arc_reduce_target_size adjusts arc_c by its argument. If arc_need_free is big, it will dominate to_free and thus be passed to arc_reduce_target_size -- and will therefore determine the adjusted arc_c. Also, the return value of arc_available_memory can be affected by the value of arc_need_free (i haven't looked into how exactly).

  5. arc_adjust() will look at arc_c and will actually do the necessary evictions to get the ARC size under that, and it returns how many bytes were evicted.

  6. arc_adjust() is called by arc_adjust_cb. This is where i believe the problem lives:

    static void
    arc_adjust_cb(void *arg, zthr_t *zthr)
    {
    uint64_t evicted = 0;
    fstrans_cookie_t cookie = spl_fstrans_mark();
    
    /* Evict from cache */
    evicted = arc_adjust();
    
    /*
     * If evicted is zero, we couldn't evict anything
     * via arc_adjust(). This could be due to hash lock
     * collisions, but more likely due to the majority of
     * arc buffers being unevictable. Therefore, even if
     * arc_size is above arc_c, another pass is unlikely to
     * be helpful and could potentially cause us to enter an
     * infinite loop.  Additionally, zthr_iscancelled() is
     * checked here so that if the arc is shutting down, the
     * broadcast will wake any remaining arc adjust waiters.
     */
    mutex_enter(&arc_adjust_lock);
    arc_adjust_needed = !zthr_iscancelled(arc_adjust_zthr) &&
        evicted > 0 && aggsum_compare(&arc_size, arc_c) > 0;
    if (!arc_adjust_needed) {
        /*
         * We're either no longer overflowing, or we
         * can't evict anything more, so we should wake
         * arc_get_data_impl() sooner.
         */
        cv_broadcast(&arc_adjust_waiters_cv);
        arc_need_free = 0;
    }

Who can modify arc_need_free (aka arcstat_need_free)? arc_shrinker_func can only increase it, and the only way it can be decreased is via the "arc_need_free = 0" line in arc_adjust_cb. i believe that if arc_adjust() evicts N bytes of memory, arc_need_free should be decremented by N (or set to 0 if arc_adjust evicted more than arc_need_free bytes). If arc_need_free is left unchanged at a nonzero value even though arc_adjust evicted a nonzero number of bytes, we will be evicting more than necessary (double counting) the next time arc_adjust_cb runs.

arc_need_free can only decrease if it is zeroized -- which only happens when arc_adjust_needed ends up false. However, it is possible for arc_adjust to evict a nonzero number of bytes while arc_adjust_needed remains set! This is the case even if arc_adjust successfully evicted more bytes than arc_need_free!

To verify that this was happening, i patched arc_adjust_cb as follows (note that this does not affect any ZFS state -- this is noninvasive):

arc_adjust_cb(void *arg, zthr_t *zthr)
{
    uint64_t evicted = 0;
    int debt_adjusted = 0;
    int debt_cleared  = 0;
    uint64_t old_arc_need_free = 0;

    fstrans_cookie_t cookie = spl_fstrans_mark();

    /* Evict from cache */
    evicted = arc_adjust();
    zfs_dbgmsg("XXX: STG 1: PREV arcstat_need_free is %llu", arc_need_free);
    zfs_dbgmsg("XXX: STG 1:        arc_adjust evicted %llu", evicted);
    old_arc_need_free = arc_need_free;

    /*
     * If evicted is zero, we couldn't evict anything
     * via arc_adjust(). This could be due to hash lock
     * collisions, but more likely due to the majority of
     * arc buffers being unevictable. Therefore, even if
     * arc_size is above arc_c, another pass is unlikely to
     * be helpful and could potentially cause us to enter an
     * infinite loop.  Additionally, zthr_iscancelled() is
     * checked here so that if the arc is shutting down, the
     * broadcast will wake any remaining arc adjust waiters.
     */
    mutex_enter(&arc_adjust_lock);

    if (arc_need_free > evicted) {
        /* arc_need_free (aka kernel shrinker) threshold is extant, but not satisfied yet */
        debt_adjusted = 1;
        zfs_dbgmsg("XXX: STG 2: I WOULD ADJUST DEBT HERE");
    } else if (evicted > arc_need_free) { /* We satisfied the arc_need_free debt so we clear arc_need_free */
        debt_cleared = 1;
        zfs_dbgmsg("XXX: STG 3: I WOULD ANNUL THE DEBT HERE, ENTIRELY");
    }

    arc_adjust_needed = !zthr_iscancelled(arc_adjust_zthr) &&
        evicted > 0 && aggsum_compare(&arc_size, arc_c) > 0;
    if (!arc_adjust_needed) {
        /*
         * We're either no longer overflowing, or we
         * can't evict anything more, so we should wake
         * arc_get_data_impl() sooner.
         */
        cv_broadcast(&arc_adjust_waiters_cv);
        zfs_dbgmsg("XXX: STG 4: NEW  arcstat_need_free is ZERO since arc_adjust_needed == 0");

        arc_need_free = 0;
    } else {
        if ((debt_adjusted == 1) && (old_arc_need_free != 0)) {
            zfs_dbgmsg("XXXXXXXXXXXX OLD CODE WOULD HAVE NOT CREDITED THE EVICTION AMOUNT XXXXXXXXXXXX");
        }

        if ((debt_cleared == 1) && (old_arc_need_free != 0)) {
            zfs_dbgmsg("XXXXXXXXXXXX OLD CODE WOULD HAVE NOT ELIMINATED THE DEBT ALTOGETHER XXXXXXXXXXXX");
        }
    }
    mutex_exit(&arc_adjust_lock);
    spl_fstrans_unmark(cookie);
}

The cases i wanted to detect were "arc_need_free is nonzero at the start of arc_adjust_cb, arc_adjust() evicts more than arc_need_free bytes, and the existing ZFS code does not zero arc_need_free" and "arc_need_free is nonzero at the start of arc_adjust_cb, arc_adjust() evicts nonzero and less than arc_need_free bytes, and the existing ZFS code does not decrement arc_need_free"

After recompiling and rebooting with the patched ZFS module, i ran a workload that both fills the ARC (find . -print0|xargs -0 b2sum in a directory with hundreds of gigabytes of anime) and provides memory pressure (stress --verbose -m 10 --vm-bytes 4000M -t 10 on a 32GB system), then looked at /proc/spl/kstat/zfs/dbgmsg:

1579644131   arc.c:4613:arc_adjust_cb(): XXX: STG 1: PREV arcstat_need_free is 8388608
1579644131   arc.c:4614:arc_adjust_cb(): XXX: STG 1:        arc_adjust evicted 120297984
1579644131   arc.c:4637:arc_adjust_cb(): XXX: STG 3: I WOULD ANNUL THE DEBT HERE, ENTIRELY
1579644131   arc.c:4658:arc_adjust_cb(): XXXXXXXXXXXX OLD CODE WOULD HAVE NOT ELIMINATED THE DEBT ALTOGETHER XXXXXXXXXXXX
1579644134   arc.c:4613:arc_adjust_cb(): XXX: STG 1: PREV arcstat_need_free is 47185920
1579644134   arc.c:4614:arc_adjust_cb(): XXX: STG 1:        arc_adjust evicted 12438662144
1579644134   arc.c:4637:arc_adjust_cb(): XXX: STG 3: I WOULD ANNUL THE DEBT HERE, ENTIRELY
1579644134   arc.c:4658:arc_adjust_cb(): XXXXXXXXXXXX OLD CODE WOULD HAVE NOT ELIMINATED THE DEBT ALTOGETHER XXXXXXXXXXXX
1579644134   arc.c:4613:arc_adjust_cb(): XXX: STG 1: PREV arcstat_need_free is 47185920
1579644134   arc.c:4614:arc_adjust_cb(): XXX: STG 1:        arc_adjust evicted 36614144
1579644134   arc.c:4634:arc_adjust_cb(): XXX: STG 2: I WOULD ADJUST DEBT HERE
1579644134   arc.c:4649:arc_adjust_cb(): XXX: STG 4: NEW  arcstat_need_free is ZERO since arc_adjust_needed == 0
1579644134   arc.c:4613:arc_adjust_cb(): XXX: STG 1: PREV arcstat_need_free is 0
1579644134   arc.c:4614:arc_adjust_cb(): XXX: STG 1:        arc_adjust evicted 41811968
1579644134   arc.c:4637:arc_adjust_cb(): XXX: STG 3: I WOULD ANNUL THE DEBT HERE, ENTIRELY
1579644134   arc.c:4649:arc_adjust_cb(): XXX: STG 4: NEW  arcstat_need_free is ZERO since arc_adjust_needed == 0
1579644134   arc.c:4613:arc_adjust_cb(): XXX: STG 1: PREV arcstat_need_free is 0
1579644134   arc.c:4614:arc_adjust_cb(): XXX: STG 1:        arc_adjust evicted 55570432
1579644134   arc.c:4637:arc_adjust_cb(): XXX: STG 3: I WOULD ANNUL THE DEBT HERE, ENTIRELY
1579644134   arc.c:4613:arc_adjust_cb(): XXX: STG 1: PREV arcstat_need_free is 0
1579644134   arc.c:4614:arc_adjust_cb(): XXX: STG 1:        arc_adjust evicted 524288
1579644134   arc.c:4637:arc_adjust_cb(): XXX: STG 3: I WOULD ANNUL THE DEBT HERE, ENTIRELY
1579644134   arc.c:4649:arc_adjust_cb(): XXX: STG 4: NEW  arcstat_need_free is ZERO since arc_adjust_needed == 0
1579644134   arc.c:4613:arc_adjust_cb(): XXX: STG 1: PREV arcstat_need_free is 0
1579644134   arc.c:4614:arc_adjust_cb(): XXX: STG 1:        arc_adjust evicted 44281856
1579644134   arc.c:4637:arc_adjust_cb(): XXX: STG 3: I WOULD ANNUL THE DEBT HERE, ENTIRELY

In the first four lines, we see that arc_adjust evicted 120297984 bytes -- far more than arcstat_need_free (8388608 bytes) and yet arcstat_need_free is not zeroed -- it's left to be incremented further (by a second kernel shrinker call)! arc_adjust_cb runs again, evicts 12438662144 bytes yet arcstat_need_free is left at 47185920, and it's only on the third call to arc_adjust_cb that arcstat_need_free is zeroed.

i believe that this does proper accounting for arcstat_need_free with regard to the bytes that arc_adjust evicts:

    if (arc_need_free > evicted) {
        /* arc_need_free (aka kernel shrinker) threshold is extant, but not satisfied yet */
        ARCSTAT_INCR(arcstat_need_free, -evicted); /* but we evicted SOMETHING, so we decrease our debt */
        debt_adjusted = 1;
        zfs_dbgmsg("XXX: STG 2: NEW  arcstat_need_free is %llu since arc_need_free > evicted", arc_need_free);
    } else if (evicted > arc_need_free) { /* We satisfied the arc_need_free debt so we clear arc_need_free */
        arc_need_free = 0;
        debt_adjusted = 1;
        zfs_dbgmsg("XXX: STG 3: NEW  arcstat_need_free is ZERO since evicted > arc_need_free");
    }

and this shows that this code is indeed hit and does adjust arcstat_need_free in cases where the original code does not touch it:

    if (!arc_adjust_needed) {
        /*
         * We're either no longer overflowing, or we
         * can't evict anything more, so we should wake
         * arc_get_data_impl() sooner.
         */
        cv_broadcast(&arc_adjust_waiters_cv);
        zfs_dbgmsg("XXX: STG 4: NEW  arcstat_need_free is ZERO since arc_adjust_needed == 0");

        arc_need_free = 0;
    } else {
        if (debt_adjusted == 1) {
            zfs_dbgmsg("XXXXXXXXXXXX OLD CODE WOULD HAVE NOT TOUCHED ARC_NEED_FREE XXXXXXXXXXXX");
        }
    }