openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

ARC efficiency regression #5418

Closed haasn closed 7 years ago

haasn commented 7 years ago

I've noticed this ever since my previous reboot (on the 2nd of November), and it hasn't gone away. In the past, my ARC efficiency was always floating at around 90%, especially after a few days of uptime. Since the 2nd of November (when I went from zfs-0.7.0-rc1 to zfs-0.7.0-rc2), this has gone down considerably.

Alongside the ZFS version upgrade I also did a kernel upgrade (4.8.0 -> 4.8.5), but I'm not sure if that would have caused this or not. These are my stats currently:

λ uname -a
Linux xor 4.8.8-gentoo #2 SMP PREEMPT Thu Nov 17 05:34:00 CET 2016 x86_64 Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz GenuineIntel Linux

λ uptime
 02:31:09 up 3 days,  6:11,  1 user,  load average: 1.55, 0.74, 0.47

λ arc_summary.py 

------------------------------------------------------------------------
ZFS Subsystem Report                Mon Nov 21 02:31:10 2016
ARC Summary: (HEALTHY)
    Memory Throttle Count:          0

ARC Misc:
    Deleted:                25.42m
    Mutex Misses:               2.15k
    Evict Skips:                2.15k

ARC Size:               25.56%  8.04    GiB
    Target Size: (Adaptive)     25.73%  8.09    GiB
    Min Size (Hard Limit):      0.10%   32.00   MiB
    Max Size (High Water):      1006:1  31.44   GiB

ARC Size Breakdown:
    Recently Used Cache Size:   12.70%  1.03    GiB
    Frequently Used Cache Size: 87.30%  7.06    GiB

ARC Hash Breakdown:
    Elements Max:               6.82m
    Elements Current:       95.20%  6.49m
    Collisions:             22.24m
    Chain Max:              9
    Chains:                 1.53m

ARC Total accesses:                 115.20m
    Cache Hit Ratio:        38.76%  44.66m
    Cache Miss Ratio:       61.24%  70.54m
    Actual Hit Ratio:       36.16%  41.65m

    Data Demand Efficiency:     70.09%  27.79m
    Data Prefetch Efficiency:   70.92%  23.70m

    CACHE HITS BY CACHE LIST:
      Most Recently Used:       8.48%   3.79m
      Most Frequently Used:     84.79%  37.87m
      Most Recently Used Ghost: 3.50%   1.56m
      Most Frequently Used Ghost:   4.55%   2.03m

    CACHE HITS BY DATA TYPE:
      Demand Data:          43.61%  19.47m
      Prefetch Data:        37.63%  16.81m
      Demand Metadata:      12.06%  5.39m
      Prefetch Metadata:        6.70%   2.99m

    CACHE MISSES BY DATA TYPE:
      Demand Data:          11.78%  8.31m
      Prefetch Data:        9.77%   6.89m
      Demand Metadata:      77.39%  54.59m
      Prefetch Metadata:        1.06%   748.00k

L2 ARC Summary: (HEALTHY)
    Low Memory Aborts:          5
    Free on Write:              4.56k
    R/W Clashes:                0
    Bad Checksums:              0
    IO Errors:              0

L2 ARC Size: (Adaptive)             99.11   GiB
    Compressed:         92.48%  91.65   GiB
    Header Size:            0.51%   513.12  MiB

L2 ARC Evicts:
    Lock Retries:               14
    Upon Reading:               1

L2 ARC Breakdown:               70.54m
    Hit Ratio:          8.29%   5.84m
    Miss Ratio:         91.71%  64.70m
    Feeds:                  287.54k

L2 ARC Writes:
    Writes Sent:            100.00% 177.81k

DMU Prefetch Efficiency:                    722.63m
    Hit Ratio:          5.56%   40.18m
    Miss Ratio:         94.44%  682.45m

ZFS Tunable:
    l2arc_headroom                                    2
    dbuf_cache_max_shift                              5
    zfs_free_leak_on_eio                              0
    zfs_free_max_blocks                               100000
    zfs_read_chunk_size                               1048576
    metaslab_preload_enabled                          1
    zfs_dedup_prefetch                                0
    zfs_txg_history                                   0
    zfs_scrub_delay                                   4
    zfs_vdev_async_read_max_active                    3
    zfs_read_history                                  0
    zfs_arc_sys_free                                  0
    l2arc_write_max                                   8388608
    zfs_dbuf_state_index                              0
    metaslab_debug_unload                             0
    zvol_inhibit_dev                                  0
    zfetch_max_streams                                8
    zfs_recover                                       0
    metaslab_fragmentation_factor_enabled             1
    zfs_sync_pass_rewrite                             2
    zfs_object_mutex_size                             64
    zfs_arc_min_prefetch_lifespan                     0
    zfs_arc_meta_prune                                10000
    zfs_read_history_hits                             0
    zfetch_max_distance                               8388608
    l2arc_norw                                        0
    zfs_dirty_data_max_percent                        10
    zfs_arc_meta_min                                  0
    metaslabs_per_vdev                                200
    zfs_arc_meta_adjust_restarts                      4096
    zil_slog_limit                                    1048576
    spa_load_verify_maxinflight                       10000
    spa_load_verify_metadata                          1
    zfs_send_corrupt_data                             0
    zfs_delay_min_dirty_percent                       60
    zfs_vdev_sync_read_max_active                     10
    zfs_dbgmsg_enable                                 0
    zio_requeue_io_start_cut_in_line                  1
    l2arc_headroom_boost                              200
    zfs_zevent_cols                                   80
    spa_config_path                                   /etc/zfs/zpool.cache
    zfs_vdev_cache_size                               0
    dbuf_cache_hiwater_pct                            10
    zio_dva_throttle_enabled                          0
    zfs_vdev_sync_write_min_active                    10
    zfs_vdev_scrub_max_active                         2
    ignore_hole_birth                                 1
    zvol_major                                        230
    zil_replay_disable                                0
    zfs_dirty_data_max_max_percent                    25
    zfs_expire_snapshot                               300
    zfs_sync_pass_deferred_free                       2
    spa_asize_inflation                               24
    zfs_vdev_mirror_rotating_seek_offset              1048576
    l2arc_feed_secs                                   1
    zfs_autoimport_disable                            1
    zfs_arc_p_aggressive_disable                      1
    zfs_zevent_len_max                                512
    zfs_arc_meta_limit_percent                        75
    l2arc_noprefetch                                  1
    zfs_vdev_raidz_impl                               [fastest] original scalar sse2 ssse3
    zfs_arc_meta_limit                                0
    zfs_flags                                         0
    zfs_dirty_data_max_max                            16880430080
    zfs_arc_average_blocksize                         8192
    zfs_vdev_cache_bshift                             16
    zfs_vdev_async_read_min_active                    1
    zfs_arc_dnode_reduce_percent                      10
    zfs_free_bpobj_enabled                            1
    zfs_arc_num_sublists_per_state                    32
    zfs_arc_grow_retry                                0
    zfs_vdev_mirror_rotating_inc                      0
    l2arc_feed_again                                  1
    zfs_vdev_mirror_non_rotating_inc                  0
    zfs_arc_lotsfree_percent                          10
    zfs_zevent_console                                0
    zvol_prefetch_bytes                               131072
    zfs_free_min_time_ms                              1000
    zfs_arc_dnode_limit_percent                       10
    zio_taskq_batch_pct                               75
    dbuf_cache_max_bytes                              104857600
    spa_load_verify_data                              1
    zfs_delete_blocks                                 20480
    zfs_vdev_mirror_non_rotating_seek_inc             1
    zfs_dirty_data_max                                6752172032
    zfs_vdev_async_write_max_active                   10
    zfs_dbgmsg_maxsize                                4194304
    zfs_nocacheflush                                  0
    zfetch_array_rd_sz                                1048576
    zfs_arc_meta_strategy                             1
    zfs_dirty_data_sync                               67108864
    zvol_max_discard_blocks                           16384
    zfs_vdev_async_write_active_max_dirty_percent     60
    zfs_arc_p_dampener_disable                        1
    zfs_txg_timeout                                   5
    metaslab_aliquot                                  524288
    zfs_mdcomp_disable                                0
    zfs_vdev_sync_read_min_active                     10
    zfs_arc_dnode_limit                               0
    dbuf_cache_lowater_pct                            10
    metaslab_debug_load                               0
    zfs_vdev_aggregation_limit                        131072
    metaslab_lba_weighting_enabled                    1
    zfs_vdev_scheduler                                noop
    zfs_vdev_scrub_min_active                         1
    zfs_no_scrub_io                                   0
    zfs_vdev_cache_max                                16384
    zfs_scan_idle                                     50
    zfs_arc_shrink_shift                              0
    spa_slop_shift                                    5
    zfs_vdev_mirror_rotating_seek_inc                 5
    zfs_deadman_synctime_ms                           1000000
    send_holes_without_birth_time                     1
    metaslab_bias_enabled                             1
    zfs_admin_snapshot                                1
    zfs_no_scrub_prefetch                             0
    zfs_metaslab_fragmentation_threshold              70
    zfs_max_recordsize                                1048576
    zfs_arc_min                                       0
    zfs_nopwrite_enabled                              1
    zfs_arc_p_min_shift                               0
    zfs_vdev_queue_depth_pct                          1000
    zfs_mg_fragmentation_threshold                    85
    l2arc_write_boost                                 8388608
    zfs_prefetch_disable                              0
    l2arc_feed_min_ms                                 200
    zio_delay_max                                     30000
    zfs_vdev_write_gap_limit                          4096
    zfs_pd_bytes_max                                  52428800
    zfs_scan_min_time_ms                              1000
    zfs_resilver_min_time_ms                          3000
    zfs_delay_scale                                   500000
    zfs_vdev_async_write_active_min_dirty_percent     30
    zfs_vdev_sync_write_max_active                    10
    zfs_mg_noalloc_threshold                          0
    zfs_deadman_enabled                               1
    zfs_resilver_delay                                2
    zfs_arc_max                                       0
    zfs_top_maxinflight                               32
    zfetch_min_sec_reap                               2
    zfs_immediate_write_sz                            32768
    zfs_vdev_async_write_min_active                   1
    zfs_sync_pass_dont_compress                       5
    zfs_vdev_read_gap_limit                           32768
    zfs_compressed_arc_enabled                        1
    zfs_vdev_max_active                               1000

Something I noticed is that my ARC never seems to want to stay full. Right now my total RAM usage is at 28%, so memory pressure could hardly be the culprit, yet my ARC keeps shrinking in size.
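
For anyone wanting to watch this without munin, the raw numbers are in the arcstats kstat; something along these lines prints the current ARC size next to its adaptive target and the configured maximum (the values are in bytes, so this just converts to GiB):

λ awk '/^(size|c|c_max) / { printf "%-6s %8.2f GiB\n", $1, $3/2^30 }' /proc/spl/kstat/zfs/arcstats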

I noticed that particular phenomenon happening way before the 2nd of November; I had commented on it on the 19th of October at least, when my ARC efficiency was still great:

2016-10-19 19:43:15 haasn   Hmm. Why does my ARC keep shrinking recently? It keeps going down to 4 GB even though I have plenty of free space, and the FS also seems pretty slow
2016-10-19 19:43:18 haasn   It's almost as if it stopped working
2016-10-19 19:43:22 haasn   But I don't remember changing anything

Here is a visual representation of my ARC efficiency over time (please excuse the long gap with no data; I had munin disabled for a while):

munin arc efficiency graph

You can see quite clearly how the efficiency absolutely tanks on the 2nd of November. Normally, it would be back up to full efficiency not long after a reboot, especially since I run a script after reboot to force most of my data into L1/L2ARC. (My previous reboot, on the 15th of October, doesn't even register in the munin graph on those timescales anymore.)
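
(For the curious: that kind of warm-up doesn't need to be anything fancy; a minimal sketch, assuming the data lives under a hypothetical /tank mountpoint, is just a sequential read of the working set. Note that with l2arc_noprefetch=1, sequentially prefetched data may not end up in L2ARC.)

λ find /tank -type f -print0 | xargs -0 cat > /dev/null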

You can also see this weird oscillatory behavior in the close-up data. Maybe this is the ARC growing and shrinking?

Something else I noticed: most of my “unused” RAM was being used as a cache for a non-ZFS filesystem that I added a few days ago. (Previously, I had no large non-ZFS filesystems, and certainly not on the 2nd of November.)

munin memory usage graph

I'm not sure if this is related or not, but I noticed my cache usage increasing as my ARC size dropped as early as the 5th of September:

2016-09-05 08:38:56 haasn   I'm a bit confused by this behavior: On my system, I have almost all of my available RAM being used up by “cache” (i.e. regular linux caching layer), to the point where ARC starts shrinking due to all of the memory pressure
2016-09-05 08:39:06 haasn   This also dramatically increases my memory fragmentation
2016-09-05 08:39:14 haasn   Here's the thing: I don't have any physical filesystems mounted other than zfs
2016-09-05 08:39:20 haasn   And zfs should not use the linux built-in cache layer at all, right?
2016-09-05 08:39:50 haasn   I wouldn't worry much about it if it wasn't for the fact that 1. ZFS ARC starts shrinking due to it, 2. it doesn't make any sense

That said, I'm still not sure what the relationship between the two is, i.e. if cache usage is simply going up because ARC is shrinking (or vice versa).
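
If it helps to untangle that, the two counters worth sampling side by side are probably the page cache (Cached in /proc/meminfo) and the ARC size from arcstats, e.g.:

λ grep '^Cached:' /proc/meminfo; awk '/^size / { printf "ARC size: %.2f GiB\n", $3/2^30 }' /proc/spl/kstat/zfs/arcstats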

I wasn't sure whether this was worth reporting, given the imminent arrival of ABD, but apparently (?) ABD testers noticed similar ARC efficiency drops, which might indicate a problem elsewhere in the code.

dweeezil commented 7 years ago

If you use compression, I suspect the difference you're seeing is the result of the compressed ARC (d3c2ae1). Of course, the compressed ARC should actually increase the hit rate when the compression ratio is high.
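
One quick way to see how much the compressed ARC is actually doing on your pool is to compare the compressed and uncompressed sizes it reports in arcstats (field names from memory; they may differ slightly between versions):

grep -E '^(compressed_size|uncompressed_size|overhead_size) ' /proc/spl/kstat/zfs/arcstats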

haasn commented 7 years ago

I am using compression (lz4). Is there any tunable I could play with to confirm that suspicion?

ironMann commented 7 years ago

@dweeezil I remember seeing a potential integer overflow in __arc_shrinker_func() while going through some Coverity reports (CID 147545). It might not be the cause of this issue but it should be looked at.

lnicola commented 7 years ago

It's probably not related, but after moving from release to -git I noticed that the prefetch efficiency also dropped on my box. I suspect this is due to the removal of strided prefetch, which seemed to do a great job for me (not sure why).
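
For what it's worth, the prefetcher keeps its own counters, so the drop is visible without arc_summary, assuming the zfetchstats kstat still exposes plain hits/misses fields:

awk '/^hits / {h=$3} /^misses / {m=$3} END { if (h+m) printf "zfetch hit ratio: %.2f%%\n", 100*h/(h+m) }' /proc/spl/kstat/zfs/zfetchstats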

tuxoko commented 7 years ago

@dweeezil Compressed ARC will greatly increase the memory fragmentation. Using compress=on without ABD is really bad for performance.

dweeezil commented 7 years ago

@haasn The (currently undocumented, it would appear) zfs_compressed_arc_enabled can be set to 0 to disable compressed ARC. As @tuxoko pointed out, it will increase memory fragmentation a lot but it's unclear as to whether that's the cause of your decreased hit rate.
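
It's an ordinary module parameter, so it should be possible to flip it at runtime (affecting newly cached buffers only, as far as I know) or pin it via modprobe options:

echo 0 > /sys/module/zfs/parameters/zfs_compressed_arc_enabled
# or persistently, in /etc/modprobe.d/zfs.conf:
# options zfs zfs_compressed_arc_enabled=0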

sempervictus commented 7 years ago

@haasn: could you please rerun your test set with the current master? ABD is now merged. There are also references to a couple of additional performance enhancements for ABD in #5135, such as local LRU and sort/merge additions, which I'm running through initial testing right now, but ABD itself should significantly help if compressed ARC is your bottleneck.

haasn commented 7 years ago

I've been testing ABD since sometime yesterday. My munin graph doesn't seem to have changed much, but I'll give it some more time.

My ARC is still shrinking when not in use, but the efficiency as reported by arc_summary seems to have improved (at least so far); it's currently at an 81% cache hit ratio. That's less than it was in the past (despite having significantly more RAM than back then).

I'll give it some more time, I suppose.

haasn commented 7 years ago

No change in observed behavior with ABD, although I'm still not sure if the graphing is simply reporting compressed ARC stats incorrectly.

As far as I can tell, it calculates mfu_hits / arc_access_total (ditto for mru).
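
In other words (if I'm reading it right), something roughly equivalent to this, with total accesses taken as hits + misses from arcstats:

λ awk '/^(hits|misses|mfu_hits|mru_hits) / {s[$1]=$3} END {t=s["hits"]+s["misses"]; printf "mfu %.1f%%  mru %.1f%%  overall %.1f%%\n", 100*s["mfu_hits"]/t, 100*s["mru_hits"]/t, 100*s["hits"]/t}' /proc/spl/kstat/zfs/arcstats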

haasn commented 7 years ago

Btw, those spikes are my workload. (I changed the frequency and observed the spike frequency changing as well)

It seems like the efficiency is always reported as high while my workload is running, and low while it's not. I'm not sure whether the ARC stats are just getting confused by the low load, or whether ARC efficiency is actually poorer when not under load now.