Status: Closed (closed by haasn 7 years ago)
If you use compression, I suspect the difference you're seeing is the result of the compressed ARC (d3c2ae1). Of course, the compressed ARC should actually increase the hit rate when the compression ratio is high.
I am using compression (lz4). Is there any tunable I could play with to confirm that suspicion?
@dweeezil I remember seeing a potential integer overflow in __arc_shrinker_func() while going through some Coverity reports (CID 147545). It might not be the cause of this issue but it should be looked at.
It's probably not related, but after moving from release to -git I noticed that the prefetch efficiency also dropped on my box. I suspect this is due to the removal of strided prefetch, which seemed to do a great job for me (not sure why).
@dweeezil Compressed ARC will greatly increase the memory fragmentation. Using compress=on without ABD is really bad for performance.
@haasn The (currently undocumented, it would appear) zfs_compressed_arc_enabled can be set to 0 to disable compressed ARC. As @tuxoko pointed out, it will increase memory fragmentation a lot but it's unclear as to whether that's the cause of your decreased hit rate.
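A minimal sketch of inspecting and toggling that tunable from Python. It assumes the usual ZFS on Linux location for module parameters under /sys/module/zfs/parameters; the helper names read_tunable and write_tunable are hypothetical, and writing to the real file requires root.

```python
from pathlib import Path

# Assumed location of the module parameter on ZFS on Linux.
PARAM = Path("/sys/module/zfs/parameters/zfs_compressed_arc_enabled")


def read_tunable(path=PARAM):
    """Return the current integer value stored in the parameter file."""
    return int(Path(path).read_text().strip())


def write_tunable(value, path=PARAM):
    """Write a new integer value (needs root on a real system)."""
    Path(path).write_text(f"{int(value)}\n")


if __name__ == "__main__":
    if PARAM.exists():
        print("zfs_compressed_arc_enabled =", read_tunable())
    else:
        print("parameter not found; is the zfs module loaded?")
```

Calling write_tunable(0) as root and then re-running the workload would be one way to check whether the hit rate recovers with compressed ARC disabled.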
@haasn: Could you please rerun your test set with the current master? ABD is now merged. There are also references to a couple of additional performance enhancements for ABD in #5135, such as local LRU and sort/merge additions, which I'm running through initial testing right now, but ABD itself should significantly help if compressed ARC is your bottleneck.
I've been testing ABD since sometime yesterday. My munin graph doesn't seem to have changed much, but I'll give it some more time.
My ARC is still shrinking when not in use, but the efficiency as reported by arc_summary seems to have improved (at least so far); currently it's at an 81% cache hit ratio. That is still less than it was in the past (despite having significantly more RAM than back then).
I'll give it some more time, I suppose.
No change in observed behavior with ABD, although I'm still not sure whether the graphing is simply reporting compressed ARC stats incorrectly.
As far as I can tell, it calculates mfu_hits / arc_access_total (ditto for mru).
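The ratio described above can be reproduced by hand from the raw counters. This is only a sketch: it assumes arc_access_total means hits + misses (the thread does not show how the munin plugin actually defines it), it assumes the kstat file lives at /proc/spl/kstat/zfs/arcstats, and the sample numbers are made up for illustration.

```python
from pathlib import Path

# Assumed kstat location on ZFS on Linux.
ARCSTATS = Path("/proc/spl/kstat/zfs/arcstats")

# Made-up sample in the kstat layout, for illustration only.
SAMPLE = """\
13 1 0x01 4 192 1000 2000
name                            type data
hits                            4    900
misses                          4    100
mfu_hits                        4    600
mru_hits                        4    250
"""


def parse_arcstats(text):
    """Parse 'name type data' rows into a dict of ints, skipping headers."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly three fields with a numeric value.
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def hit_ratios(stats):
    """Return overall, MFU and MRU hit ratios, or None without accesses."""
    total = stats["hits"] + stats["misses"]  # assumed arc_access_total
    if total == 0:
        return None
    return {
        "overall": stats["hits"] / total,
        "mfu": stats["mfu_hits"] / total,
        "mru": stats["mru_hits"] / total,
    }


if __name__ == "__main__":
    text = ARCSTATS.read_text() if ARCSTATS.exists() else SAMPLE
    print(hit_ratios(parse_arcstats(text)))
```

Comparing these numbers with what the graph shows would help tell a reporting problem apart from a real drop in hit rate.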
Btw, those spikes are my workload. (I changed the frequency and observed the spike frequency changing as well.)
Seems like the efficiency is always reported as high while my workload is high, and low while it's not. I'm not sure whether the ARC stats are just getting confused by the fact that the load is low, or whether ARC efficiency is actually poorer while not under load now.
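One way to separate the two explanations is to look at how many accesses each graphed interval actually contains. The arcstats counters are cumulative, so a per-interval ratio is a quotient of two deltas, and with very few accesses in an idle interval a handful of misses can swing it a lot. The sketch below is hypothetical: the function name, the min_accesses threshold, and the snapshot values are assumptions, not how munin does it.

```python
def interval_hit_ratio(prev, curr, min_accesses=1000):
    """Compute the hit ratio between two cumulative counter snapshots.

    prev and curr are dicts with cumulative 'hits' and 'misses'.
    'reliable' is False when the interval saw fewer than min_accesses
    accesses, i.e. when the ratio rests on too little data.
    """
    d_hits = curr["hits"] - prev["hits"]
    d_misses = curr["misses"] - prev["misses"]
    accesses = d_hits + d_misses
    ratio = d_hits / accesses if accesses > 0 else None
    return {
        "accesses": accesses,
        "ratio": ratio,
        "reliable": accesses >= min_accesses,
    }


if __name__ == "__main__":
    # Made-up snapshots: one busy interval, one nearly idle interval.
    start = {"hits": 1000, "misses": 100}
    busy = {"hits": 10000, "misses": 600}
    idle = {"hits": 1006, "misses": 104}
    print("busy:", interval_hit_ratio(start, busy))
    print("idle:", interval_hit_ratio(start, idle))
```

If the low readings all coincide with intervals flagged as unreliable, the dips say more about the sample size than about the ARC.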
I've noticed this ever since my previous reboot (on the 2nd of November), and it hasn't gone away since. In the past, my ARC efficiency was always floating at around 90%, especially after a few days of uptime. Since the 2nd of November (when I went from zfs-7.0-rc1 to zfs-7.0-rc2), this has gone down considerably.
Alongside the ZFS version upgrade I also did a kernel upgrade (4.8.0 -> 4.8.5), but I'm not sure if that would have caused this or not. These are my stats currently:
Something I noticed is that my ARC never seems to want to stay full. Right now my total RAM usage is at 28%, so memory pressure could hardly be the culprit, yet my ARC keeps shrinking in size.
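To narrow down why the ARC is not staying full, it can help to compare three arcstats fields: size (current ARC size), c (the current target size) and c_max (the upper limit). A rough sketch follows, assuming those fields have already been parsed into a dict; the function name and the sample values are made up.

```python
GIB = 1024 ** 3


def arc_size_report(stats):
    """Relate current ARC size to its target (c) and its limit (c_max)."""
    return {
        # How full the ARC is relative to what it is currently aiming for.
        "size_pct_of_target": 100.0 * stats["size"] / stats["c"],
        # How far the target itself has been pulled below the maximum.
        "target_pct_of_max": 100.0 * stats["c"] / stats["c_max"],
    }


if __name__ == "__main__":
    # Made-up values: 4 GiB in use, 8 GiB target, 16 GiB maximum.
    sample = {"size": 4 * GIB, "c": 8 * GIB, "c_max": 16 * GIB}
    print(arc_size_report(sample))
```

A target well below c_max suggests something is asking the ARC to shrink, whereas a size well below the target suggests the cache simply has not been refilled.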
I noticed that particular phenomenon actually happening way before the 2nd of November, having commented on it on the 19th of October at least, when my ARC efficiency was still great:
Here is a visual representation of my ARC efficiency over time: (please excuse the long region of no data collection, I had munin disabled for a while)
You can see quite clearly how the efficiency absolutely tanks on the 2nd of November. Normally, it was back up to full efficiency not long after a reboot, especially since I run a script after reboot to force most of my data into L1/L2ARC. (My previous reboot, on the 15th of October, doesn't even register in the munin graph on those timescales anymore.)
You can also see this weird oscillatory behavior in the close-up data. Maybe this is the ARC growing and shrinking?
Something else I noticed: Most of my “unused” RAM was being used as a cache for a non-ZFS filesystem that I added a few days ago. (Previously, I had no large non-ZFS filesystems, and certainly not on the 2nd of November.)
I'm not sure if this is related or not, but I noticed my cache usage increasing as my ARC size dropped as early as the 5th of September:
That said, I'm still not sure what the relationship between the two is, i.e. if cache usage is simply going up because ARC is shrinking (or vice versa).
I wasn't sure whether this was worth reporting with ABD on the way, but apparently (?) ABD testers noticed similar ARC efficiency drops, which might indicate a problem elsewhere in the code.