migrator / guava-libraries-3

Guava: Google Core Libraries for Java 1.6+

Change size-based eviction to allow for significantly varying segment sizes #2

Open migrator opened 9 years ago

migrator commented 9 years ago

Presently size-based eviction divides maxWeight by the number of segments (concurrency level) and then each segment checks individually on writes for the need to evict entries.

This leads to underutilization of RAM when the amount of data in each segment varies a lot. One can pick a larger maxWeight, of course, but then if the distribution of segment sizes changes, one is in danger of RAM being genuinely overcommitted. One might also pick a better hash function, but it is not clear that would solve this consistently.
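As a numeric sketch of the underutilization (the class, method names, and numbers below are illustrative, not Guava code): with the weight budget split evenly across segments, a skewed cache starts evicting while most of the budget is still idle.

```java
// Sketch of how an evenly split per-segment budget wastes RAM when
// segment weights are skewed. Illustrative only, not Guava's code.
public class SegmentBudget {

    // Each segment gets an equal share of the total weight budget.
    static long maxSegmentWeight(long maxWeight, int concurrencyLevel) {
        return maxWeight / concurrencyLevel;
    }

    // Bytes in use at the moment the heaviest segment first hits its cap,
    // assuming every other segment is only 1/skew as heavy.
    static long usedAtFirstEviction(long perSegment, int segments, int skew) {
        return perSegment + (long) (segments - 1) * (perSegment / skew);
    }

    public static void main(String[] args) {
        long maxWeight = 2L * 1024 * 1024 * 1024;                // a 2 GB cache
        int segments = 16;                                       // concurrency level
        long perSegment = maxSegmentWeight(maxWeight, segments); // 128 MB each

        long used = usedAtFirstEviction(perSegment, segments, 4);
        System.out.printf("evictions start at %d of %d bytes (%.0f%%)%n",
                used, maxWeight, 100.0 * used / maxWeight);      // roughly 30%
    }
}
```

With a 4x weight skew, eviction begins at about 30% utilization even though 15 of the 16 segments have plenty of headroom.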

A better solution is to coordinate eviction across the segments so that the cache as a whole more or less stays below maxWeight, while individual segments are allowed to be larger when they are imbalanced. The additional requirement is that when one segment is below its max and yet the cache as a whole is still over weight, at least one other segment needs to be poked to evict.

I have coded a solution to this in https://code.google.com/r/craigwi-guava/. The tests don't pass, and I'm not sure what to do about them. It would make sense to change the eviction unit tests to account for this change, but I haven't done that yet.

relevance: 6

migrator commented 9 years ago

summary: Not Defined

Under what sorts of circumstances would the relative size of the segments vary significantly? I'm having a hard time visualizing when this would happen other than a deliberate hash attack.

status Not Defined creator: lowas...@google.com created at: Sep 23, 2014

migrator commented 9 years ago

summary: Not Defined

Guava is used in several places in ElasticSearch. In the case of the fielddata cache, the concurrency level is 16. Many of us have set the fielddata cache to use size-based eviction only to find that evictions happen well before the cache is 'full'. In my case, I specified a 2 GB cache and only about 650 MB was in use when evictions started.

You can read more about these cases here: https://groups.google.com/forum/#!msg/elasticsearch/HZj2AR16oss/812kgZb8FhMJ and https://groups.google.com/forum/#!msg/elasticsearch/42qrpYRJvsU/IPArEH8KSYQJ.

status Not Defined creator: crai...@microsoft.com created at: Sep 23, 2014

migrator commented 9 years ago

summary: Not Defined

Ah, I see. So first things first: if each entry were on the order of, say, ~1 MB or less, this would point to a bad hash function (or a hash attack).

It is, however, expected behavior when entries are much coarser than that. You really have 16 independent caches, which is of course where the high concurrency comes from.

So this FR makes sense, but I suspect it's not highly feasible given the way common.cache works now.

status Not Defined creator: kevinb@google.com created at: Sep 23, 2014

migrator commented 9 years ago

summary: Not Defined

In case it wasn't clear, I already wrote the code and worked to ensure the independence (in the concurrency sense) of the segments. Let me know what you think after reviewing the code.

status Not Defined creator: crai...@microsoft.com created at: Sep 23, 2014

migrator commented 9 years ago

summary: Not Defined

Part of the point of the way LocalCache is currently implemented is that operations on two different segments should not contend for any locks. From what I can tell, your approach seems to require a global AtomicLong updated on every operation on every segment.

The current approach was a deliberate tradeoff: accepting imprecise weight-based evictions for the sake of keeping segments as independent as possible. For more typical usages of Cache, this seems likely to be a significant performance loss as lots of small, cheap cache operations are forced to contend on a single AtomicLong.

status Not Defined creator: lowas...@google.com created at: Sep 23, 2014

migrator commented 9 years ago

summary: Not Defined

The real solution here is to implement the eviction queue globally instead of per-segment. We have an internal issue about this (which needs to be made external), but we know how to do this and even wanted to do it in 2011 but didn't have a compelling case for justifying the change. This seems like a pretty compelling reason to do that right.

status Not Defined creator: fry@google.com created at: Sep 23, 2014

migrator commented 9 years ago

summary: Not Defined (No comment was entered for this change.)

status Not Defined creator: fry@google.com created at: Sep 23, 2014

migrator commented 9 years ago

summary: Not Defined

Re using AtomicLong: if this is anything more than an interlocked increment or compare-and-exchange at the hardware level, that would definitely be the wrong thing to do.

As for how often it is called, AtomicLong.addAndGet() is called in two places -- the same two places where this.totalWeight is updated. These calls occur only for writes and removals, which already take the per-segment lock and are relatively heavyweight.

In particular, AtomicLong.addAndGet() is NOT called in the normal read case. There is one edge case I covered, but it is likely very rare and may not be worth coding for.

However, I might have missed some other problematic case and would be happy to know what I missed.

As for the "real solution" I agree and of course would be happy with any better solution than the one I have proposed.

status Not Defined creator: crai...@microsoft.com created at: Sep 23, 2014

migrator commented 9 years ago

summary: Not Defined

As for how often it is called, AtomicLong.addAndGet() is called in two places -- the same two places where this.totalWeight is updated. These calls occur only for writes and removals, which already take the per-segment lock and are relatively heavyweight.

My understanding is that taking the segment lock is one thing, but that many threads hammering on a single AtomicLong can be a serious performance issue. For example, Doug Lea's most recent ConcurrentHashMap implementation avoids AtomicLong for this reason, using some seriously magic alternatives. We actually have that class in Guava, under the name LongAdder, but I'm not sure we could use that and still preserve the invariant that the cache as a whole never goes over the limit.
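For reference, the striped-counter idea can be sketched with the JDK's java.util.concurrent.atomic.LongAdder (the same design as the internal copy mentioned above); the wrapper class and method names below are hypothetical, not Guava code. The caveat for a weight cap shows up in overWeight(): sum() folds the per-thread cells but is not an atomic snapshot under concurrent writes.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: tracking total cache weight with a LongAdder.
// Updates land in per-thread cells, avoiding contention on one AtomicLong,
// but sum() is only a moving approximation while writers are active, so a
// hard "never exceed maxWeight" invariant cannot be enforced exactly.
public class StripedWeight {
    final LongAdder totalWeight = new LongAdder();
    final long maxWeight;

    StripedWeight(long maxWeight) { this.maxWeight = maxWeight; }

    void recordWrite(long entryWeight) { totalWeight.add(entryWeight); }
    void recordEvict(long entryWeight) { totalWeight.add(-entryWeight); }

    // Reads a possibly stale sum; concurrent in-flight deltas are invisible.
    boolean overWeight() { return totalWeight.sum() > maxWeight; }

    public static void main(String[] args) {
        StripedWeight cache = new StripedWeight(100);
        cache.recordWrite(100);
        cache.recordWrite(50);
        cache.recordEvict(30);
        System.out.println(cache.totalWeight.sum()); // 120 (single-threaded demo)
        System.out.println(cache.overWeight());      // true
    }
}
```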

status Not Defined creator: lowas...@google.com created at: Sep 23, 2014

migrator commented 9 years ago

summary: Not Defined

The LongAdder is indeed interesting and inspired me to use a different approach for my requested feature.

The existing Segment.totalWeight values are like the Cell values in Doug Lea's approach. I changed my code to avoid the AtomicLong completely, at the cost of summing the segments' totalWeight values.

I pushed the new approach to https://code.google.com/r/craigwi-guava/.

Let me know what you think.

status Not Defined creator: crai...@microsoft.com created at: Sep 23, 2014

migrator commented 9 years ago

summary: Not Defined

I may be blind, but I can't see any visibility guarantee there for segment.totalWeight in evictEntriesFromLargestSegment.

Concerning the AtomicLong, you could reduce the contention by updating it only when currentSegment.totalWeight has changed significantly since the last update, say by more than 1/8 of the segment capacity. This way you'd maintain an approximation of the total weight within that precision, and you could ensure that no eviction takes place before the cache reaches 7/8 of its capacity (instead of 1/segments.length of it).

For normally behaving caches, there'd be no contention at all, as all segments would reach and stay at about their maximum size and never update the AtomicLong.
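The thresholded-update idea could look roughly like this; the class, field names, and the 1/8 factor follow the suggestion above, but everything here is a hypothetical sketch, not Guava code.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical segment that batches its weight updates: it publishes a
// delta to the shared counter only once local drift exceeds capacity/8,
// so the shared total stays within (segments * capacity/8) of the truth
// while steady-state segments never touch the AtomicLong at all.
public class BatchedSegment {
    final AtomicLong sharedTotal;   // one counter shared by all segments
    final long capacity;
    long localWeight;               // exact weight, guarded by the segment lock
    long publishedWeight;           // portion already reflected in sharedTotal

    BatchedSegment(AtomicLong sharedTotal, long capacity) {
        this.sharedTotal = sharedTotal;
        this.capacity = capacity;
    }

    void addWeight(long delta) {
        localWeight += delta;
        long drift = localWeight - publishedWeight;
        if (Math.abs(drift) > capacity / 8) {   // publish only on large drift
            sharedTotal.addAndGet(drift);
            publishedWeight = localWeight;
        }
    }

    public static void main(String[] args) {
        AtomicLong total = new AtomicLong();
        BatchedSegment seg = new BatchedSegment(total, 800); // threshold = 100
        seg.addWeight(50);                  // drift 50: nothing published yet
        seg.addWeight(60);                  // drift 110 > 100: published
        System.out.println(total.get());    // 110
    }
}
```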

status Not Defined creator: Maaarti...@gmail.com created at: Sep 24, 2014

migrator commented 9 years ago

summary: Not Defined

That's another good idea to reduce the potential cost / contention of using AtomicLong.

My most recent version avoids AtomicLong altogether -- summing up the total segment weights as needed during eviction.

What are the next steps here? Do you have perf tests against which we can measure this change?

status Not Defined creator: crai...@microsoft.com created at: Sep 29, 2014

migrator commented 9 years ago

summary: Not Defined

It was suggested above and in another forum that a better hash function might solve this. The issue stems from the combination of the hash function's distribution AND the distribution of the sizes of the elements -- the latter of which cannot be encoded in the hash function. With a "perfect" hash function, each cache segment would have an equal number of (let's say equally aged) entries. If the entries vary in size, the cache will still be out of balance size-wise and will likely never fully utilize the specified capacity. The more the sizes vary, the less fully the cache utilizes RAM.
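A small numeric sketch of this point (the numbers are made up for illustration): even a hash that balances entry *counts* perfectly cannot balance *weights* when entry sizes vary.

```java
// Illustrative only: 4 segments, perfectly even entry counts, but the
// average entry size differs per segment, so segment weights diverge.
public class CountVsWeight {

    // Weight of each segment given an even entry count and per-segment
    // average entry sizes.
    static long[] segmentWeights(int entriesPerSegment, long[] avgEntrySize) {
        long[] w = new long[avgEntrySize.length];
        for (int s = 0; s < w.length; s++)
            w[s] = (long) entriesPerSegment * avgEntrySize[s];
        return w;
    }

    public static void main(String[] args) {
        // A "perfect" hash by count: 100 entries in each of 4 segments.
        long[] w = segmentWeights(100, new long[] {10, 10, 10, 40});
        // With a per-segment cap of 4000, the heavy segment is already
        // evicting while the cache holds 7000 of a 16000 budget (~44%).
        long used = 0;
        for (long x : w) used += Math.min(x, 4000);
        System.out.println(used); // 7000
    }
}
```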

status Not Defined creator: crai...@microsoft.com created at: Sep 30, 2014

migrator commented 9 years ago

summary: Not Defined

Any update on this? Would like to get an idea of where this is heading in order to take the appropriate action in ES (see https://github.com/elasticsearch/elasticsearch/issues/7836).

status Not Defined creator: greg.mar...@elasticsearch.com created at: Oct 8, 2014

migrator commented 9 years ago

summary: Not Defined

I think my concrete issue with the current version of the change is that it's back to allowing races: totalSegmentView() can have an inconsistent view of the size of the different segments. You'd encounter this issue with LongAdder, too, I believe.

The current solution manages to guarantee that the weight cap is never exceeded even in a multithreaded context, and keeps each operation local to the specific segment it needs to access, without contending with operations on any other segment. In exchange, it works poorly for a very narrow range of use cases like this, where the size of individual cache entries is a large fraction of the total weight cap, relative to the concurrencyLevel.

To be honest, I can't see any way of maintaining these invariants while accomplishing something like your goal here. If we come up with a way to maintain these invariants while increasing cache utilization, we can come back to this, but I don't think this is likely to go anywhere in the near term.

status Not Defined creator: lowas...@google.com created at: Oct 13, 2014

migrator commented 9 years ago

summary: Not Defined

The current solution, in fact, does NOT guarantee that the weight cap is never exceeded. On a cache miss, the following high-level steps happen in this order:

  1. load the new data and store it into the segment (cf. the code called from loadSync())
  2. then check whether the segment is over full (while (totalWeight > maxSegmentWeight) { ... } in evictEntries())

This means that every segment can be 100% full, and the very next load on any segment will push memory usage over the segment weight cap and thus over the whole cache's cap. If this happened at the same time on many or all segments, each of them could go over simultaneously, leaving the cache over weight by the total size of the data just loaded.
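A minimal sketch of that ordering (a simplified stand-in, not Guava's actual Segment code): the entry's weight is added before the over-weight check runs, so the segment transiently exceeds its cap by up to one entry's weight.

```java
// Simplified stand-in for a cache segment, showing the store-then-evict
// ordering described above. Not Guava code; names are illustrative.
public class LoadThenEvictSegment {
    long totalWeight;
    final long maxSegmentWeight;

    LoadThenEvictSegment(long maxSegmentWeight) {
        this.maxSegmentWeight = maxSegmentWeight;
    }

    // Returns the peak weight reached before eviction brought it back down.
    long put(long entryWeight) {
        totalWeight += entryWeight;                // step 1: store first
        long peak = totalWeight;                   // transient overshoot
        while (totalWeight > maxSegmentWeight)     // step 2: then evict
            totalWeight -= evictOneEntry();
        return peak;
    }

    long evictOneEntry() { return 10; }            // pretend each evictee weighs 10

    public static void main(String[] args) {
        LoadThenEvictSegment seg = new LoadThenEvictSegment(100);
        seg.totalWeight = 100;                     // segment exactly at its cap
        System.out.println(seg.put(25));           // 125: over the cap until eviction
        System.out.println(seg.totalWeight);       // 95: back under afterwards
    }
}
```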

I believe that my proposal will result in a cache-level overcommit amount similar to the current code's (not identical, but similar).

Of course, the problem I'm trying to solve is the undercommit amount. My proposal dramatically improves the behavior on the undercommit side.

status Not Defined creator: crai...@microsoft.com created at: Oct 13, 2014

migrator commented 9 years ago

Past events: fry@google.com was mentioned.