Closed by GoogleCodeExporter 9 years ago
Under what sorts of circumstances would the relative size of the segments vary
significantly? I'm having a hard time visualizing when this would happen other
than a deliberate hash attack.
Original comment by lowas...@google.com
on 23 Sep 2014 at 4:58
Guava is used in several places in ElasticSearch. In the case of the fielddata
cache, the concurrency level is 16. Many of us have set the fielddata cache to
use size-based eviction only to find that evictions are happening well before
the cache is 'full'. In my case, I specified a 2 GB cache and only 650 MB was
used when evictions started.
You can read more about these cases here:
https://groups.google.com/forum/#!msg/elasticsearch/HZj2AR16oss/812kgZb8FhMJ
and
https://groups.google.com/forum/#!msg/elasticsearch/42qrpYRJvsU/IPArEH8KSYQJ.
Original comment by crai...@microsoft.com
on 23 Sep 2014 at 6:15
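The arithmetic behind the early evictions reported above can be sketched as follows (the 2 GB / concurrencyLevel-16 numbers come from this thread; the class and method names are illustrative, not Guava's): LocalCache splits maximumWeight evenly across its segments, so each segment evicts once *its* share is full, not when the cache as a whole is.

```java
// Illustrative sketch: per-segment weight cap under a segmented cache.
public class SegmentCapSketch {
    static long perSegmentCap(long maxWeightBytes, int concurrencyLevel) {
        // Guava-style split: the total cap divided evenly across segments.
        return maxWeightBytes / concurrencyLevel;
    }

    public static void main(String[] args) {
        long twoGb = 2L * 1024 * 1024 * 1024;
        // With skewed entry sizes, one segment can hit this cap while the
        // cache as a whole holds far less than 2 GB -- consistent with the
        // ~650 MB observation above.
        System.out.println(perSegmentCap(twoGb, 16)); // 134217728 (128 MB)
    }
}
```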
Ah, I see. So first things first: if each entry were on the order of, say, ~1 MB
or less, this would point to a bad hash function (or a hash attack).
It is, however, expected behavior when entries are much coarser than that. You
really have 16 independent caches, which is of course where the high
concurrency comes from.
So this FR makes sense, but I suspect it's not highly feasible given the way
common.cache works now.
Original comment by kevinb@google.com
on 23 Sep 2014 at 7:02
In case it wasn't clear, I already wrote the code and worked to ensure the
independence (in the concurrency sense) of the segments. Let me know what you
think after reviewing the code.
Original comment by crai...@microsoft.com
on 23 Sep 2014 at 7:07
Part of the point of the way LocalCache is currently implemented is that
operations on two different segments should not contend for any locks. From
what I can tell, your approach seems to require a global AtomicLong updated on
every operation on every segment.
The current approach was a deliberate tradeoff: accepting imprecise
weight-based evictions for the sake of keeping segments as independent as
possible. For more typical usages of Cache, this seems likely to be a
significant performance loss as lots of small, cheap cache operations are
forced to contend on a single AtomicLong.
Original comment by lowas...@google.com
on 23 Sep 2014 at 7:16
The real solution here is to implement the eviction queue globally instead of
per-segment. We have an internal issue about this (which needs to be made
external), but we know how to do this and even wanted to do it in 2011 but
didn't have a compelling case for justifying the change. This seems like a
pretty compelling reason to do that right.
Original comment by fry@google.com
on 23 Sep 2014 at 7:25
Original comment by fry@google.com
on 23 Sep 2014 at 7:35
Re using AtomicLong, if this is anything more than an interlocked increment or
compare exchange at the hardware level, that definitely would be the wrong
thing to do.
As for how often it is called, AtomicLong.addAndGet() is called in two places
-- the same two places where this.totalWeight is updated. These calls occur
only for writes and removals, which already take the per-segment lock and are
relatively heavyweight.
In particular, AtomicLong.addAndGet() is NOT called in the normal read case.
There is one edge case I covered, but it is likely very rare and may not be
worth coding for.
However, I might have missed some other problematic case and would be happy to
know what I missed.
As for the "real solution" I agree and of course would be happy with any better
solution than the one I have proposed.
Original comment by crai...@microsoft.com
on 23 Sep 2014 at 7:49
> As for how often it is called, AtomicLong.addAndGet() is called in two places
-- the same two places where this.totalWeight is updated. These calls occur
only for writes and removals, which already take the per-segment lock and are
relatively heavyweight.
My understanding is that taking the segment lock is one thing, but that many
threads hammering on a single AtomicLong can be a serious performance issue.
For example, Doug Lea's most recent ConcurrentHashMap implementation avoids
AtomicLong for this reason, using some seriously magic alternatives. We
actually have that class in Guava, under the name LongAdder, but I'm not sure
we could use that and still preserve the invariant that the cache as a whole
never goes over the limit.
Original comment by lowas...@google.com
on 23 Sep 2014 at 7:59
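The striping that makes LongAdder attractive here can be shown in a small sketch (LongAdder has shipped in java.util.concurrent.atomic since Java 8; the class and method names below are ours): increments from many threads land on separate internal cells instead of one contended word, but sum() is only a moment-in-time fold of those cells, which is exactly why a hard "never exceed the cap" invariant is difficult to keep with it.

```java
import java.util.concurrent.atomic.LongAdder;

public class AdderSketch {
    static long runWriters(int threads, int addsPerThread) throws InterruptedException {
        LongAdder totalWeight = new LongAdder();
        Thread[] writers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            writers[i] = new Thread(() -> {
                for (int j = 0; j < addsPerThread; j++) {
                    totalWeight.add(1); // lands on a striped cell, not one shared word
                }
            });
            writers[i].start();
        }
        for (Thread t : writers) t.join();
        // sum() folds the cells together; read mid-flight it is only an
        // estimate, so it cannot enforce a strict weight cap by itself.
        return totalWeight.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWriters(4, 100_000)); // 400000
    }
}
```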
The LongAdder is indeed interesting and inspired me to use a different approach
for my requested feature.
The existing Segment.totalWeight values are like the Cell values in Doug Lea's
approach. I changed my code to avoid the AtomicLong completely, at the cost of
summing the segments' totalWeight values.
I pushed the new approach to https://code.google.com/r/craigwi-guava/.
Let me know what you think.
Original comment by crai...@microsoft.com
on 24 Sep 2014 at 2:57
I may be blind, but I can't see any visibility guarantee there for
segment.totalWeight in evictEntriesFromLargestSegment.
Concerning the AtomicLong, you could reduce the contention by updating it only
when currentSegment.totalWeight has changed significantly since the last
update, i.e., let's say by more than 1/8 of the segment capacity. This way
you'd maintain an approximation of the total weight within this precision and
you could ensure that no eviction takes place before the cache reaches 7/8
(instead of 1/segments.length) of its capacity.
For normally behaving caches, there'd be no contention at all, as all segments
would reach and stay at about their maximum size and never update the
AtomicLong.
Original comment by Maaarti...@gmail.com
on 25 Sep 2014 at 5:06
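The batched-update idea above can be sketched like this (a hedged illustration under assumed names -- SegmentWeight, pending, flushThreshold are not Guava identifiers): each segment accumulates weight changes locally and publishes to the shared counter only when the accumulated delta exceeds, say, 1/8 of the segment cap, so the global value stays within that precision while steady-state segments never touch it.

```java
import java.util.concurrent.atomic.AtomicLong;

public class BatchedWeight {
    static class SegmentWeight {
        final AtomicLong global;     // shared cache-wide counter
        final long flushThreshold;   // e.g. segmentCap / 8
        long pending;                // delta not yet published (segment-lock guarded in real code)

        SegmentWeight(AtomicLong global, long flushThreshold) {
            this.global = global;
            this.flushThreshold = flushThreshold;
        }

        void add(long delta) {       // called under the segment lock
            pending += delta;
            if (Math.abs(pending) >= flushThreshold) {
                global.addAndGet(pending); // the only cross-segment write
                pending = 0;
            }
        }
    }

    public static void main(String[] args) {
        AtomicLong totalWeight = new AtomicLong();
        SegmentWeight seg = new SegmentWeight(totalWeight, 16); // 1/8 of a 128-unit cap
        for (int i = 0; i < 20; i++) seg.add(1);
        // Only the flushed batch is globally visible; up to threshold-1 units lag.
        System.out.println(totalWeight.get()); // 16
    }
}
```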
That's another good idea to reduce the potential cost / contention of using
AtomicLong.
My most recent version avoids AtomicLong altogether -- summing up the total
segment weights as needed during eviction.
What are the next steps here? Do you have perf tests against which we can
measure this change?
Original comment by crai...@microsoft.com
on 29 Sep 2014 at 3:07
It was mentioned above and in another forum that perhaps a better hash function
would solve this. The issue stems from the combination of the hash function's
distribution AND the distribution of the sizes of the elements -- the latter of
which cannot be encoded in the hash function. With a "perfect" hash function,
each cache segment would have an equal number of (and, let's say, equally aged)
entries. If the entries vary in size, then the cache will still be out of
balance (size-wise) and likely never utilize the specified cache size fully.
The more the sizes vary, the more the cache fails to utilize RAM fully.
Original comment by crai...@microsoft.com
on 30 Sep 2014 at 4:32
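The point above can be made concrete with a toy model (all numbers and names are illustrative): keys spread perfectly evenly across segments, yet segment *weights* still diverge because entry sizes are skewed and sizes cannot be folded into the hash.

```java
public class SizeSkew {
    // Even a perfectly even key distribution leaves segment weights skewed
    // when entry sizes differ across segments.
    static long[] segmentWeights(int segments, int entries) {
        long[] weight = new long[segments];
        for (int key = 0; key < entries; key++) {
            int seg = key % segments;                 // "perfect" hash: equal entry counts
            long entrySize = (seg == 0) ? 100 : 10;   // illustrative size skew
            weight[seg] += entrySize;
        }
        return weight;
    }

    public static void main(String[] args) {
        long[] w = segmentWeights(4, 16);
        // Segment 0 is 10x heavier despite holding the same number of entries.
        System.out.println(w[0] + " vs " + w[1]); // 400 vs 40
    }
}
```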
Any update on this? Would like to get an idea of where this is heading in
order to take the appropriate action in ES (see
https://github.com/elasticsearch/elasticsearch/issues/7836).
Original comment by greg.mar...@elasticsearch.com
on 8 Oct 2014 at 8:18
I think my concrete issue with the current version of the change is that it's
back to allowing races: totalSegmentView() can have an inconsistent view of the
size of the different segments. You'd encounter this issue with LongAdder,
too, I believe.
The current solution manages to *guarantee* that the weight cap is never
exceeded even in a multithreaded context, and keeps each operation local to the
specific segment it needs to access, without contending with operations on any
other segment. In exchange, it works poorly for a very narrow range of use
cases like this, where the size of individual cache entries is a large fraction
of the total weight cap, relative to the concurrencyLevel.
To be honest, I can't see any way of maintaining these invariants while
accomplishing something like your goal here. If we come up with a way to
maintain these invariants while increasing cache utilization, we can come back
to this, but I don't think this is likely to go anywhere in the near term.
Original comment by lowas...@google.com
on 13 Oct 2014 at 9:30
The current solution, in fact, does NOT guarantee that the weight cap is never
exceeded. On a cache miss, the following high-level steps happen in this order:
1. Load the new data and store it into the segment
(cf. code called from loadSync()).
2. *Then* check whether the segment is *over* full (while (totalWeight >
maxSegmentWeight) { <do evict> })
(this is in evictEntries()).
This means that every segment can be 100% full and the very next load on any
segment will cause memory usage to exceed the segment weight and thus the
whole cache weight. If this happened at the same time for many/all segments,
each of those segments could go over simultaneously and the cache would be
over weight by the size of the data just loaded.
I believe that my proposal will result in a cache-level over-commit amount
similar to the current code (not identical, but similar).
Of course, the problem I'm trying to solve is the under-commit amount. My
proposal dramatically improves the behavior on the under-commit side.
Original comment by crai...@microsoft.com
on 14 Oct 2014 at 12:13
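The store-then-evict ordering described above can be modeled in a few lines (loadSync() and evictEntries() are LocalCache names from the thread; everything else is simplified illustration): the segment briefly exceeds its cap between step 1 and step 2, and with many segments doing this at once the whole cache transiently overshoots by up to one loaded entry per segment.

```java
public class LoadThenEvict {
    static final long MAX_SEGMENT_WEIGHT = 100;

    // Simplified model of the ordering: store the loaded value first,
    // then evict until the segment is back under its cap.
    static long putThenEvict(long currentWeight, long newEntryWeight, long evictedEntryWeight) {
        long weight = currentWeight + newEntryWeight;   // 1. loadSync() stores the value
        while (weight > MAX_SEGMENT_WEIGHT) {           // 2. evictEntries() trims afterwards
            weight -= evictedEntryWeight;
        }
        return weight;
    }

    public static void main(String[] args) {
        // A segment already at its 100-unit cap transiently holds 125 before
        // eviction runs; 16 segments doing this at once overshoot the cache-wide
        // cap by up to 16 freshly loaded entries.
        System.out.println(putThenEvict(100, 25, 10)); // 95
    }
}
```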
This issue has been migrated to GitHub.
It can be found at https://github.com/google/guava/issues/<issue id>
Original comment by cgdecker@google.com
on 1 Nov 2014 at 4:08
Original comment by cgdecker@google.com
on 1 Nov 2014 at 4:17
Original comment by cgdecker@google.com
on 3 Nov 2014 at 9:07
Original issue reported on code.google.com by
crai...@microsoft.com
on 23 Sep 2014 at 4:57