You may be too kind in believing that it was "by design". Originally I had intended to use direct removals rather than lazy ones, and had implemented a backtracking algorithm to do this. It passed my single-threaded unit tests, but I wasn't comfortable enough adopting it in production and removed it before validating under multiple threads. The simpler alternative was adopted to reduce complexity and have a solid base. The last version using backtracking is attached.
I have been meaning to get back and fix this, among other minor enhancements, but was a little too burned out from other activities. While LRU isn't the best algorithm under concurrency, and I prefer the Second Chance policy, it is the standard that everyone expects, so I agree it needs to be supported. I was considering making the linkage strategy a setting, because I would expect my doubly-linked list to be slower than Doug Lea's queue (which is acceptable for FIFO policies). You can see some thrashing on the write benchmarks, which likely needs a bit of a busy wait to back off rather than pound the system bus.
In regards to your idea - moving a node requires removal, and once it's removed there is no reason to add it back. However, removal can be tricky, and the head/tail manipulations are greatly simplified by using sentinel nodes. In addition to the backtracking strategy, I had worked out the details for a 3-node locking approach which seemed fairly ugly, but doable. If you want to venture down either of these paths, or have another idea entirely, feel free to give it a shot!
Let me know if you want any help (code/brainstorming/etc), and commit access is available if desired. Feel free to email me directly, continue on the issue, or post to the group.
Original comment by Ben.Manes@gmail.com on 23 Jan 2009 at 9:27
Attachments:
link to discussion thread
http://groups.google.com/group/concurrentlinkedhashmap/browse_thread/thread/1b6501f0dede6ff6
Original comment by christia...@gmail.com on 24 Jan 2009 at 10:53
Proposed Solution #1
---------------------
Drop the double links and use a singly-linked list. Store the {prev, curr} nodes as the map's value. This implies that to remove from the list, the prev->next reference can be updated to curr->next. No back-pointers are required.
The map's corresponding entry where the curr node is the value's prev must be updated. This can be done via a hash lookup.
| k | prev | curr |
| ~ | h | 1 |
| ~ | 1 | 2 |
| ~ | 2 | 3 |
| ~ | 3 | t |
h->1->2->3->t
Remove (3)
| k | prev | curr |
| ~ | h | 1 |
| ~ | 1 | 2 |
-------------------
| ~ | 2 | t |
h->1->2->t
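As a rough illustration, here is a minimal single-threaded sketch of this removal scheme. All names (`SinglyLinkedSketch`, `Entry`, `chain`) are hypothetical, not from the project's code, and locking is omitted entirely:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of Proposed Solution #1: the map's value stores {prev, curr},
// so removal bridges prev->next over curr and then fixes the successor's
// stored prev via a hash lookup. No back-pointers on the nodes.
class SinglyLinkedSketch<K> {
    static final class Node<K> {
        final K key;          // identifies the entry; null only for the head
        Node<K> next;
        Node(K key) { this.key = key; }
    }

    static final class Entry<K> {
        Node<K> prev;         // the node linked just before curr
        final Node<K> curr;
        Entry(Node<K> prev, Node<K> curr) { this.prev = prev; this.curr = curr; }
    }

    final Map<K, Entry<K>> map = new HashMap<>();
    final Node<K> head = new Node<>(null);  // sentinel 'h'
    Node<K> tail = head;                    // last linked node ('t' is notation)

    void add(K key) {
        Node<K> node = new Node<>(key);
        tail.next = node;
        map.put(key, new Entry<>(tail, node));
        tail = node;
    }

    void remove(K key) {
        Entry<K> e = map.remove(key);
        e.prev.next = e.curr.next;          // bridge prev -> curr's successor
        if (e.curr == tail) {
            tail = e.prev;
        } else {
            // non-atomic fix-up: the successor's stored prev is re-pointed
            map.get(e.curr.next.key).prev = e.prev;
        }
    }

    String chain() {                        // e.g. "h->1->2->t"
        StringBuilder sb = new StringBuilder("h");
        for (Node<K> n = head.next; n != null; n = n.next) {
            sb.append("->").append(n.key);
        }
        return sb.append("->t").toString();
    }

    public static void main(String[] args) {
        SinglyLinkedSketch<Integer> list = new SinglyLinkedSketch<>();
        list.add(1); list.add(2); list.add(3);
        list.remove(3);
        System.out.println(list.chain());   // h->1->2->t, as in the diagram
    }
}
```

Note that `remove` performs two distinct mutations (the bridge and the successor's prev fix-up), which is exactly the non-atomic step flagged in the implications below.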
Implications:
- Simpler list operations. No magic tricks (e.g. backtracking) or dead nodes.
- Must look up the entry whose prev is the removed node and re-point it. Not atomic - issues?
- The tail node will always be hot and a point of contention.
Original comment by Ben.Manes@gmail.com on 27 Jan 2009 at 8:44
Proposed Solution #2
--------------------
Drop the linked list and do not maintain sorted order. Make the common case (reads) fast and add the expense to writes. Reads update a volatile counter, while writes perform a linear scan to determine the oldest node to evict. Splitting entries across buckets allows concurrent writes and reduces contention, but does not provide a pure LRU policy - rather, LRU from the sample in the chosen bucket.
This approach is similar to ConcurrentHashMap, where writes acquire the bucket lock. A bucket contains an array of associated nodes, where each node holds the key/value/counter.
Reads
------
The node is retrieved from the ConcurrentMap. If LRU, node->counter is set to the increment of a global counter (avoids calling the system clock). If LFU, node->counter is incremented. Reads may contend on the global counter, but accuracy is unnecessary. No CAS is needed - just read/increment/write. Races are fine - same timestamp!
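A sketch of what such a read touch could look like (hypothetical names; deliberately plain read/increment/write with no CAS, as described above):

```java
// Sketch of the read path in Proposed Solution #2. An LRU read stamps
// the node from a global tick; an LFU read bumps the node's own counter.
// Both are intentionally racy: ties produce the same timestamp, which
// is acceptable since exact accuracy is unnecessary.
class TouchSketch {
    static final class Node {
        volatile long counter;
    }

    static volatile long globalTick;

    static void touchLru(Node node) {
        long next = globalTick + 1;      // read/increment/write, no CAS
        globalTick = next;               // racing readers may reuse a tick
        node.counter = next;             // fresher read == larger stamp
    }

    static void touchLfu(Node node) {
        node.counter = node.counter + 1; // plain increment, races tolerated
    }

    public static void main(String[] args) {
        Node a = new Node(), b = new Node();
        touchLru(a);
        touchLru(b);
        System.out.println(a.counter < b.counter); // true: b was read later
    }
}
```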
Adds
------
Each bucket + array index is provided in a "free list" (aka queue), which allows knowing which slots are free. It is shuffled when first created.
1. Add to the ConcurrentMap.
2. Poll for a free slot.
   a) If found, go to the bucket; lock it; insert the node at the index.
   b) Else, choose a random bucket; lock it; scan for the oldest node; remove it from the array; remove it from the map; add the slot to the "free list"; goto (2).
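The add flow above could look roughly like this single-threaded sketch (all names hypothetical; the bucket locking and the ConcurrentMap integration are omitted for brevity):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Sketch of Proposed Solution #2's add path. (bucket, index) slots live
// on a shuffled free list; when it is empty, a random bucket is scanned
// for its oldest node, which is evicted to free a slot (sampled LRU).
class SampledLruSketch {
    static final class Node {
        final String key;
        final long counter;                // smaller == older
        Node(String key, long counter) { this.key = key; this.counter = counter; }
    }
    static final class Slot {
        final int bucket, index;
        Slot(int bucket, int index) { this.bucket = bucket; this.index = index; }
    }

    final Node[][] buckets;
    final Deque<Slot> freeList = new ArrayDeque<>();
    final Map<String, Slot> map = new HashMap<>();
    final Random random = new Random();
    long tick;                             // stands in for the global counter

    SampledLruSketch(int numBuckets, int perBucket) {
        buckets = new Node[numBuckets][perBucket];
        List<Slot> slots = new ArrayList<>();
        for (int b = 0; b < numBuckets; b++)
            for (int i = 0; i < perBucket; i++) slots.add(new Slot(b, i));
        Collections.shuffle(slots, random); // spread early inserts around
        freeList.addAll(slots);
    }

    /** Adds a key; returns the evicted key, or null if a free slot existed. */
    String add(String key) {
        String evicted = null;
        Slot slot = freeList.poll();
        if (slot == null) {                 // step 2b: sample-evict
            int b = random.nextInt(buckets.length);
            int oldest = 0;                 // linear scan for the oldest node
            for (int i = 1; i < buckets[b].length; i++)
                if (buckets[b][i].counter < buckets[b][oldest].counter) oldest = i;
            evicted = buckets[b][oldest].key;
            map.remove(evicted);
            slot = new Slot(b, oldest);
        }
        buckets[slot.bucket][slot.index] = new Node(key, ++tick);
        map.put(key, slot);
        return evicted;
    }

    public static void main(String[] args) {
        SampledLruSketch cache = new SampledLruSketch(1, 2);
        cache.add("a");
        cache.add("b");
        System.out.println(cache.add("c")); // with one bucket, evicts "a"
    }
}
```

With more than one bucket the victim is only the oldest of the sampled bucket, which is the "is sample good enough?" question from the implications.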
Implications
------------
- Reads are cheap and don't update shared state
- Writes are semi-expensive. Spread across buckets, but not very efficient.
- Not pure eviction policy - is sample good enough?
- Resizing adds buckets (free-list entries), which hurts the spread, but it will coerce to a decent spread over time.
- Supports LRU/LFU/MRU/MFU. Allows composite policies, e.g. ARC.
Original comment by Ben.Manes@gmail.com on 27 Jan 2009 at 9:04
Another idea occurred to me which would be a simple change to the current implementation, but would support eager removals safely. As usual, it mixes a bunch of ideas together. :)
Taking the dead-node approach, what if we inserted one between each active element? This gives us a buffer, and we can safely manipulate the forward and back pointers without impacting another active element. Thus, we don't corrupt the chain. This leaves us with 2+ dead nodes between active elements, which we'd want to compact down to one. While slightly tricky, I'm pretty sure I can do that safely.
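As a rough illustration only, this single-threaded sketch shows the invariant (hypothetical names; all concurrency control, and the tricky concurrent compaction, are omitted): removal just marks a node dead, and a compaction pass collapses runs of adjacent dead nodes back down to one spacer.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the dead-node spacer idea: a dead node sits between every
// pair of active elements, so removing an active node never touches
// another active element's pointers. Removal leaves 2+ adjacent dead
// nodes, which compact() squeezes back down to one.
class SpacerSketch {
    static final class Node {
        Integer key;                 // null marks a dead (spacer) node
        Node prev, next;
        Node(Integer key) { this.key = key; }
    }

    final Node head = new Node(null); // leading spacer
    Node tail = head;                 // invariant: tail is always a spacer

    void add(int key) {               // append: active node, then a spacer
        Node active = new Node(key);
        Node spacer = new Node(null);
        tail.next = active; active.prev = tail;
        active.next = spacer; spacer.prev = active;
        tail = spacer;
    }

    /** Marks the node dead; its neighbors' pointers are untouched. */
    void remove(int key) {
        for (Node n = head.next; n != null; n = n.next)
            if (n.key != null && n.key == key) { n.key = null; return; }
    }

    /** Collapses each run of adjacent dead nodes down to a single spacer. */
    void compact() {
        for (Node n = head; n.next != null; ) {
            if (n.key == null && n.next.key == null) {
                Node gone = n.next;  // splice out the redundant spacer
                n.next = gone.next;
                if (gone.next != null) gone.next.prev = n;
                if (gone == tail) tail = n;
            } else {
                n = n.next;
            }
        }
    }

    List<Integer> activeKeys() {
        List<Integer> keys = new ArrayList<>();
        for (Node n = head; n != null; n = n.next)
            if (n.key != null) keys.add(n.key);
        return keys;
    }

    public static void main(String[] args) {
        SpacerSketch list = new SpacerSketch();
        list.add(1); list.add(2); list.add(3);
        list.remove(2);              // leaves three spacers in a row
        list.compact();              // back down to one spacer per gap
        System.out.println(list.activeKeys()); // [1, 3]
    }
}
```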
I'm going to work out the last bit of the algorithm Thursday night and, if possible, make the enhancement. If it works, it will provide an elegant solution to this annoying deficiency.
Original comment by Ben.Manes@gmail.com on 26 Feb 2009 at 10:09
See the [http://code.google.com/p/concurrentlinkedhashmap/wiki/Design Design] wiki page.
Original comment by Ben.Manes@gmail.com on 28 Feb 2009 at 4:15
I have published v2, but it is still undergoing testing. This provides a simplified and optimized version of the algorithm discussed previously on the wiki page.
Original comment by Ben.Manes@gmail.com on 19 May 2009 at 10:13
Original issue reported on code.google.com by christia...@gmail.com on 22 Jan 2009 at 1:46