You may be too kind in believing that it was "by design". Originally I had intended to use direct removals rather than lazy ones, and had implemented a backtracking algorithm to do this. It passed my single-threaded unit tests, but I wasn't comfortable enough adopting it in production and removed it before validating under multiple threads. The simpler alternative was adopted to reduce complexity and have a solid base. The last version using backtracking is attached.
I have been meaning to get back and fix this, among other minor enhancements, but was a little too burned out from other activities. While LRU isn't the best algorithm under concurrency, and I prefer the Second Chance policy, it is the standard that everyone expects, so I agree it needs to be supported. I was considering making the linkage strategy a setting, because I would expect my doubly-linked list to be slower than Doug Lea's queue (which is acceptable for FIFO policies). You can see some thrashing on the write benchmarks, which likely needs a bit of a busy wait to back off rather than pound the system bus.
In regards to your idea - moving a node requires removal, and once it's removed there is no reason to add it back. However, removal can be tricky, and the head/tail manipulations are greatly simplified by using sentinel nodes. In addition to the backtracking strategy, I had worked out the details for a 3-node locking approach which seemed fairly ugly, but doable. If you want to venture down either of these paths, or have another idea entirely, feel free to give it a shot!
Let me know if you want any help (code/brainstorming/etc), and commit access is available if desired. Feel free to email me directly, continue on the issue, or post to the group.
Original comment by Ben.Manes@gmail.com on 23 Jan 2009 at 9:27
Attachments:
link to discussion thread
http://groups.google.com/group/concurrentlinkedhashmap/browse_thread/thread/1b6501f0dede6ff6
Original comment by christia...@gmail.com on 24 Jan 2009 at 10:53
Proposed Solution #1
---------------------
Drop the double links and use a singly-linked list. Store the {prev, curr} nodes as the map's value. This implies that to remove from the list, the prev->next reference can be updated to curr->next. No back-pointers are required.
The map's corresponding entry where the curr node is the value's prev must be updated. This can be done via a hash lookup.
| k | prev | curr |
| ~ | h | 1 |
| ~ | 1 | 2 |
| ~ | 2 | 3 |
| ~ | 3 | t |
h->1->2->3->t
Remove (3)
| k | prev | curr |
| ~ | h | 1 |
| ~ | 1 | 2 |
-------------------
| ~ | 2 | t |
h->1->2->t
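As a rough illustration, here is a minimal single-threaded sketch of this removal scheme. All names (`SinglyLinkedSketch`, `Entry`, `chain`) are hypothetical, not from the project's code, and locking is omitted entirely:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of Proposed Solution #1: the map's value stores {prev, curr},
// so removal bridges prev->next over curr and then fixes the successor's
// stored prev via a hash lookup. No back-pointers on the nodes.
class SinglyLinkedSketch<K> {
    static final class Node<K> {
        final K key;          // identifies the entry; null only for the head
        Node<K> next;
        Node(K key) { this.key = key; }
    }

    static final class Entry<K> {
        Node<K> prev;         // the node linked just before curr
        final Node<K> curr;
        Entry(Node<K> prev, Node<K> curr) { this.prev = prev; this.curr = curr; }
    }

    final Map<K, Entry<K>> map = new HashMap<>();
    final Node<K> head = new Node<>(null);  // sentinel 'h'
    Node<K> tail = head;                    // last linked node ('t' is notation)

    void add(K key) {
        Node<K> node = new Node<>(key);
        tail.next = node;
        map.put(key, new Entry<>(tail, node));
        tail = node;
    }

    void remove(K key) {
        Entry<K> e = map.remove(key);
        e.prev.next = e.curr.next;          // bridge prev -> curr's successor
        if (e.curr == tail) {
            tail = e.prev;
        } else {
            // non-atomic fix-up: the successor's stored prev is re-pointed
            map.get(e.curr.next.key).prev = e.prev;
        }
    }

    String chain() {                        // e.g. "h->1->2->t"
        StringBuilder sb = new StringBuilder("h");
        for (Node<K> n = head.next; n != null; n = n.next) {
            sb.append("->").append(n.key);
        }
        return sb.append("->t").toString();
    }

    public static void main(String[] args) {
        SinglyLinkedSketch<Integer> list = new SinglyLinkedSketch<>();
        list.add(1); list.add(2); list.add(3);
        list.remove(3);
        System.out.println(list.chain());   // h->1->2->t, as in the diagram
    }
}
```

Note that `remove` performs two distinct mutations (the bridge and the successor's prev fix-up), which is exactly the non-atomic step flagged in the implications below.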
Implications:
- Simpler list operations. No magic tricks (e.g. backtracking) or dead nodes.
- Must look up the entry whose prev is the removed node and re-point it. Not atomic - issues?
- The tail node will always be hot and a point of contention.
Original comment by Ben.Manes@gmail.com on 27 Jan 2009 at 8:44
Proposed Solution #2
--------------------
Drop the linked list and do not maintain sorted order. Make the common case (reads) fast and add the expense to writes. Reads update a volatile counter, while writes perform a linear scan to determine the oldest node to evict. Splitting entries across buckets allows concurrent writes and reduces contention, but does not provide a pure LRU policy - rather, LRU from the sample in the chosen bucket.
This approach is similar to ConcurrentHashMap, where writes acquire the bucket lock. A bucket contains an array of associated nodes, where each node holds the key/value/counter.
Reads
------
The node is retrieved from the ConcurrentMap. If LRU, node->counter is set to the increment of a global counter (avoids calling the system clock). If LFU, node->counter is incremented. Reads may contend on the global counter, but accuracy is unnecessary. No CAS is needed - just read/increment/write. Races are fine - same timestamp!
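A sketch of what such a read touch could look like (hypothetical names; deliberately plain read/increment/write with no CAS, as described above):

```java
// Sketch of the read path in Proposed Solution #2. An LRU read stamps
// the node from a global tick; an LFU read bumps the node's own counter.
// Both are intentionally racy: ties produce the same timestamp, which
// is acceptable since exact accuracy is unnecessary.
class TouchSketch {
    static final class Node {
        volatile long counter;
    }

    static volatile long globalTick;

    static void touchLru(Node node) {
        long next = globalTick + 1;      // read/increment/write, no CAS
        globalTick = next;               // racing readers may reuse a tick
        node.counter = next;             // fresher read == larger stamp
    }

    static void touchLfu(Node node) {
        node.counter = node.counter + 1; // plain increment, races tolerated
    }

    public static void main(String[] args) {
        Node a = new Node(), b = new Node();
        touchLru(a);
        touchLru(b);
        System.out.println(a.counter < b.counter); // true: b was read later
    }
}
```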
Adds
------
Each bucket + array index is provided in a "free list" (aka queue), which allows knowing which slots are free. It is shuffled when first created.
1. Add to the ConcurrentMap.
2. Poll for a free slot.
   a) If found, go to the bucket; lock it; insert the node at the index.
   b) Else, choose a random bucket; lock it; scan for the oldest node; remove it from the array; remove it from the map; add the slot to the "free list"; goto (2).
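The add flow above could look roughly like this single-threaded sketch (all names hypothetical; the bucket locking and the ConcurrentMap integration are omitted for brevity):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Sketch of Proposed Solution #2's add path. (bucket, index) slots live
// on a shuffled free list; when it is empty, a random bucket is scanned
// for its oldest node, which is evicted to free a slot (sampled LRU).
class SampledLruSketch {
    static final class Node {
        final String key;
        final long counter;                // smaller == older
        Node(String key, long counter) { this.key = key; this.counter = counter; }
    }
    static final class Slot {
        final int bucket, index;
        Slot(int bucket, int index) { this.bucket = bucket; this.index = index; }
    }

    final Node[][] buckets;
    final Deque<Slot> freeList = new ArrayDeque<>();
    final Map<String, Slot> map = new HashMap<>();
    final Random random = new Random();
    long tick;                             // stands in for the global counter

    SampledLruSketch(int numBuckets, int perBucket) {
        buckets = new Node[numBuckets][perBucket];
        List<Slot> slots = new ArrayList<>();
        for (int b = 0; b < numBuckets; b++)
            for (int i = 0; i < perBucket; i++) slots.add(new Slot(b, i));
        Collections.shuffle(slots, random); // spread early inserts around
        freeList.addAll(slots);
    }

    /** Adds a key; returns the evicted key, or null if a free slot existed. */
    String add(String key) {
        String evicted = null;
        Slot slot = freeList.poll();
        if (slot == null) {                 // step 2b: sample-evict
            int b = random.nextInt(buckets.length);
            int oldest = 0;                 // linear scan for the oldest node
            for (int i = 1; i < buckets[b].length; i++)
                if (buckets[b][i].counter < buckets[b][oldest].counter) oldest = i;
            evicted = buckets[b][oldest].key;
            map.remove(evicted);
            slot = new Slot(b, oldest);
        }
        buckets[slot.bucket][slot.index] = new Node(key, ++tick);
        map.put(key, slot);
        return evicted;
    }

    public static void main(String[] args) {
        SampledLruSketch cache = new SampledLruSketch(1, 2);
        cache.add("a");
        cache.add("b");
        System.out.println(cache.add("c")); // with one bucket, evicts "a"
    }
}
```

With more than one bucket the victim is only the oldest of the sampled bucket, which is the "is sample good enough?" question from the implications.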
Implications
------------
- Reads are cheap and don't update shared state
- Writes are semi-expensive. Spread across buckets, but not very efficient.
- Not pure eviction policy - is sample good enough?
- Resizing adds buckets (free-list entries), which hurts the spread, but it will coerce to a decent spread over time.
- Supports LRU/LFU/MRU/MFU. Allows composite policies, e.g. ARC.
Original comment by Ben.Manes@gmail.com on 27 Jan 2009 at 9:04
Another idea occurred to me which would be a simple change to the current implementation, but would support eager removals safely. As usual, it mixes a bunch of ideas together. :)
Taking the dead-node approach, what if we inserted one between each active element? This gives us a buffer, and we can safely manipulate the forward and back pointers without impacting another active element. Thus, we don't corrupt the chain. This leaves us with 2+ dead nodes between active elements, which we'd want to compact down to one. While slightly tricky, I'm pretty sure I can do that safely.
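As a rough illustration only, this single-threaded sketch shows the invariant (hypothetical names; all concurrency control, and the tricky concurrent compaction, are omitted): removal just marks a node dead, and a compaction pass collapses runs of adjacent dead nodes back down to one spacer.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the dead-node spacer idea: a dead node sits between every
// pair of active elements, so removing an active node never touches
// another active element's pointers. Removal leaves 2+ adjacent dead
// nodes, which compact() squeezes back down to one.
class SpacerSketch {
    static final class Node {
        Integer key;                 // null marks a dead (spacer) node
        Node prev, next;
        Node(Integer key) { this.key = key; }
    }

    final Node head = new Node(null); // leading spacer
    Node tail = head;                 // invariant: tail is always a spacer

    void add(int key) {               // append: active node, then a spacer
        Node active = new Node(key);
        Node spacer = new Node(null);
        tail.next = active; active.prev = tail;
        active.next = spacer; spacer.prev = active;
        tail = spacer;
    }

    /** Marks the node dead; its neighbors' pointers are untouched. */
    void remove(int key) {
        for (Node n = head.next; n != null; n = n.next)
            if (n.key != null && n.key == key) { n.key = null; return; }
    }

    /** Collapses each run of adjacent dead nodes down to a single spacer. */
    void compact() {
        for (Node n = head; n.next != null; ) {
            if (n.key == null && n.next.key == null) {
                Node gone = n.next;  // splice out the redundant spacer
                n.next = gone.next;
                if (gone.next != null) gone.next.prev = n;
                if (gone == tail) tail = n;
            } else {
                n = n.next;
            }
        }
    }

    List<Integer> activeKeys() {
        List<Integer> keys = new ArrayList<>();
        for (Node n = head; n != null; n = n.next)
            if (n.key != null) keys.add(n.key);
        return keys;
    }

    public static void main(String[] args) {
        SpacerSketch list = new SpacerSketch();
        list.add(1); list.add(2); list.add(3);
        list.remove(2);              // leaves three spacers in a row
        list.compact();              // back down to one spacer per gap
        System.out.println(list.activeKeys()); // [1, 3]
    }
}
```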
I'm going to work out the last bit of the algorithm Thursday night and, if possible, make the enhancement. If it works, it will provide an elegant solution to this annoying deficiency.
Original comment by Ben.Manes@gmail.com on 26 Feb 2009 at 10:09
See the [http://code.google.com/p/concurrentlinkedhashmap/wiki/Design Design] wiki page.
Original comment by Ben.Manes@gmail.com on 28 Feb 2009 at 4:15
I have published v2, but it is still undergoing testing. This provides a simplified and optimized version of the algorithm discussed previously on the wiki page.
Original comment by Ben.Manes@gmail.com on 19 May 2009 at 10:13
Original issue reported on code.google.com by christia...@gmail.com on 22 Jan 2009 at 1:46