Sorry for the vague remark. I need to rewrite the design page and only have a
rough draft
(http://code.google.com/p/concurrentlinkedhashmap/wiki/Design_draft). The older
design docs which attempted a lock-free style are no longer relevant.
The core idea is that the LRU order can be maintained asynchronously from the
hash table, so a read or write does not block on an LRU reorder operation.
Instead, these operations are queued for deferred execution and applied
periodically. This means that concurrent reads and writes can proceed and
schedule their work, while another thread may acquire a lock to perform the
pending operations. This allows non-blocking reads and amortizes the penalty
across threads.
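A minimal sketch of that deferred-reorder idea, with hypothetical names (this is not the actual CLHM implementation, which uses a doubly linked list for O(1) reorders): reads record their access in a concurrent queue instead of touching the list, and whichever thread wins a tryLock drains the queue.

```java
import java.util.ArrayDeque;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only: a read schedules its LRU reorder in a queue
// and never blocks; one thread at a time drains the pending work.
class DeferredLruSketch<K> {
    private final ConcurrentLinkedQueue<K> pendingReorders = new ConcurrentLinkedQueue<>();
    private final ReentrantLock evictionLock = new ReentrantLock();
    private final ArrayDeque<K> lruOrder = new ArrayDeque<>(); // head = coldest

    // Called on a read hit (or insert): never blocks on the LRU lock.
    void recordAccess(K key) {
        pendingReorders.add(key);
        tryDrain();
    }

    // Amortizes the penalty: only the thread that wins the tryLock
    // applies the queued reorders; everyone else returns immediately.
    private void tryDrain() {
        if (evictionLock.tryLock()) {
            try {
                K key;
                while ((key = pendingReorders.poll()) != null) {
                    lruOrder.remove(key);   // O(n) here; the real doubly linked list is O(1)
                    lruOrder.addLast(key);  // move to the hot end
                }
            } finally {
                evictionLock.unlock();
            }
        }
    }

    K coldest() {
        return lruOrder.peekFirst();
    }
}
```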
Any operation that requires traversing the LRU directly would block, as it is
performed under a dedicated lock. Currently this is only #clear(), as it is
rare enough that queuing all the removals would be wasteful compared to the
short delay of waiting for a drain to complete.
The ordered traversal methods would also need to block and would return a
snapshot of the hotness order. This is because a concurrent LRU traversal would
behave incorrectly (e.g. a node could be reordered mid-iteration, yielding
repeated elements). This has two negative effects:
1) Concurrent calls to traversal / clear methods would block (but these are rare!).
2) An O(n) copy of the LRU chain would be required for traversal methods, which
might be too expensive for a large cache (10M entries?) like Cassandra's.
If the penalty of (2) is not acceptable, then an internal iterator could be
used instead of a traditional external one. That would be useful if you knew
you only wanted a subset, e.g. the top 10%. This could be driven by a predicate
function that determines whether to continue consuming.
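One possible shape for that internal iterator, with illustrative names rather than the real CLHM API: the cache drives the traversal in descending hotness order and a visitor callback decides when to stop, so only the consumed prefix is ever materialized.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical internal-iterator sketch: the cache walks its own
// hot-to-cold order and calls back until the visitor declines, so a
// "top 10%" consumer never pays for an O(n) snapshot.
class InternalIteratorSketch<K, V> {
    interface Visitor<K, V> {
        // Return false to stop the traversal early.
        boolean visit(K key, V value);
    }

    // Assume this map is already ordered hottest-first.
    private final LinkedHashMap<K, V> hotToCold;

    InternalIteratorSketch(LinkedHashMap<K, V> hotToCold) {
        this.hotToCold = hotToCold;
    }

    // In the real cache this walk would run under the eviction lock
    // (elided here); it stops as soon as the visitor returns false.
    void forEachDescending(Visitor<K, V> visitor) {
        for (Map.Entry<K, V> e : hotToCold.entrySet()) {
            if (!visitor.visit(e.getKey(), e.getValue())) {
                return;
            }
        }
    }
}
```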
What would be the ideal API for Cassandra? An ordered snapshot traversal method
would be the most natural in Java, but I'm concerned that the large caches may
make that unacceptable.
Original comment by Ben.Manes@gmail.com
on 12 Jan 2011 at 7:02
For Cassandra's purposes, it's reasonable to assume that we have room on the
heap for a shallow copy of the cache contents. Even more so if we only need to
copy keys rather than values as well.
Original comment by jbel...@gmail.com
on 12 Jan 2011 at 7:28
The memory overhead wasn't my concern, since it would just be a pointer copy.
The length of the traversal of the LRU (linked list) would be large, e.g.
O(10M). That is a lot of pointer chasing and might take a while to traverse.
I should probably benchmark to get a rough idea of how long it takes to loop
through a LinkedList at various sizes (e.g. 1k, 100k, 1M, 10M). I could also be
misremembering what I recall of Cassandra having large caches, e.g. is it the
memory footprint and/or the number of entries that is large? If the number of
entries isn't insanely large, then I'm probably making too big a deal of the
traversal cost.
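A rough microbenchmark along those lines might look like the following sketch. The timings it prints are only a ballpark (they vary with machine, heap state, and JIT warmup), but it gives a feel for the pointer-chasing cost of one pass over a LinkedList at each size.

```java
import java.util.LinkedList;

// Ballpark microbenchmark: how long does a single traversal of a
// LinkedList take at various sizes? No warmup is done, so treat the
// numbers as order-of-magnitude only.
public class TraversalBench {
    // Sum the elements so the traversal cannot be optimized away.
    static long sumTraversal(LinkedList<Integer> list) {
        long sum = 0;
        for (int i : list) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int size : new int[] {1_000, 100_000, 1_000_000}) {
            LinkedList<Integer> list = new LinkedList<>();
            for (int i = 0; i < size; i++) {
                list.add(i);
            }
            long start = System.nanoTime();
            long sum = sumTraversal(list);
            long elapsedMicros = (System.nanoTime() - start) / 1_000;
            System.out.printf("size=%d sum=%d elapsed=%dus%n", size, sum, elapsedMicros);
        }
    }
}
```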
Original comment by Ben.Manes@gmail.com
on 12 Jan 2011 at 8:27
As long as it's only blocking clear or other traversals, that should be fine --
we do have caches of millions of rows, but we only need to traverse in hotness
order to save out the cache for reloading later, so "single digits of minutes"
is the most often it makes sense to do that.
Original comment by jbel...@gmail.com
on 12 Jan 2011 at 8:48
Great, based on these requirements this should be trivial.
Original comment by Ben.Manes@gmail.com
on 12 Jan 2011 at 10:59
We have another use case: https://issues.apache.org/jira/browse/CASSANDRA-2175
Running into this pointed out that we might not have room on the heap for a
full shallow copy. If we could say "give me a shallow copy of the N warmest
entries" that would be ideal.
Original comment by jbel...@datastax.com
on 16 Feb 2011 at 6:12
OK, I'll get back to CLHM this weekend and start on the pending items. Sorry
that it's been on the back burner lately.
Original comment by bma...@google.com
on 16 Feb 2011 at 7:25
Now provides ascending and descending snapshot views of the keySet and map,
with an optional limit on the number of entries captured. The
ascending/descending order is based on eligibility for retention. The method
names try to best honor the convention in NavigableMap.
"give me a shallow copy of the N warmest entries"
Map<K, V> snapshot = map.descendingMapWithLimit(N);
Original comment by Ben.Manes@gmail.com
on 5 Mar 2011 at 9:13
Fantastic! Thanks, Ben.
Original comment by jbel...@gmail.com
on 7 Mar 2011 at 5:01
Sorry for the delay. I'm ready to release this week unless there's any final
feedback.
Please consider performing a canary test. Added optional task:
https://issues.apache.org/jira/browse/CASSANDRA-2661
Original comment by Ben.Manes@gmail.com
on 17 May 2011 at 8:33
Original issue reported on code.google.com by
jbel...@gmail.com
on 12 Jan 2011 at 6:32