zaidoon1 opened 7 months ago
Actually, the policy they use is called W-TinyLFU, and it's documented on this wiki as a future feature. Their implementation has a few gaps that would ideally be addressed.
Hi. Thank you for the information. I reviewed the design of TinyUFO and S3-FIFO. I also compared the performance of a TinyUFO implementation (`TinyUFO@0.1.0`) against `moka@0.12.6` using my mokabench tool with the ARC-S3 trace dataset.
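For context, the per-record work in such a trace-driven comparison is essentially a lookup followed by an insert on a miss. Here is a minimal sketch of that loop using moka's sync cache; the key/value types, the dummy payload, and the tiny inline trace are illustrative placeholders, not mokabench's actual driver code.

```rust
use moka::sync::Cache;

// A minimal stand-in for what a trace-driven benchmark does per trace record:
// look the key up, and insert a value on a miss. Not mokabench's real driver.
fn replay(trace: impl IntoIterator<Item = u64>, cache: &Cache<u64, Vec<u8>>) -> (u64, u64) {
    let (mut hits, mut misses) = (0u64, 0u64);
    for key in trace {
        match cache.get(&key) {
            Some(_) => hits += 1,
            None => {
                misses += 1;
                cache.insert(key, vec![0u8; 128]); // dummy payload
            }
        }
    }
    (hits, misses)
}

fn main() {
    let cache = Cache::new(100_000); // max capacity, as in the table below
    let (hits, misses) = replay([1, 2, 1, 3, 1, 2], &cache);
    println!("hit ratio: {:.1}%", 100.0 * hits as f64 / (hits + misses) as f64);
}
```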
Here is my first thought. Before looking at the mokabench result, let me clarify the current development priorities of moka, ordered from highest to lowest:

1. Hit rate
2. …
3. Convenient features (`get_with`, `and_compute_with`, etc.), eviction listener, iterator, cache statistics
4. Memory footprint
5. …
6. Small performance overhead

As for 1 (hit rate), I believe TinyUFO will provide a hit rate similar to W-TinyLFU with a fixed size of W. A high hit rate is critical for application performance because the cache miss penalty (the extra latency to get the data from a slower medium) is much greater than the latency of reading the data from the cache.
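To see why, the expected access time is a weighted average of the cache hit latency and the miss penalty, so the miss penalty dominates as soon as the hit rate drops. A small worked example, using hit rates close to the 800,000-capacity rows in the table below; the 1 µs hit latency and 10 ms miss penalty are illustrative assumptions, not measurements:

```rust
// Expected access time = hit_rate * hit_latency + (1 - hit_rate) * miss_penalty.
fn main() {
    let hit_ns = 1_000.0; // ~1 µs to serve from the cache (illustrative)
    let miss_ns = 10_000_000.0; // ~10 ms to fetch from a slower backend (illustrative)

    for hit_rate in [0.70, 0.65] {
        let avg_ns = hit_rate * hit_ns + (1.0 - hit_rate) * miss_ns;
        println!("hit rate {:.0}% -> avg {:.2} ms", hit_rate * 100.0, avg_ns / 1_000_000.0);
    }
    // 70% -> ~3.00 ms, 65% -> ~3.50 ms: a 5-point drop in hit rate adds ~0.5 ms per
    // access, dwarfing a few hundred nanoseconds of per-operation cache overhead.
}
```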
From the mokabench result I am going to show, the average latency per cache operation (read or write) was as follows:

- moka: between ~700 ns and ~850 ns (nanoseconds).
- TinyUFO: between ~400 ns and ~700 ns.

As for examples of cache miss penalties, here is a quote from the book *Systems Performance: Enterprise and the Cloud, 2nd Edition* by Brendan Gregg:
> The "latency" column shows example system latencies, and the "scaled" column shows the latencies scaled to an imaginary system in which a CPU cycle takes one full second.
As you can see, 42 minutes is almost nothing compared to the latencies of accessing an HDD or the Internet.
So, in general, 1 (hit rate) is much more important than 6 (small performance overhead). And by design, TinyUFO and W-TinyLFU (with a fixed size of W) will provide competitive hit rates.
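To make the "42 minutes" figure concrete: assuming the book's table uses a 0.3 ns CPU cycle (so 0.3 ns scales to one second), a ~750 ns cache operation scales to roughly 42 minutes, while HDD and Internet latencies scale to days and years. A quick sketch of that arithmetic; the 5 ms HDD I/O and 40 ms network round trip are rough assumed figures, not values from the benchmark:

```rust
// Scale real latencies to the book's imaginary clock where 1 CPU cycle = 1 second.
// Assumption: a CPU cycle is 0.3 ns.
fn main() {
    let scaled_secs = |ns: f64| ns / 0.3;

    println!("750 ns cache op -> {:.0} min", scaled_secs(750.0) / 60.0); // ~42 min
    println!("5 ms HDD I/O    -> {:.0} days", scaled_secs(5_000_000.0) / 86_400.0); // ~193 days
    println!("40 ms net RTT   -> {:.1} years", scaled_secs(40_000_000.0) / 31_557_600.0); // ~4.2 years
}
```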
As for 4 (memory footprint), TinyUFO can do better than W-TinyLFU, as they use queues and doubly linked lists respectively. A queue can be array-based and have a smaller memory footprint than a doubly linked list. However, a queue is not a suitable data structure for an efficient entry-expiration algorithm, so moka may have to continue using doubly linked lists to support expirations. This is a trade-off between 4 and 3 (convenient features).
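To illustrate that trade-off, here is a minimal sketch (not moka's or TinyUFO's actual code) of why expiration favors doubly linked lists: removing an arbitrary expired entry from an array-based queue means scanning it, while a linked node that the cache entry points to can be unlinked in O(1).

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Array-based FIFO queue, the kind of compact structure a queue-based policy can
/// use: removing arbitrary expired entries requires an O(n) scan.
fn sweep_expired(queue: &mut VecDeque<(u64, Instant)>, now: Instant) {
    queue.retain(|&(_key, deadline)| deadline > now);
}

/// Doubly linked list emulated with index arenas. If each cache entry stores its
/// node index, an expired entry can be unlinked in O(1), which is what makes
/// doubly linked lists attractive for per-entry expiration.
#[derive(Default)]
struct Dll {
    prev: Vec<Option<usize>>,
    next: Vec<Option<usize>>,
    head: Option<usize>,
    tail: Option<usize>,
}

impl Dll {
    fn push_back(&mut self) -> usize {
        let idx = self.prev.len();
        self.prev.push(self.tail);
        self.next.push(None);
        match self.tail {
            Some(t) => self.next[t] = Some(idx),
            None => self.head = Some(idx),
        }
        self.tail = Some(idx);
        idx
    }

    /// O(1) removal, given the node index stored alongside the cache entry.
    fn unlink(&mut self, idx: usize) {
        let (p, n) = (self.prev[idx], self.next[idx]);
        match p {
            Some(p) => self.next[p] = n,
            None => self.head = n,
        }
        match n {
            Some(n) => self.prev[n] = p,
            None => self.tail = p,
        }
        self.prev[idx] = None;
        self.next[idx] = None;
    }
}

fn main() {
    let now = Instant::now();
    let mut queue: VecDeque<(u64, Instant)> =
        (0..5).map(|k| (k, now + Duration::from_secs(k))).collect();
    sweep_expired(&mut queue, now + Duration::from_secs(2)); // scans all entries
    assert_eq!(queue.len(), 2);

    let mut list = Dll::default();
    let handles: Vec<usize> = (0..5).map(|_| list.push_back()).collect();
    list.unlink(handles[2]); // expired entry removed without touching the others
    assert_eq!(list.head, Some(0));
}
```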
There is a trade-off between 3 and 6 (small performance overhead) too. So moka, which prioritizes 3, will generally perform less well than other cache implementations that offer simpler functionality. But this will not be a problem in real-world applications, as 1 (hit rate) has a much greater impact on application performance than 6.
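As an example of item 3, moka's `get_with` coalesces concurrent loads of the same missing key so that only one caller runs the initialization closure; this kind of bookkeeping is part of the overhead that simpler caches avoid. A minimal sketch (the key/value types and the `expensive_load` helper are made up for illustration):

```rust
use moka::sync::Cache;

// Hypothetical loader standing in for a database or network call.
fn expensive_load(key: u64) -> String {
    format!("value for key {key}")
}

fn main() {
    // 100,000 max capacity, matching the smallest configuration in the table below.
    let cache: Cache<u64, String> = Cache::new(100_000);

    // If several threads call get_with for the same missing key at the same time,
    // only one of them runs the closure; the others wait and get the cached value.
    let value = cache.get_with(42, || expensive_load(42));
    assert_eq!(value, cache.get(&42).unwrap());
}
```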
OK, having said that, here is the mokabench result.
I ran the benchmark with `cargo run -F tiny-ufo --release -- -n 1`:

- The number of clients (`-n`) was 1 (single thread).
- Memory usage (RSS) was measured with the `top` command on the mokabench process.

Cache | Max Capacity | Estimated number of keys for LFU | Clients | Inserts | Reads | Hit Ratio | Memory Usage (RSS) | Duration (secs) | Avg. nanosecs per operation |
---|---|---|---|---|---|---|---|---|---|
Moka Sync Cache | 100,000 | n/a | 1 | 14,694,850 | 16,407,702 | 10.439% | 572,852k | 21.595 s | 694 ns |
TinyUFO | 100,000 | 100,000 | 1 | 15,685,418 | 16,407,702 | 4.402% | 570,560k | 13.077 s | 407 ns |
TinyUFO | 100,000 | 1,000,000 | 1 | 15,074,058 | 16,407,702 | 8.128% | 633,272k | 21.706 s | 689 ns |
Moka Sync Cache | 400,000 | n/a | 1 | 9,439,274 | 16,407,702 | 42.470% | 725,132k | 21.926 s | 848 ns |
TinyUFO | 400,000 | 400,000 | 1 | 11,456,282 | 16,407,702 | 30.177% | 723,656k | 17.174 s | 616 ns |
TinyUFO | 400,000 | 4,000,000 | 1 | 10,948,476 | 16,407,702 | 33.272% | 994,580k | 19.670 s | 719 ns |
Moka Sync Cache | 800,000 | n/a | 1 | 4,868,066 | 16,407,702 | 70.331% | 927,520k | 15.246 s | 716 ns |
TinyUFO | 800,000 | 800,000 | 1 | 5,597,449 | 16,407,702 | 65.885% | 929,512k | 12.203 s | 555 ns |
TinyUFO | 800,000 | 8,000,000 | 1 | 5,597,708 | 16,407,702 | 65.884% | 1.4g | 13.530 s | 614 ns |
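One observation about the table: in every row, hits equal Reads minus Inserts, which is consistent with the driver inserting on each miss; that relation reproduces the Hit Ratio column. A quick check against three of the rows above:

```rust
// Reproduce the "Hit Ratio" column from the "Inserts" and "Reads" columns,
// assuming one insert per miss (which the numbers above are consistent with).
fn main() {
    let rows = [
        ("Moka Sync Cache, cap 100,000", 14_694_850u64, 16_407_702u64),
        ("TinyUFO, cap 100,000 / est 100,000", 15_685_418, 16_407_702),
        ("Moka Sync Cache, cap 800,000", 4_868_066, 16_407_702),
    ];
    for (name, inserts, reads) in rows {
        let hit_ratio = 100.0 * (reads - inserts) as f64 / reads as f64;
        println!("{name}: {hit_ratio:.3}%"); // 10.439%, 4.402%, 70.331%
    }
}
```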
Findings:

- The current TinyUFO implementation was up to twice as fast as moka.
- TinyUFO provided a lower hit ratio than moka.
  - By design, TinyUFO is expected to provide about the same hit ratio as moka.
- TinyUFO used about the same amount of memory as moka.
  - By design, TinyUFO is expected to use a smaller amount of memory than moka.
- Note that the measured durations include the overhead of mokabench's driver, which does not fully pre-process the benchmark commands before performing the benchmark.
It seems the current TinyUFO implementation can be improved for a better hit ratio and a smaller memory footprint. But even after it is improved, I am afraid that the performance gain might be small.
See https://crates.io/crates/TinyUFO. I think we would see a good perf boost from this.