Closed: craigfe closed this 3 years ago
It is possible that trim is a slightly costly operation, in which case it might make sense to be a bit flexible on the bound (ie, only trim every delta additions, say). (Don't take this comment too seriously.)
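The amortised-trim idea could look roughly like the following. This is a hypothetical sketch only (the wrapper module, `delta`, and the field names are made up for illustration); it assumes the `lru` opam package's mutable `Lru.M` interface, where `trim` evicts entries down to capacity:

```ocaml
(* Sketch: trim the LRU only every [delta] additions, amortising the
   cost of trimming.  Assumes the lru package's [Lru.M.Make]-style
   interface ([create], [add], [trim]); names are illustrative. *)
module Amortised (C : Lru.M.S) = struct
  type t = { cache : C.t; delta : int; mutable since_trim : int }

  let create ~delta cap = { cache = C.create cap; delta; since_trim = 0 }

  let add k v t =
    C.add k v t.cache;
    t.since_trim <- t.since_trim + 1;
    (* Between trims the cache may exceed its nominal capacity by up
       to [delta] entries -- this is the "flexible bound" trade-off. *)
    if t.since_trim >= t.delta then begin
      C.trim t.cache;
      t.since_trim <- 0
    end
end
```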
Lru stats recording fix, plus some more tests, here: https://github.com/tomjridge/index/tree/lru
Thanks @tomjridge. I've pushed both changes & refactored the tests a bit, but kept the logic mostly as-is (just ocamlformat, splitting up test cases, using more Alcotest assertions, etc.)
LGTM!
Sorry for being late to the party ;)
I have been using Lru.M earlier, and backed out of that (see https://github.com/mirage/mirage-tcpip/pull/423 and https://github.com/mirage/ocaml-dns/pull/256) -- while the operations are faster (since it uses a Hashtbl), the issue is that at create time a Hashtbl with the given capacity is allocated (see Hashtbl.create) -- which in turn allocates an Array of the given capacity.
Just in case you observe (or are worried about) memory consumption: since you wrap the lru opam package, you may want to redefine your create as `let create cap = let lru = LRU.create 7 in LRU.resize cap lru; lru` (i.e. initially pass a small number to LRU.create, then call resize, which only updates the capacity bound and does not lead to any call into the underlying Hashtbl).
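The suggested workaround as a standalone sketch. This assumes my reading of the `lru` package's API (`create` eagerly allocating a Hashtbl of the given size, `resize` only adjusting the capacity bound); `LRU` stands for a module produced by the package's `Lru.M.Make` functor, and the exact names may need adjusting:

```ocaml
(* Sketch: avoid the up-front Array allocation by creating a tiny
   Hashtbl and raising only the logical capacity afterwards. *)
let create cap =
  let lru = LRU.create 7 in (* small initial Hashtbl: ~7 buckets      *)
  LRU.resize cap lru;       (* capacity becomes [cap]; no large array *)
  lru
```

The Hashtbl still grows on demand as entries are added, so nothing is lost except the one-off eager allocation.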
Thanks @hannesm for the heads up. We'll keep an eye on start-up memory usage and see if this is a problem. My first thought was that it's unlikely to matter since the writer is likely to completely saturate the LRU when filling the log between merges, but this isn't true of RO instances (and it's possible that there will be many of those, potentially for relatively small stores), so this indeed may be an issue.
As it happens, we plan to move away from the lru package towards cachecache relatively soon (as the implementation is a bit more efficient), but that currently has the same limitation w.r.t. initial size as lru, so your point is still one to keep in mind.
Fix https://github.com/mirage/index/issues/359.
Following the recent change to the log-file implementation, successful reads of the index must always read from disk (because even log-file entries are not kept entirely in memory). This traded some speed in favour of much lower memory usage.

For workloads with temporal locality, it's possible to recover some of this speed by adding a relatively small LRU to cache `find`s from disk. This is implemented by this commit.

TODO: … (only successful `find`s go via the LRU).
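The read-through shape described above might look like the following. This is a sketch only: `Lru`, `read_from_disk`, and the surrounding record field are placeholders for illustration, not index's actual internals:

```ocaml
(* Sketch: consult a small LRU before touching the disk, and cache
   only successful finds.  All names here are illustrative. *)
let find t key =
  match Lru.find key t.lru with
  | Some v -> Some v                 (* hit: no disk access needed   *)
  | None ->
    match read_from_disk t key with
    | None -> None                   (* misses are not cached        *)
    | Some v ->
      Lru.add key v t.lru;           (* cache the successful find    *)
      Some v
```

Caching only successful lookups keeps the LRU small, at the cost of paying full disk latency for repeated absent-key queries.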