jeromefroe / lru-rs

An implementation of a LRU cache
https://docs.rs/lru/
MIT License
627 stars 104 forks source link

Consider replacing HashMap with HashSet #119

Open matklad opened 2 years ago

matklad commented 2 years ago

Today, the core data structure is a hash-map of LruEntries keyed by key refs

struct LruEntry<K, V> {
    key: mem::MaybeUninit<K>,
    val: mem::MaybeUninit<V>,
    prev: *mut LruEntry<K, V>,
    next: *mut LruEntry<K, V>,
}

struct Lru {
  map: HashMap<KeyRef<K>, Box<LruEntry<K, V>>, S>,
}

Essentially, the data strored here has this shape: HashMap<&K, (K, V)>. That is, it's a hash map which stores objects, where the key is the part of the object itself. THe current implementation is less efficient than it could have been, because we effectively store the K twice (once as a part of (K, V), and once as a &K). A more efficient impl would be HashSet<Entry<K, V>>, where entry is defined as follows:

struct Entry<K, V> { key: K, value: V}
impl<K, V> Borrow<K> for Entry<K, V> {}

Here's a playground that shows the pattern: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=bc60814a14a353784d3ba8c89c2cc109

matklad commented 2 years ago

Spend some time looking into this. This is harder than it seems for somewhat stupid reasons.

First thing I've run into is that I need something like HashSet:get_mut(k: &K) -> Option<&mut K> to fudge the pointers. Sadly, std only exposes & version (so that it's impossible to break hash map by modifying the value). This can be worked-around with Cell for next/prev pointers, but unfortunately we also need &mut V for the value for the case where we update existing entry.

The &mut API we need is available via unstable raw_entry feature, as well as via hash brown.

So I wanted to just use hashbrown (ie, make it non optional), but, alas, that was not trivial, due to different DefaultHasher business. Basically, I think what we want here is this pattern. We'd also have to fix iterator definitions in this crate, as today they only are defined for the default value of S.

On the positive side, switching to hasbrown would allow us to correct wrong KeyRef<K>: Borrow<Q> bounds into correct K: Borrow<Q> ones.

Aaaand, while working on it, I've noticed that the code I think isn't fully kosher with respect to stacked borrows. Specifically, bits like this:

            let old_key = KeyRef {
                k: unsafe { &(*(*(*self.tail).prev).key.as_ptr()) },
            };
            let mut old_node = self.map.remove(&old_key).unwrap();

Here, map.remove has &mut over the whole map (so, presumably, over all entries as well). But in Eq for KeyRef, we also create & for a single key, which I am not sure is fully correct.