Open leoisl opened 1 year ago
(fwiw John Lees and Johanna in the office next to us has been using robin hood hashing and got Dan into it too)
Might be worth trying Google's SparseHash (https://github.com/sparsehash/sparsehash). Lower memory usage but slower, could have some use cases (e.g. roundhound)
Andreas has more experience than us on these data structures, and he recommends using this map: https://gitlab.ub.uni-bielefeld.de/gi/sans/-/blob/kc/src/tsl/sparse_map.h (best compromise for both speed and RAM). We might have hash maps that use even less RAM with a cost of being (much) slower, which is sth that we might consider for tools like RH
Currently our main indexing structure, which maps kmer hashes to their PRGs and locations, is a
std::unordered_map
. There are better alternatives, likeboost::unordered_flat_map
(a review here: https://bannalia.blogspot.com/2022/11/inside-boostunorderedflatmap.html?m=1) androbin hood hashing
(unsure about this one, but nice to document, https://github.com/martinus/unordered_dense). There might be even more suited data structures making use of the fact that after built our index is immutable