Closed martinus closed 1 year ago
I have a good implemention of spare hash map https://github.com/ktprime/emhash/blob/master/hash_table8.hpp it's the fastest iteration compared with other's hash map, but it's not efficient for erasion/insertion if Key/Value is large. it need 8 bytes index for each key/value pair.
bench_IterateIntegers map = emhash5 total iterate/removing time = 5.44, 11.98|1127848008
bench_IterateIntegers map = emhash8 total iterate/removing time = 0.39, 1.17
bench_IterateIntegers map = emhash7 total iterate/removing time = 1.00, 2.78
bench_IterateIntegers map = emhash6 total iterate/removing time = 1.08, 2.92
bench_IterateIntegers map = rigtorp total iterate/removing time = 5.38, 13.14
bench_IterateIntegers map = martinus flat total iterate/removing time = 3.02, 9.06
bench_IterateIntegers map = phmap flat total iterate/removing time = 3.38, 8.23
bench_IterateIntegers map = emilib3 total iterate/removing time = 1.69, 4.48
bench_IterateIntegers map = emilib2 total iterate/removing time = 1.62, 4.34
bench_IterateIntegers map = absl flat total iterate/removing time = 3.27, 8.28
Performance is important but not everything
Use an
std::vector<std::pair<Key, Value>>
(maybe customizable) for the content that is always 100% compact. iteration is perfectly fast.Use an indexing structure into that vector, that cannot overflow, e.g. like so:
=> limitations are: no more than 16M collisions (so no overflow check necessary?)
4 byte index, so no more than 4 billion elements are allowed. Should be enough for most use cases.
mumx
to 32bit would workidx = ((uint64_t)hash * (uint64_t)mapsize) >> 32
1 byte hash: only every 256th key needs to be checked, that's good enough
ordering like robin-hood-hash, we we only need a single
<
comparisonhow to deal with capacity() changes / rehash? (resize to capacity + 1, then resize to the new capacity(). This sucks.)