[Open] HenryHRich opened this issue 2 years ago
(Re 'fit in memory': with enough cores you can hide the memory latency and probably come close to saturating memory bandwidth. But if you have to go out to disk you run into problems: a cache line is pretty close in size to a hash bucket (I use larger hash buckets on multicores for this reason, and also store the data itself or its hash in the bucket, depending on size), whereas a disk sector is 0.5 KB or 4 KB, so limited spatial locality means you don't make effective use of the available bandwidth.)
@HenryHRich btw :)
(I think if you're doing multiple lookups you're almost certainly right that sorting will be faster; but for a one-off search, hashing beats it as long as everything fits in memory, since the cost of the sort itself will dominate.)
(Also: you can hash in parallel too, not just search.)
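(A minimal sketch of the one-off-lookup point, in Python with made-up function names: the hash route builds a set in O(n) and probes once in O(1), while the sort route pays O(n log n) for the sort before its O(log n) binary search, so the sort dominates a single query.)

```python
import bisect
import random
import time

def lookup_via_hash(data, key):
    # One pass to hash everything, then a single O(1) probe.
    return key in set(data)

def lookup_via_sort(data, key):
    # Sort first, then binary-search; the sort dominates for one query.
    s = sorted(data)
    i = bisect.bisect_left(s, key)
    return i < len(s) and s[i] == key

data = [random.randrange(10**9) for _ in range(1_000_000)]
key = data[500_000]  # a key known to be present

t0 = time.perf_counter()
hit_hash = lookup_via_hash(data, key)
t_hash = time.perf_counter() - t0

t0 = time.perf_counter()
hit_sort = lookup_via_sort(data, key)
t_sort = time.perf_counter() - t0

print(hit_hash, hit_sort, t_hash, t_sort)
```

(On typical runs t_hash comes out well below t_sort; for repeated lookups against the same data, the amortized sort would win, as noted above.)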