nosferalatu / SimpleGPUHashTable

A simple GPU hash table implemented in CUDA using lock free techniques
The Unlicense
382 stars 41 forks source link

64 bit Murmur3 hash #2

Closed isratnisa closed 3 years ago

isratnisa commented 4 years ago

Do you have a 64 bit Murmur3 hash function handy? I want to extend the code to support 64 bit keys.

nosferalatu commented 4 years ago

See the fmix64() function from the Murmur3 source code: https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp

Also, to extend the SimpleGPUHashTable to 64 bit keys, I believe you will to use 8-byte alignment for the 64-bit key. If you use a 64 bit key and 32 bit value like struct KeyValue { uint64_t key; uint32_t value; }; then the C compiler will add four bytes of padding. You have a few choices: 1) Do nothing (just be aware that the padding bytes are wasting four bytes per KeyValue) 2) Use a 64 bit value 3) Separate the 64 bit keys into one array, and the 32 bit values into a separate array

Option #3 guarantees that you'll have at least two cache misses per operation (one cache miss for the key lookup, and one cache miss for the value lookup). On the other hand, if your probe lengths are long, this might have better performance anyway (because you would first linearly search through tightly packed keys, then do a single lookup of the value). But for short probe lengths, I would expect that option #1 or #2 would have better performance. All of that said, GPUs do very well at hiding memory fetch latency if you give them enough work, so I would have to profile these different options to see what works best.

I'm curious what you find.

isratnisa commented 4 years ago

Thanks for the clear detailed reply! Didn't know about the padding effect from C compiler (thanks for that). Going with #option 3 for now. Will let you know when I have some analysis of real data.