Closed stefanfred closed 5 months ago
Hi @stefanfred,
the size on disk should match that returned by num_bits()
, i.e.,
everything that we need to evaluate the function. So you cannot remove the space for the meta data, although small.
I think you should add this space to the GPU implementation rather than removing it from the CPU implementation.
-Giulio
Hey Giulio,
ok. I will add the bucketer space in the GPU implementation. However, we still need to use the same formula for bucket count per partition like on the GPU. I can not change that on the GPU side, because it does not allow a dependence between the bucket count per partition and the number of input keys. (would require shader recompilation for each MPHF instance)
Hi @stefanfred, perfect, thanks!
A little nitpick: why the renaming from Minimal
to NeedsFreeArray
?
A function by itself does not "need" anything, rather we choose to have it minimal or not. So I think the parameter should be named Minimal
as it was before.
My thought process was that minimal does not necessarily mean that the free array is used. E.g. for alpha=1 and additive hashing. I think calling it minimal would be a bit confusing if the PHF can be minimal without it.
I have now added multithreaded encoding. Actually was a bottleneck now on the large machine :)
Aligned calculation of bucket count per partition to GPU Removed bucketer from space consumption (in my opinion we do not need to count anything towards space consumption that does not depend on the input keys themselves.)
GPU and CPU implementation now have same (max diff 0.004 bits per key) space consumption for same configuration