jason7708 opened 1 month ago
Thanks @jason7708, you are absolutely correct, and thanks for the clz fix. Either fix would be much appreciated. These computations happen only at compile time, so as long as it won't blow up compilation times, any solution would be great. Thanks.
When I tried certain inputs without enabling `-mbmi2`, my mask was calculated as 163839. The compilation crashed because the `nbits` calculation produced an extremely large number, and I found that the issue was caused by an incorrect `clz` calculation. I fixed the error and submitted a pull request.
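For reference, here is a minimal constexpr sketch of a portable count-leading-zeros, assuming (hypothetically) that `nbits` is computed as the type's bit width minus `clz(mask)`. The `clz` and `nbits` helpers are illustrative names, not MPH's actual functions. For the mask 163839 (`0x27FFF`, highest set bit 17) on a 32-bit type, it gives `clz == 14` and `nbits == 18`. If a clz implementation ever returned more than the bit width, the unsigned subtraction would wrap around, which is one way an extremely large `nbits` can appear. With C++20, `std::countl_zero` from `<bit>` is a standard constexpr alternative.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical portable count-leading-zeros (not MPH's code): scans from the
// most significant bit down. Returns the full bit width for x == 0 and never
// more than that, so callers can subtract it from the width safely.
template <class T>
constexpr unsigned clz(T x) {
    static_assert(std::numeric_limits<T>::is_integer && !std::numeric_limits<T>::is_signed,
                  "clz expects an unsigned integer type");
    constexpr unsigned width = std::numeric_limits<T>::digits;
    unsigned n = 0;
    for (unsigned bit = width; bit-- > 0;) {
        if (x & (T{1} << bit)) break;  // found the highest set bit
        ++n;
    }
    return n;
}

// Hypothetical: number of bits needed to cover the mask up to its highest
// set bit. Since clz() <= width this cannot wrap; a clz that overshoots the
// width would wrap here and yield an enormous value.
template <class T>
constexpr unsigned nbits(T mask) {
    return static_cast<unsigned>(std::numeric_limits<T>::digits) - clz(mask);
}

// 163839 == 0x27FFF == 0b10'0111'1111'1111'1111: highest set bit is 17.
static_assert(clz<std::uint32_t>(163839u) == 14, "");
static_assert(nbits<std::uint32_t>(163839u) == 18, "");
static_assert(clz<std::uint64_t>(163839u) == 46, "");
```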
In addition, I found that the pext algorithm here is not correct. The article from the link you provided explains that when the gaps between the bits we are interested in are too small, the shifted copies created by multiplying by the coefficient overlap, and adding them produces carries that make the result incorrect.
Assuming the mask is 21 (0b10101, i.e. bits 0, 2 and 4) and `a` is also 21, the coefficient becomes 4 + 2 + 1 = 7, but the result is 4 when it should be 7 (0b111).
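The following hedged sketch reproduces the numbers above. `pext_multiply` is a hypothetical reconstruction of a multiply-based pext (the selected bits are moved to the contiguous positions just below the top mask bit, then shifted down), not MPH's exact code; `pext_reference` is a plain bit-by-bit loop computing what BMI2 `pext` returns. `products_are_disjoint` is an assumed, conservative validity check: if no two partial products land on the same bit, the multiplication cannot carry.

```cpp
#include <cassert>
#include <cstdint>

// Reference pext (what BMI2's PEXT computes): copy the bits of `a` selected
// by `mask` into the low bits of the result, preserving their order.
constexpr std::uint32_t pext_reference(std::uint32_t a, std::uint32_t mask) {
    std::uint32_t result = 0, out = 0;
    for (std::uint32_t bit = 0; bit < 32; ++bit) {
        if (mask & (1u << bit)) {
            if (a & (1u << bit)) result |= 1u << out;
            ++out;
        }
    }
    return result;
}

// Hypothetical multiply-based emulation (a reconstruction, not MPH's code).
// The k selected bits p_0 < ... < p_{k-1} are moved to the contiguous targets
// t_i = shift + i, with shift = p_{k-1} - (k - 1).
struct MulPext {
    std::uint32_t coefficient;  // OR of 2^(t_i - p_i)
    std::uint32_t shift;        // lowest target position
    std::uint32_t width;        // number of selected bits k
};

constexpr MulPext make_mul_pext(std::uint32_t mask) {
    std::uint32_t top = 0, k = 0;
    for (std::uint32_t bit = 0; bit < 32; ++bit)
        if (mask & (1u << bit)) { top = bit; ++k; }
    if (k == 0) return MulPext{0, 0, 0};
    const std::uint32_t shift = top - (k - 1);
    std::uint32_t coefficient = 0, i = 0;
    for (std::uint32_t bit = 0; bit < 32; ++bit) {
        if (mask & (1u << bit)) {
            // Bits that need the same shift share one term, hence |=.
            coefficient |= 1u << (shift + i - bit);
            ++i;
        }
    }
    return MulPext{coefficient, shift, k};
}

constexpr std::uint32_t pext_multiply(std::uint32_t a, std::uint32_t mask) {
    const MulPext p = make_mul_pext(mask);
    const std::uint32_t low = p.width >= 32 ? ~0u : (1u << p.width) - 1u;
    // Overlapping partial products below the window can carry into it.
    return (((a & mask) * p.coefficient) >> p.shift) & low;
}

// Conservative check: true if no two partial products p + s (selected mask
// bit p, coefficient term 2^s) share a bit position, so no carries occur.
constexpr bool products_are_disjoint(std::uint32_t mask, std::uint32_t coefficient) {
    std::uint64_t seen = 0;
    for (std::uint32_t p = 0; p < 32; ++p) {
        if (!(mask & (1u << p))) continue;
        for (std::uint32_t s = 0; s < 32; ++s) {
            if (!(coefficient & (1u << s))) continue;
            const std::uint64_t pos = std::uint64_t{1} << (p + s);
            if (seen & pos) return false;
            seen |= pos;
        }
    }
    return true;
}

// mask = 21 = 0b10101: coefficient 4 | 2 | 1 = 7, shift 2.
static_assert(make_mul_pext(21u).coefficient == 7u, "");
static_assert(pext_reference(21u, 21u) == 7u, "");
static_assert(pext_multiply(21u, 21u) == 4u, "");     // (21 * 7 = 147) >> 2 & 7 == 4
static_assert(!products_are_disjoint(21u, 7u), "");   // 0 + 2 and 2 + 0 collide
```

For mask 21 the partial products at positions 0 + 2 and 2 + 0 collide, which is the too-small-gap case. A mask with wider gaps, such as 65 (bits 0 and 6), passes the check and agrees with the reference.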
You can easily produce a mask of 21 using the following input, and the lookup result will be incorrect.
I also checked the Intel repository link you provided, and its `pseudo_pext_t` has the same issue. However, their perfect hash lookup does not produce errors, because they use `pseudo_pext_t` itself when selecting masks to ensure that the hash values are unique. In contrast, MPH only checks `key & mask`. I also posted an issue there, but I'm not sure whether this was an intentional design choice.

I'm unsure how you would like this to be fixed. If the goal is to ensure mask selection as in Intel's approach, the pext function itself would still behave inconsistently (its results would still differ from the real pext instruction), which seems somewhat strange. On the other hand, ensuring adequate gaps between mask bits would increase the computation, which doesn't seem like a very practical solution. Therefore, for now, I have only fixed the `clz` function.
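To illustrate the difference between the two mask-selection checks, here is a minimal sketch using a hypothetical multiply-based `pext_multiply` (a reconstruction, not MPH's or Intel's code). `masked_keys_unique` models checking only `key & mask`, and `pext_keys_unique` models checking the emulated pext outputs, as described above for Intel's lookup. Both names, and the keys 16 and 21, are illustrative assumptions rather than the input from this issue.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical multiply-based pext: the selected bits are moved to the
// contiguous positions just below the top mask bit, then shifted down.
constexpr std::uint32_t pext_multiply(std::uint32_t a, std::uint32_t mask) {
    std::uint32_t top = 0, k = 0;
    for (std::uint32_t bit = 0; bit < 32; ++bit)
        if (mask & (1u << bit)) { top = bit; ++k; }
    if (k == 0) return 0;
    const std::uint32_t shift = top - (k - 1);
    std::uint32_t coefficient = 0, i = 0;
    for (std::uint32_t bit = 0; bit < 32; ++bit) {
        if (mask & (1u << bit)) {
            coefficient |= 1u << (shift + i - bit);
            ++i;
        }
    }
    const std::uint32_t low = k == 32 ? ~0u : (1u << k) - 1u;
    return (((a & mask) * coefficient) >> shift) & low;
}

// Check described for MPH: the masked keys are pairwise distinct.
template <std::size_t N>
constexpr bool masked_keys_unique(const std::array<std::uint32_t, N>& keys, std::uint32_t mask) {
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = i + 1; j < N; ++j)
            if ((keys[i] & mask) == (keys[j] & mask)) return false;
    return true;
}

// Check described for Intel's lookup: the values actually used as hash
// indices (the emulated pext outputs) are pairwise distinct.
template <std::size_t N>
constexpr bool pext_keys_unique(const std::array<std::uint32_t, N>& keys, std::uint32_t mask) {
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = i + 1; j < N; ++j)
            if (pext_multiply(keys[i], mask) == pext_multiply(keys[j], mask)) return false;
    return true;
}

// Hypothetical keys: 16 & 21 == 16 and 21 & 21 == 21 differ, but both
// emulated pext values are 4.
constexpr std::array<std::uint32_t, 2> example_keys{{16u, 21u}};
static_assert(masked_keys_unique(example_keys, 21u), "");
static_assert(!pext_keys_unique(example_keys, 21u), "");
```

With mask 21, a `key & mask` check accepts the mask, yet a lookup built on the emulated pext would send both keys to the same slot. Validating on the pext outputs rejects this mask, at the cost of running the emulation for every candidate mask at compile time.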