jermp / pthash

Fast and compact minimal perfect hash functions in C++.
MIT License
186 stars 25 forks source link

the program fall in loop #67

Closed Zhao00Xu closed 1 month ago

Zhao00Xu commented 1 month ago

Hello! the function "search_sequential",when step to "continue; // in-bucket collision detected, try next pilot"(search.hpp line 155),it fell in loop. There are many groups of keys need to build hash,when it happended, I record the group size, and push a std::numeric_limits::max(), then it will be OK. But this is not a friendly method, bacause I need to modify the code and update the program and run it again. So, is there a better way to solve the promblem, or maybe I do something wrong when using pthash.

jermp commented 1 month ago

Please, provide an example. What are the parameters you're using? thanks!

Zhao00Xu commented 1 month ago

std::vector keys = {658,1183,459,382,488,623,506,596,0,100,101,102,104,105,109,111,112,377,114,117,466,487,339,376}; //assert(keys.size() == num_keys);

/ Set up a build configuration. / build_configuration config; config.c = 7.0; config.alpha = 0.99; config.seed = 727369; config.minimal_output = true; // mphf config.verbose_output = false;

jermp commented 1 month ago

Hi, this is happening due to hash correlations induced by XOR. This only happens in pathological cases for small key sets. If you change seed or c, it will work.

Alternatively, you can use the additive_displacement search function from the phobic branch. See the example here: https://github.com/jermp/pthash/blob/phobic/src/example.cpp.

Best, -Giulio