jermp / pthash

Fast and compact minimal perfect hash functions in C++.
MIT License

distinct_keys doesn't actually remove duplicates #1

Closed Kristine1975 closed 2 years ago

Kristine1975 commented 2 years ago

The following code calls std::unique to remove duplicates, but never shrinks the vector, so the leftover elements that std::unique leaves at the end are still in it:

https://github.com/jermp/pthash/blob/2607873d28c5467205b9962c89b30680f5b788b4/src/util.hpp#L92

The line should probably be:

keys.erase(std::unique(keys.begin(), keys.end()), keys.end());
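
For context, here is a minimal, self-contained sketch of the erase/unique idiom suggested above. The helper name `sort_and_dedup` is hypothetical and is not the function in pthash's `util.hpp`; the sketch also assumes the keys should be sorted first, since std::unique only collapses adjacent duplicates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the pthash function): sort the keys, then
// physically remove duplicates from the vector.
void sort_and_dedup(std::vector<uint64_t>& keys) {
    // std::unique only removes *adjacent* duplicates, so sort first.
    std::sort(keys.begin(), keys.end());
    // std::unique shifts the unique elements to the front and returns an
    // iterator to the new logical end; the vector's size is unchanged
    // until erase() drops the tail.
    keys.erase(std::unique(keys.begin(), keys.end()), keys.end());
}
```

Calling `sort_and_dedup` on `{5, 1, 5, 3, 1, 1}` should leave `{1, 3, 5}`. Without the `erase` call the vector would still report six elements, with the last three holding unspecified values.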

jermp commented 2 years ago

Thank you @Kristine1975, you're right. I usually do what you suggested, written as keys.resize(std::distance(keys.begin(), std::unique(keys.begin(), keys.end())));. However, the subsequent push_back calls will cause doubling of the vector, which is better avoided. I also noticed that pushing consecutive values may overflow and end up inserting duplicate keys after all. So I changed strategy. See the latest commit.

Let me know if everything works for you.
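
The overflow concern mentioned above can be illustrated with a short sketch. The thread does not show the original padding code, so `pad_with_consecutive` and `has_duplicates` below are hypothetical helpers written under the assumption that "pushing consecutive values" means appending `last + 1`, `last + 2`, and so on after deduplication. It is not the code from the pthash repository, nor the strategy adopted in the later commit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical padding step (an assumption, NOT the pthash code): append
// consecutive values after the last key until `target_size` keys are held.
void pad_with_consecutive(std::vector<uint64_t>& keys, std::size_t target_size) {
    // Reserve capacity up front so the push_back calls below do not
    // trigger a reallocation.
    keys.reserve(target_size);
    // Unsigned arithmetic wraps around: if the last key is the maximum
    // uint64_t value, `next` starts again at 0.
    uint64_t next = keys.empty() ? 0 : keys.back() + 1;
    while (keys.size() < target_size) keys.push_back(next++);
}

// Returns true if any value occurs more than once (works on a sorted copy).
bool has_duplicates(std::vector<uint64_t> keys) {
    std::sort(keys.begin(), keys.end());
    return std::adjacent_find(keys.begin(), keys.end()) != keys.end();
}
```

With keys `{0, 1, UINT64_MAX}` and a target size of 5, the padding wraps around and appends 0 and 1 again, so `has_duplicates` returns true. With keys `{1, 2, 3}` the same call appends 4 and 5 and no duplicates appear.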