jermp / pthash

Fast and compact minimal perfect hash functions in C++.
MIT License
182 stars 26 forks source link

small code fix #10

Closed songqing closed 1 year ago

songqing commented 1 year ago

Small code optimization when building

songqing commented 1 year ago

` // construct a distinct std::unordered_map<uint64_t, uint64_t> distinct;

std::vector<std::pair<uint64_t, uint64_t>> vec;
vec.reserve(distinct.size());
for (auto& p : distinct) vec.emplace_back(p.first, p.second);
std::sort(vec.begin(), vec.end(),
          [](auto const& x, auto const& y) { return x.second > y.second; });
distinct.clear();
// assign codewords by non-increasing frequency
std::vector<uint64_t> dict;
dict.reserve(vec.size());
for (uint64_t i = 0; i != vec.size(); ++i) {
    auto& p = vec[i];
    distinct.insert({p.first, i});
    dict.push_back(p.first);
}`

I write a test with the code, the map 'distinct' is constructed and has 100 million kvs, the master code uses 42.6s, and my code uses 36.3s. You can have a try.

songqing commented 1 year ago

But test with src/example.cpp, for 100 million distinct keys, the size of the map 'distinct' in function compute_ranks_and_dictionary() is only less than 3000, so, this optimization is not useful.