Closed songqing closed 1 year ago
` // construct a distinct std::unordered_map<uint64_t, uint64_t> distinct;
std::vector<std::pair<uint64_t, uint64_t>> vec;
vec.reserve(distinct.size());
for (auto& p : distinct) vec.emplace_back(p.first, p.second);
std::sort(vec.begin(), vec.end(),
[](auto const& x, auto const& y) { return x.second > y.second; });
distinct.clear();
// assign codewords by non-increasing frequency
std::vector<uint64_t> dict;
dict.reserve(vec.size());
for (uint64_t i = 0; i != vec.size(); ++i) {
auto& p = vec[i];
distinct.insert({p.first, i});
dict.push_back(p.first);
}`
I write a test with the code, the map 'distinct' is constructed and has 100 million kvs, the master code uses 42.6s, and my code uses 36.3s. You can have a try.
But test with src/example.cpp, for 100 million distinct keys, the size of the map 'distinct' in function compute_ranks_and_dictionary() is only less than 3000, so, this optimization is not useful.
Small code optimization when building