greg7mdp / parallel-hashmap

A family of header-only, very fast and memory-friendly hashmap and btree containers.
https://greg7mdp.github.io/parallel-hashmap/
Apache License 2.0
2.47k stars 234 forks source link

fix: add growth_left for dump and load, to fix the bug that when loa… #195

Closed LisaLuty closed 1 year ago

LisaLuty commented 1 year ago

I'm using phmap::parallel_flat_hash_map from v1.35 and encountered a hang case :

I print out ctrl_ after loading and find that there is only small portion of empty slots left but a lot of delete slots, however growth_left is still large (since it was not dumped and loaded, it was set as 7/8 of the capacity - size), it never triggers rehash_and_grow_if_necessary after reload. So finally this map runs out of empty slots and hangs in the while loop searching for empty slot.

greg7mdp commented 1 year ago

Hi @LisaLuty , thanks a lot for the PR, and congrats of finding the cause of the issue and the fix. It is good, but the only thing that bothers me is that it is incompatible with files saved with a previous version.

It should be possible to make it compatible, by using the fact that the size of the saved map is likely smaller than std::numeric_limits<size_t>::max() - 10. I have tried the following code and it seems to work:

static constexpr size_t s_version_base = std::numeric_limits<size_t>::max() - 10;
static constexpr size_t s_version = s_version_base;

// ------------------------------------------------------------------------
// dump/load for raw_hash_set
// ------------------------------------------------------------------------
template <class Policy, class Hash, class Eq, class Alloc>
template<typename OutputArchive>
bool raw_hash_set<Policy, Hash, Eq, Alloc>::phmap_dump(OutputArchive& ar) const {
    static_assert(type_traits_internal::IsTriviallyCopyable<value_type>::value,
                    "value_type should be trivially copyable");

    ar.saveBinary(&s_version, sizeof(size_t));
    ar.saveBinary(&growth_left(), sizeof(size_t));
    ar.saveBinary(&size_, sizeof(size_t));
    ar.saveBinary(&capacity_, sizeof(size_t));

    if (size_ == 0)
        return true;
    ar.saveBinary(ctrl_,  sizeof(ctrl_t) * (capacity_ + Group::kWidth + 1));
    ar.saveBinary(slots_, sizeof(slot_type) * capacity_);
    return true;
}

template <class Policy, class Hash, class Eq, class Alloc>
template<typename InputArchive>
bool raw_hash_set<Policy, Hash, Eq, Alloc>::phmap_load(InputArchive& ar) {
    static_assert(type_traits_internal::IsTriviallyCopyable<value_type>::value,
                    "value_type should be trivially copyable");
    raw_hash_set<Policy, Hash, Eq, Alloc>().swap(*this); // clear any existing content

    size_t version;

    ar.loadBinary(&version, sizeof(size_t));
    if (version < s_version_base) {
       // we didn't store the version, version actually contains the size
       size_ = version;
    } else {
       ar.loadBinary(&growth_left(), sizeof(size_t));
       ar.loadBinary(&size_, sizeof(size_t));
    }
    ar.loadBinary(&capacity_, sizeof(size_t));

    if (capacity_) {
        // allocate memory for ctrl_ and slots_
        initialize_slots(capacity_);
    }
    if (size_ == 0)
        return true;
    ar.loadBinary(ctrl_,  sizeof(ctrl_t) * (capacity_ + Group::kWidth + 1));
    ar.loadBinary(slots_, sizeof(slot_type) * capacity_);
    return true;
}

If you update your PR with this code, I'll be happy to merge it!

LisaLuty commented 1 year ago

Hi @greg7mdp, glad to get your reply! I've updated the PR according to your comment. A tiny difference: loading growth_left at the end in case that initialize_slots resets growth_left