martinus / unordered_dense

A fast & densely stored hashmap and hashset based on robin-hood backward shift deletion
MIT License
898 stars 72 forks source link

Moving elements out of a hash set #96

Closed sh-mochi closed 9 months ago

sh-mochi commented 10 months ago

An 'erase(key)' method that returns the previously stored instance by means of move construction would be really handy. Moving elements out of a container is tricky when all you get is a const ref. Please consider the following two methods for your table<> template. I am not 100% sure but I guess something like that should work just fine?

// Erase element referenced by 'i' and return a new object move constructed from the erased element.
auto eject(const const_iterator i) -> value_type {
    value_type tmp{std::move(const_cast<value_type&>(*i))};
    erase(i);
    return tmp;
}

// Lookup 'key' and if present return 'eject(find(key))', otherwise return an empty optional.
template <class K>
auto eject(K const& key) -> std::optional<value_type> {
    auto i = find(key);
    if(i == end()) return {};
    std::optional<value_type> result(std::in_place, std::move(const_cast<value_type&>(*i)));
    erase(i);
    return result;
}
martinus commented 9 months ago

can't you use something like this, maybe in a helper function? I'm a bit wary with adding new functionality that's possible with other means

if (auto it = map.find(key); it != map.end()) {
    val = std::move(it->second);
    map.erase(it);
}
sh-mochi commented 9 months ago

Not possible by other means sadly. Sometimes you need to get not just the associated value (mapped type) but also the key. This is problematic for both a map and a set structure because moving the key changes the value that still resides in the container and leaves it (the value) in an unspecified (but usable) state. The std::unordered_map/set containers are then left in an invalid state. Thus the key type is neatly guarded by constness.

if (auto it = set.find(key); it != set.end()) {
    val = std::move(const_cast<SomeType &>(*it)); // HACK
    set.erase(it);
}

So you can copy the key, but that might involve dynamic allocation (e.g. strings/buffers), or worse: the key is not copy-able and so forever 'stuck' in that container. Your implementation would allow keys to be 'moved out' of map/set containers - there is just no API for that. In my view that is a useful feature worth considering.

sh-mochi commented 9 months ago

Here is what I had in mind.

template <class Q = T, std::enable_if_t<!is_map_v<Q>, bool> = true>
[[nodiscard]] auto eject(const const_iterator i) -> value_type {
    value_type tmp{std::move(const_cast<value_type&>(*i))};
    erase(i);
    return tmp;
}

template <class K, class Q = T, std::enable_if_t<!is_map_v<Q>, bool> = true>
[[nodiscard]] auto eject(K const& key) -> std::optional<value_type> {
    auto i = find(key);
    if(i == end()) return {};
    std::optional<value_type> result(std::in_place, std::move(const_cast<value_type&>(*i)));
    erase(i);
    return result;
}

template <class Q = T, std::enable_if_t<is_map_v<Q>, bool> = true>
[[nodiscard]] auto eject(const const_iterator i) -> value_type {
    value_type tmp{
        std::move(const_cast<Key&>(i->first)),
        std::move(const_cast<T&>(i->second))};
    erase(i);
    return tmp;
}

template <class K, class Q = T, std::enable_if_t<is_map_v<Q>, bool> = true>
[[nodiscard]] auto eject(K const& key) -> std::optional<value_type> {
    auto i = find(key);
    if(i == end()) return {};
    std::optional<value_type> result(std::in_place,
        std::move(const_cast<Key&>(i->first)),
        std::move(i->second));
    erase(i);
    return result;
}

The problem case:

ankerl::unordered_dense::set<std::string> m;
m.emplace("this string is way too long for SSO!");
auto i = m.find("this string is way too long for SSO!");
// silently invokes copy constructor
// also allocs memory during clean up
auto cp_obj = std::move(*i);
assert(!i->empty());
// moves, but yikes
auto mv_obj = std::move(const_cast<std::string&>(*i));
assert(i->empty());
m.erase(i);

With new API:

ankerl::unordered_dense::set<std::string> m;
m.emplace("this string is way too long for SSO!");
auto i = m.find("this string is way too long for SSO!");
auto mv_obj = m.eject(i); // string is moved, no alloc -> reliable cleanup
assert(mv_obj == "this string is way too long for SSO!");
martinus commented 9 months ago

Hi again, I see your point. Currently I have an extract() method that extracts the whole underlying std::vector. I'd say the API should be the same as for erase(), except that it doesn't return an iterator:

auto extract(const_iterator i) -> value_type;

auto extract(Key const& key) -> std::optional<value_type>;

template <class K>
auto extract(Key&& key) -> std::optional<value_type>;

It might also be a good idea to add an API that can throw an exception if one doesn't like the optional (similar to at):

// throws std::out_of_range when key is not found
auto extract_at(Key const& key) -> value_type;

// throws std::out_of_range when key is not found
template <class K>
auto extract_at(Key&& key,) -> value_type;

Implementation is a bit trickier though, as it's not allowed to call erase(it) when the object has already been moved out.

sh-mochi commented 9 months ago

"..not allowed to call erase(it) when the object has already been moved out." Ugh, even though I looked at the code I totally didn't catch that the hash value has to be re-computed. I thought about using your hash map/set as a replacement for something I had cooked up in the past and finally found some time to do so. The memory usage is about 20% down (massive) and speed almost doubled. Had to replace usage of std::pair with std::tuple and bolt those extract() methods on because that is what my library needs. All I did was copy do_erase(bucket_idx) as do_extract(bucket_idx) where I grab the value right before m_values.pop_back(). On top of that I added the extract methods as you suggested based on erase(it). Here are the relevant (and hopefully correct) changes I made.

auto do_extract(value_idx_type bucket_idx) {
    auto const value_idx_to_remove = at(m_buckets, bucket_idx).m_value_idx;

    // shift down until either empty or an element with correct spot is found
    auto next_bucket_idx = next(bucket_idx);
    while (at(m_buckets, next_bucket_idx).m_dist_and_fingerprint >= Bucket::dist_inc * 2) {
        at(m_buckets, bucket_idx) = {dist_dec(at(m_buckets, next_bucket_idx).m_dist_and_fingerprint),
                                     at(m_buckets, next_bucket_idx).m_value_idx};
        bucket_idx = std::exchange(next_bucket_idx, next(next_bucket_idx));
    }
    at(m_buckets, bucket_idx) = {};

    // update m_values
    if (value_idx_to_remove != m_values.size() - 1) {
        // no luck, we'll have to replace the value with the last one and update the index accordingly
        auto& val = m_values[value_idx_to_remove];
        val = std::move(m_values.back());

        // update the values_idx of the moved entry. No need to play the info game, just look until we find the values_idx
        auto mh = mixed_hash(get_key(val));
        bucket_idx = bucket_idx_from_hash(mh);

        auto const values_idx_back = static_cast<value_idx_type>(m_values.size() - 1);
        while (values_idx_back != at(m_buckets, bucket_idx).m_value_idx) {
            bucket_idx = next(bucket_idx);
        }
        at(m_buckets, bucket_idx).m_value_idx = value_idx_to_remove;
    }
    value_type tmp = std::move(m_values.back());
    m_values.pop_back();
    return tmp;
}

auto extract(iterator it) -> value_type {
    auto hash = mixed_hash(get_key(*it));
    auto bucket_idx = bucket_idx_from_hash(hash);

    auto const value_idx_to_remove = static_cast<value_idx_type>(it - cbegin());
    while (at(m_buckets, bucket_idx).m_value_idx != value_idx_to_remove) {
        bucket_idx = next(bucket_idx);
    }

    return do_extract(bucket_idx);
}

template <typename Q = T, std::enable_if_t<is_map_v<Q>, bool> = true>
auto extract(const_iterator it) -> value_type {
    return extract(begin() + (it - cbegin())); // maybe a `it.mut() -> iterator` function would be useful
}

template <class K>
auto extract(K const& key) -> std::optional<value_type> {
    auto i = find(key);
    if(i == end()) return {};
    return extract(i);
}
martinus commented 9 months ago

Thanks for the code, but you are moving out the wrong element :)

I have a version up in PR #105 that should work correctly, can you give it a try? https://raw.githubusercontent.com/martinus/unordered_dense/633023a0abf9ffa44092e4c2510fa8a7f6a3cb21/include/ankerl/unordered_dense.h

sh-mochi commented 9 months ago

During xmas I always test hash maps with just single entry and it worked like a charm ;)

Today I break that rule. Merged the changes and actually did some testing regarding map<string,int>::extract() + 10 minutes fuzzing with random string keys. All good, thank you very much!

Happy Holidays!

martinus commented 9 months ago

Thanks for testing, happy Holidays! Fixed in #105