Closed sh-mochi closed 9 months ago
can't you use something like this, maybe in a helper function? I'm a bit wary with adding new functionality that's possible with other means
if (auto it = map.find(key); it != map.end()) {
val = std::move(it->second);
map.erase(it);
}
Not possible by other means sadly. Sometimes you need to get not just the associated value (mapped type) but also the key. This is problematic for both a map and a set structure because moving the key changes the value that still resides in the container and leaves it (the value) in an unspecified (but usable) state. The std::unordered_map/set containers are then left in an invalid state. Thus the key type is neatly guarded by constness.
if (auto it = set.find(key); it != set.end()) {
val = std::move(const_cast<SomeType &>(*it)); // HACK
set.erase(it);
}
So you can copy the key, but that might involve dynamic allocation (e.g. strings/buffers), or worse: the key is not copy-able and so forever 'stuck' in that container. Your implementation would allow keys to be 'moved out' of map/set containers - there is just no API for that. In my view that is a useful feature worth considering.
Here is what I had in mind.
template <class Q = T, std::enable_if_t<!is_map_v<Q>, bool> = true>
[[nodiscard]] auto eject(const const_iterator i) -> value_type {
value_type tmp{std::move(const_cast<value_type&>(*i))};
erase(i);
return tmp;
}
template <class K, class Q = T, std::enable_if_t<!is_map_v<Q>, bool> = true>
[[nodiscard]] auto eject(K const& key) -> std::optional<value_type> {
auto i = find(key);
if(i == end()) return {};
std::optional<value_type> result(std::in_place, std::move(const_cast<value_type&>(*i)));
erase(i);
return result;
}
template <class Q = T, std::enable_if_t<is_map_v<Q>, bool> = true>
[[nodiscard]] auto eject(const const_iterator i) -> value_type {
value_type tmp{
std::move(const_cast<Key&>(i->first)),
std::move(const_cast<T&>(i->second))};
erase(i);
return tmp;
}
template <class K, class Q = T, std::enable_if_t<is_map_v<Q>, bool> = true>
[[nodiscard]] auto eject(K const& key) -> std::optional<value_type> {
auto i = find(key);
if(i == end()) return {};
std::optional<value_type> result(std::in_place,
std::move(const_cast<Key&>(i->first)),
std::move(i->second));
erase(i);
return result;
}
The problem case:
ankerl::unordered_dense::set<std::string> m;
m.emplace("this string is way too long for SSO!");
auto i = m.find("this string is way too long for SSO!");
// silently invokes copy constructor
// also allocs memory during clean up
auto cp_obj = std::move(*i);
assert(!i->empty());
// moves, but yikes
auto mv_obj = std::move(const_cast<std::string&>(*i));
assert(i->empty());
m.erase(i);
With new API:
ankerl::unordered_dense::set<std::string> m;
m.emplace("this string is way too long for SSO!");
auto i = m.find("this string is way too long for SSO!");
auto mv_obj = m.eject(i); // string is moved, no alloc -> reliable cleanup
assert(mv_obj == "this string is way too long for SSO!");
Hi again, I see your point. Currently I have an extract()
method that extracts the whole underlying std::vector. I'd say the API should be the same as for erase()
, except that it doesn't return an iterator:
auto extract(const_iterator i) -> value_type;
auto extract(Key const& key) -> std::optional<value_type>;
template <class K>
auto extract(Key&& key) -> std::optional<value_type>;
It might also be a good idea to add an API that can throw an exception if one doesn't like the optional (similar to at
):
// throws std::out_of_range when key is not found
auto extract_at(Key const& key) -> value_type;
// throws std::out_of_range when key is not found
template <class K>
auto extract_at(Key&& key,) -> value_type;
Implementation is a bit trickier though, as it's not allowed to call erase(it) when the object has already been moved out.
"..not allowed to call erase(it) when the object has already been moved out." Ugh, even though I looked at the code I totally didn't catch that the hash value has to be re-computed. I thought about using your hash map/set as a replacement for something I had cooked up in the past and finally found some time to do so. The memory usage is about 20% down (massive) and speed almost doubled. Had to replace usage of std::pair with std::tuple and bolt those extract() methods on because that is what my library needs. All I did was copy do_erase(bucket_idx)
as do_extract(bucket_idx)
where I grab the value right before m_values.pop_back()
. On top of that I added the extract
methods as you suggested based on erase(it). Here are the relevant (and hopefully correct) changes I made.
auto do_extract(value_idx_type bucket_idx) {
auto const value_idx_to_remove = at(m_buckets, bucket_idx).m_value_idx;
// shift down until either empty or an element with correct spot is found
auto next_bucket_idx = next(bucket_idx);
while (at(m_buckets, next_bucket_idx).m_dist_and_fingerprint >= Bucket::dist_inc * 2) {
at(m_buckets, bucket_idx) = {dist_dec(at(m_buckets, next_bucket_idx).m_dist_and_fingerprint),
at(m_buckets, next_bucket_idx).m_value_idx};
bucket_idx = std::exchange(next_bucket_idx, next(next_bucket_idx));
}
at(m_buckets, bucket_idx) = {};
// update m_values
if (value_idx_to_remove != m_values.size() - 1) {
// no luck, we'll have to replace the value with the last one and update the index accordingly
auto& val = m_values[value_idx_to_remove];
val = std::move(m_values.back());
// update the values_idx of the moved entry. No need to play the info game, just look until we find the values_idx
auto mh = mixed_hash(get_key(val));
bucket_idx = bucket_idx_from_hash(mh);
auto const values_idx_back = static_cast<value_idx_type>(m_values.size() - 1);
while (values_idx_back != at(m_buckets, bucket_idx).m_value_idx) {
bucket_idx = next(bucket_idx);
}
at(m_buckets, bucket_idx).m_value_idx = value_idx_to_remove;
}
value_type tmp = std::move(m_values.back());
m_values.pop_back();
return tmp;
}
auto extract(iterator it) -> value_type {
auto hash = mixed_hash(get_key(*it));
auto bucket_idx = bucket_idx_from_hash(hash);
auto const value_idx_to_remove = static_cast<value_idx_type>(it - cbegin());
while (at(m_buckets, bucket_idx).m_value_idx != value_idx_to_remove) {
bucket_idx = next(bucket_idx);
}
return do_extract(bucket_idx);
}
template <typename Q = T, std::enable_if_t<is_map_v<Q>, bool> = true>
auto extract(const_iterator it) -> value_type {
return extract(begin() + (it - cbegin())); // maybe a `it.mut() -> iterator` function would be useful
}
template <class K>
auto extract(K const& key) -> std::optional<value_type> {
auto i = find(key);
if(i == end()) return {};
return extract(i);
}
Thanks for the code, but you are moving out the wrong element :)
I have a version up in PR #105 that should work correctly, can you give it a try? https://raw.githubusercontent.com/martinus/unordered_dense/633023a0abf9ffa44092e4c2510fa8a7f6a3cb21/include/ankerl/unordered_dense.h
During xmas I always test hash maps with just single entry and it worked like a charm ;)
Today I break that rule. Merged the changes and actually did some testing regarding map<string,int>::extract() + 10 minutes fuzzing with random string keys. All good, thank you very much!
Happy Holidays!
Thanks for testing, happy Holidays! Fixed in #105
An 'erase(key)' method that returns the previously stored instance by means of move construction would be really handy. Moving elements out of a container is tricky when all you get is a const ref. Please consider the following two methods for your table<> template. I am not 100% sure but I guess something like that should work just fine?