Closed e4lam closed 4 years ago
Well, robin_hood map doesn't have iterator stability. unordered_node_map
and unordered_node_set
have reference and pointer stability, but not iterator stability. Note std::unordered_map also doesn't have iterator stability.
I don't know your benchmark code, but maybe the compiler has optimized something completely away? This tends to happen for small benchmarks. Usually robin_hood map shouldn't be faster than an array :)
Concerning the hashing, robin_hood has implementations for all basic types that std::hash also has specializations for, and std::string. (see here: https://en.cppreference.com/w/cpp/utility/hash)
It mostly only falls back to std::hash if you have provided a user defined implementation for a special type.
Ok, right. But does both robin_hood::unordered_{map,set}
and std::unordered_{map,set}
have reference/pointer stability?
That is a good point about things being optimized away. I'll have to double-check. The benchmark is like the following, with lists of items that have varying number of unique items and duplicates.
void addIgnoringDups(size_t item)
{
bool inserted = myItemSet.insert(item).second; // eg. unordered_set<size_t>
if (!inserted)
return;
myItems.push_back(item); // this is a std::vector<size_t>
}
vs
void addIgnoringDups(size_t item)
{
for (size_t i = 0, n = myItems.size(); i < n; ++i)
{
if (myItems[i] == item)
return;
}
myItems.push_back(item); // this is a std::vector<size_t>
}
Thanks for the answers!
std::unordered_{map,set}
has reference & pointer stability, for robin_hood::unordered_{map,set}
it depends. This type is basically either robin_hood::unordered_flat_{map,set}
or robin_hood::unordered_node_{map,set}
, based on what it thinks will be faster.
So in short:
robin_hood::unordered_node_{map,set}
You might want to have a look at my nanobench project, it has a doNotOptimizeAway
to make sure a result is not optimized away by the compiler: https://github.com/martinus/nanobench
Yep, I had started using nanobench for precisely for these current set of benchmarks (which lead me to making requests like https://github.com/martinus/nanobench/issues/14).
The benchmark code I'm using currently looks like this:
SetNodeT node(items.size());
cfg.run(name,
[&]() {
for (auto item : items)
node.addIgnoringDups(item);
}).doNotOptimizeAway(node);
Ah, I see the difference now, robin_hood::unordered_node_{map,set}
chooses based on a heuristic around the size of the key type and its properties.
On reference & pointer stability, I was confused because I had first added ska::flat_hash_set
and ska::unordered_set
to my benchmark which I think doesn't map the same way to robin_hood::unordered_flat_set
/ robin_hood::unordered_set
does for this.
In my benchmarks for using robin_hood for avoiding duplicate pointers/integers, the performance seems almost magical. In a test with 8 random unique integers repeated once (for a total of 16 insertions), even
robin_hood::unordered_set<size_t>
it is about 1.2x faster than an array that uses linear search to avoid duplicates. So much so that it feels like I'm doing something wrong. Is that expected?How does
robin_hood::unordered_set
manage to be so fast at these small sizes while maintaining iterator stability? Or does it give that up?On a separate note, I'm a bit worried that the
README.md
says that it falls back tostd::hash
in other cases. But as Malte Skarupke notes, this can be bad on Windows because Visual C++'sstd::hash
uses stronger hash functions to compensate for the fact that itsunordered_set
implementation uses power of 2 sizes and then only keeps the lower bits of the hash result. In fact, not only does it use FNV1 hash for integers, but also for enum, float, double, and pointers. So it feels dangerous to fallback tostd::hash
. Is there a customization point to use sayboost::hash
as the fallback instead?