Open 5d40e055-7d34-4db9-aa7d-290d9e7b2a11 opened 5 years ago
Yes, you are right. My bad. Disregard my comment.
Using bucket would be more performant, but it fails in cases like the following:
unordered_set<int> a = { 1, 1, 2 }; unordered_set<int> b = { 1, 2, 2 }; assert(eqSet(a, b) != true);
These two sets seem equal to me and both algorithms report equal, do you mean unordered_multiset?
You're right that this algorithm won't work with multiset. I only propose that we change unordered_{map,set}::operator==.
Using bucket would be more performant, but it fails in cases like the following:
unordered_set<int> a = { 1, 1, 2 };
unordered_set<int> b = { 1, 2, 2 };
assert(eqSet(a, b) != true);
Extended Description
When unordered_map or unorderedset has a nested unordered{map,set} as a key, operator== becomes exponential in the depth of the nesting. This is because unordered_{map,set}::operator== does two key comparisons, one in find() using keyeq() and one when comparing the key-value pairs using the key’s operator==. These two comparisons can theoretically be different but are almost always the same in practice. unordered{map,set} should only do one key comparison in operator== to avoid this. It is also generally more efficient even in the non-nesting cases.
Nathan Bronson helped create a minimal repro below that demonstrates this exponential behavior.
include
include
include
struct Nested;
struct NestedHash { std::size_t operator()(Nested const& n) const; };
struct Nested { std::unique_ptr<std::unordered_set<Nested, NestedHash>> set;
explicit Nested(int depth) : set(std::make_unique<std::unordered_set<Nested, NestedHash>>()) { if (depth > 0) { set->emplace(depth - 1); } } };
std::size_t NestedHash::operator()(Nested const& n) const { std::size_t rv = 1; for (auto& k : *n.set) { rv += operator()(k); } return rv; }
bool operator==(Nested const& lhs, Nested const& rhs) { return lhs.set == rhs.set; }
bool operator!=(Nested const& lhs, Nested const& rhs) { return !(lhs == rhs); }
void testNestedSetEquality() { auto n1 = Nested(100); auto n2 = Nested(100); auto n3 = Nested(99); assert(n1 == n1); assert(n1 == n2); assert(n1 != n3); }
This is a contrived example, but this shows up in practice in folly::dynamic::operator== [1] as a worst case exponential algorithm. No one will create objects like this intentionally, but they can be produced with folly::parseJson() [2] when non-string keys are allowed. If an attacker controlled the input to parseJson() they could easily DDOS the machine with a payload like "{{{{{0:0}:0}:0}:0}:0}" but nested 100 layers deep instead of 5.
unordered_map and unordered_set should only do one key comparison per item in operator==. This is possible [3] because it is undefined behavior to call the containers operator== if the key’s operator== isn’t a refinement of key_eq(). Fixing this bug in unordered_map and unordered_set avoids the exponential behavior with nested keys, and it is generally more efficient even in the non-nesting cases. An implementation of set and map equality with only one key comparison is included below, thanks to Nathan Bronson.
template
bool eqSet(S const& lhs, S const& rhs) {
if (lhs.size() != rhs.size()) {
return false;
}
for (auto& v : lhs) {
auto slot = rhs.bucket(v);
auto b = rhs.begin(slot);
auto e = rhs.end(slot);
while (b != e && !(*b == v)) {
++b;
}
if (b == e) {
return false;
}
}
return true;
}
template
bool eqMap(M const& lhs, M const& rhs) {
if (lhs.size() != rhs.size()) {
return false;
}
for (auto& v : lhs) {
auto slot = rhs.bucket(v.first);
auto b = rhs.begin(slot);
auto e = rhs.end(slot);
while (b != e && !(b->first == v.first)) {
++b;
}
if (b == e || !(b->second == v.second)) {
return false;
}
}
return true;
}
[1] https://github.com/facebook/folly/blob/master/folly/dynamic.cpp#L113 [2] https://github.com/facebook/folly/blob/master/folly/json.h#L194 [3] https://github.com/facebook/folly/commit/11855c21b21fb46fe1004de8c0412dd94eb5c45f