llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.06k stars 11.59k forks source link

unordered_map and unordered_set operator== double key comparison causes exponential behavior #42106

Open 5d40e055-7d34-4db9-aa7d-290d9e7b2a11 opened 5 years ago

5d40e055-7d34-4db9-aa7d-290d9e7b2a11 commented 5 years ago
Bugzilla Link 42761
Version 9.0
OS All
CC @mclow,@terrelln,@zoecarver

Extended Description

When unordered_map or unorderedset has a nested unordered{map,set} as a key, operator== becomes exponential in the depth of the nesting. This is because unordered_{map,set}::operator== does two key comparisons, one in find() using keyeq() and one when comparing the key-value pairs using the key’s operator==. These two comparisons can theoretically be different but are almost always the same in practice. unordered{map,set} should only do one key comparison in operator== to avoid this. It is also generally more efficient even in the non-nesting cases.

Nathan Bronson helped create a minimal repro below that demonstrates this exponential behavior.

include

include

include

struct Nested;

struct NestedHash { std::size_t operator()(Nested const& n) const; };

struct Nested { std::unique_ptr<std::unordered_set<Nested, NestedHash>> set;

explicit Nested(int depth) : set(std::make_unique<std::unordered_set<Nested, NestedHash>>()) { if (depth > 0) { set->emplace(depth - 1); } } };

std::size_t NestedHash::operator()(Nested const& n) const { std::size_t rv = 1; for (auto& k : *n.set) { rv += operator()(k); } return rv; }

bool operator==(Nested const& lhs, Nested const& rhs) { return lhs.set == rhs.set; }

bool operator!=(Nested const& lhs, Nested const& rhs) { return !(lhs == rhs); }

void testNestedSetEquality() { auto n1 = Nested(100); auto n2 = Nested(100); auto n3 = Nested(99); assert(n1 == n1); assert(n1 == n2); assert(n1 != n3); }

This is a contrived example, but this shows up in practice in folly::dynamic::operator== [1] as a worst case exponential algorithm. No one will create objects like this intentionally, but they can be produced with folly::parseJson() [2] when non-string keys are allowed. If an attacker controlled the input to parseJson() they could easily DDOS the machine with a payload like "{{{{{0:0}:0}:0}:0}:0}" but nested 100 layers deep instead of 5.

unordered_map and unordered_set should only do one key comparison per item in operator==. This is possible [3] because it is undefined behavior to call the containers operator== if the key’s operator== isn’t a refinement of key_eq(). Fixing this bug in unordered_map and unordered_set avoids the exponential behavior with nested keys, and it is generally more efficient even in the non-nesting cases. An implementation of set and map equality with only one key comparison is included below, thanks to Nathan Bronson.

template bool eqSet(S const& lhs, S const& rhs) { if (lhs.size() != rhs.size()) { return false; } for (auto& v : lhs) { auto slot = rhs.bucket(v); auto b = rhs.begin(slot); auto e = rhs.end(slot); while (b != e && !(*b == v)) { ++b; } if (b == e) { return false; } } return true; }

template bool eqMap(M const& lhs, M const& rhs) { if (lhs.size() != rhs.size()) { return false; } for (auto& v : lhs) { auto slot = rhs.bucket(v.first); auto b = rhs.begin(slot); auto e = rhs.end(slot); while (b != e && !(b->first == v.first)) { ++b; } if (b == e || !(b->second == v.second)) { return false; } } return true; }

[1] https://github.com/facebook/folly/blob/master/folly/dynamic.cpp#L113 [2] https://github.com/facebook/folly/blob/master/folly/json.h#L194 [3] https://github.com/facebook/folly/commit/11855c21b21fb46fe1004de8c0412dd94eb5c45f

zoecarver commented 5 years ago

Yes, you are right. My bad. Disregard my comment.

5d40e055-7d34-4db9-aa7d-290d9e7b2a11 commented 5 years ago

Using bucket would be more performant, but it fails in cases like the following:

unordered_set<int> a = { 1, 1, 2 };
unordered_set<int> b = { 1, 2, 2 };
assert(eqSet(a, b) != true);

These two sets seem equal to me and both algorithms report equal, do you mean unordered_multiset?

You're right that this algorithm won't work with multiset. I only propose that we change unordered_{map,set}::operator==.

zoecarver commented 5 years ago

Using bucket would be more performant, but it fails in cases like the following:

unordered_set<int> a = { 1, 1, 2 };
unordered_set<int> b = { 1, 2, 2 };
assert(eqSet(a, b) != true);