mapbox / variant

C++11/C++14 Variant
BSD 3-Clause "New" or "Revised" License
371 stars 101 forks source link

Hashable should take type index into account #131

Open daniel-j-h opened 7 years ago

daniel-j-h commented 7 years ago

Hashing a variant should take the type index / .which() into account in addition to the underlying hash

Why? Because of the edge case where the underlying hash is the same but the type index is not.

tomilov commented 7 years ago

Why is it the problem? There are hash collisions anyways. After matching the hashes of two values any algorithm which uses hashes should to compare for equality. Variants with different active alternatives are not equal. Runtime overhead imposed by hash-combining of the value's hash and of the the (excessive) hash of its index is permanent otherwise. It is hardly desirable.

tomilov commented 7 years ago

I sure the probability of hash(A) == hash(B) is the same as probability of combine(hash(A), hash(index(A))) == combine(hash(B), hash(index(B))). Say decltype(A) == int and decltype(B) == char, hash is identity for integral types and hash combiner is xor. Then for variant< A, B >'s comparison variant{A{2}} vs variant{B{1}}: 2 ^ 1 == 1 ^ 2. And it is the truth for all the combinations of successive even and odd underlying values. As you can see the probability is equally the same (w/o loss of generality for hashes with avalanche effect and non-linear combiners).

daniel-j-h commented 7 years ago

I open this ticket mainly because I read the updates on

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0088r3.html

hash<variant> can now return different values than hash (and it should - presumably it should take the index() into account).

and

http://en.cppreference.com/w/cpp/utility/variant/hash

Unlike std::hash, hash of a variant does not typically equal the hash of the contained value; this makes it possible to distinguish std::variant<int, int> holding the same value as different alternatives.

the Boost.Variant impl. does the same, too.

tomilov commented 7 years ago

Anyways hash combining every time for any alternative type is permanent runtime overhead. But std::variant< T, T > is hardly useful. Why there should be different hash values for first and second occurence of a T in the alternative type list? What is the rationale behind this?