Closed GoogleCodeExporter closed 9 years ago
We've been puzzling over how to represent hash functions in a generalized way,
and this is a very interesting idea for that. I think it would take the form
of a separate HashFunction interface, but perhaps Equivalence should connect to
that... we'll consider.
Original comment by kevinb@google.com
on 28 Sep 2010 at 1:37
Original comment by fry@google.com
on 26 Jan 2011 at 10:26
Original comment by kevinb@google.com
on 27 Jan 2011 at 6:16
[deleted comment]
Original comment by kevinb@google.com
on 13 Jul 2011 at 6:19
Original comment by kevinb@google.com
on 18 Jul 2011 at 3:53
Here's an API sketch.
HashCode {
int bits(); // asBytes().length * 8
byte[] asBytes();
int asInt(); // first 4 bytes little-endian
long asLong(); // similarly. throws if < 8 bytes
}
HashFunction {
int bits();
Hasher newHasher();
// shortcuts; not sure which exact set we want
HashCode hashBytes(byte[]);
HashCode hashChars(CharSequence);
HashCode hashLong(long);
}
Hashing {
HashFunction murmur3_32();
HashFunction murmur3_32(int seed);
HashFunction murmur3_128();
HashFunction murmur3_128(int seed);
// we choose a hash function for you. arbitrarily wide.
HashFunction goodFastHash(int minimumBits);
HashFunction md5(); // adapters of MessageDigest
HashFunction sha1();
HashFunction crc32(); // adapters of Checksum
}
Hasher {
Hasher putInt(int);
Hasher putLong(long);
// etc. etc.
HashCode hash();
}
Ideally, I would like to change Equivalence.hash() to return HashCode, allowing
it to take part in all this fun.
Original comment by kevinb@google.com
on 1 Aug 2011 at 9:09
On Hasher I left out
putBytes(byte[])
putBytes(byte[], int, int)
putChars(CharSequence) // hashes bytes directly, no encoding
putChars(CharSequence, Charset) // unsure if we need this
Incidentally, the multibyte methods like putInt() are specified to be
equivalent to calling putBytes() with the little-endian representation of that
value.
(Endianness shouldn't matter to most users; for those it does it will likely be
for compatibility with native code in which case LE is more commonly desired;
and otherwise you can simply use Ints.toByteArray() and so forth to get the BE
representation.)
Original comment by kevinb@google.com
on 1 Aug 2011 at 9:12
[deleted comment]
Kevin left out this:
interface Funnel<T> {
void funnel(T, Sink);
}
For a List<T>, and a Funnel<T>, you would iterate the elements and pass them to
the funnel, or if T is a fixed type, you could just extract the relevant parts
directly.
Please note that this only works for lists. It's just broken for Set. So when
you have a collection, the best (/the only) option is really to just extract
the collection.hashCode() value.
Original comment by andr...@google.com
on 13 Oct 2011 at 7:34
It isn't exactly that it's "broken for Set"; it's simply that *you* have to
ensure that the collection is in a deterministic order before you feed the data
in. So it's fine for SortedSet or for a LHS that you always populate in a
specific way, while it's broken for some Lists that people "use like"
Sets/Multisets.
Original comment by kevinb@google.com
on 16 Oct 2011 at 3:31
[deleted comment]
Right, I always forget SortedSet. About LHS, hmm, not that confident with that.
Its order doesn't matter for its equals() or its hashCode(), but it would
matter for HashCode, which may lead to surprizes.
I think a safe rule to advertize for when one can hash collections like that,
is "whenever equals()/hashCode() of the collection depend on the iterator's
ordering of the elements". With this, if two collections are equal, they will
_always_ have the same HashCode, which is nice.
Original comment by jim.andreou
on 16 Oct 2011 at 6:23
I don't know; I think we define the problem away by not *having* methods that
accept a Collection. You're responsible for feeding things in, you're
responsible for what order you feed them in.
Original comment by kevinb@google.com
on 16 Oct 2011 at 8:38
Agreed, not our problem per se, since we are explicit in that "ordering
matters". All I'm talking about is a simple rule for guidance of users who do
face this question (the other approach is simply collection.hashCode() and
using that). Not the sort of thing that goes in a javadoc; more like the answer
to a future stackoverflow question.
Original comment by jim.andreou
on 17 Oct 2011 at 2:50
I'm marking this issue as completed since the release of common.hash.
Original comment by wasserman.louis
on 20 Jan 2012 at 4:47
This issue has been migrated to GitHub.
It can be found at https://github.com/google/guava/issues/<id>
Original comment by cgdecker@google.com
on 1 Nov 2014 at 4:15
Original comment by cgdecker@google.com
on 3 Nov 2014 at 9:09
Original issue reported on code.google.com by
fin...@gmail.com
on 27 Sep 2010 at 1:36