google / guava

Google core libraries for Java
Apache License 2.0
50.2k stars 10.91k forks source link

Consider adding asBigInteger() to HashCode #1323

Open gissuebot opened 10 years ago

gissuebot commented 10 years ago

Original issue created by lowasser@google.com on 2013-03-07 at 06:00 PM


I've encountered a few users internally who are e.g. converting a hash to a BigInteger and then modding it or something.

gissuebot commented 10 years ago

Original comment posted by kurt.kluever on 2013-03-07 at 06:04 PM


Which hashing algorithms are they using? Presumably ones with >64 bits, right?

gissuebot commented 10 years ago

Original comment posted by cpovirk@google.com on 2013-03-07 at 06:06 PM


How often are they modding the results compared to doing something else? I've wondered before whether we'd benefit from a HashCode.bucket(int buckets) method. And then I wonder whether those users should use WeightedConsistentHash (which, incidentally, we should probably release).

gissuebot commented 10 years ago

Original comment posted by kevinb@google.com on 2013-03-07 at 11:56 PM


I wonder how much real world benefit these users actually get from expensively modding a 128-bit BigInteger over more cheaply modding the asInt(). I can only imagine that being bad if the modulus happens to be in the vicinity of 2^33/3, 2^33/5, 2^33/7 or maybe a few more like that.


Status: Research

gissuebot commented 10 years ago

Original comment posted by cpovirk@google.com on 2013-03-08 at 04:28 PM


My assumption was that the callers were concerned about negative numbers (<http://code.google.com/p/error-prone/issues/detail?id=77>), but I haven't even tried to confirm or deny that assumption.

kluever commented 9 years ago

Related: http://stackoverflow.com/questions/29932956/murmur3-hash-different-result-between-python-and-java-implementation

cgdecker commented 9 years ago

If other languages (python in the above example) give a numeric output for a hash like murmur3 128, it seems useful if nothing else to have a method on HashCode that does the same. This is particularly true, I believe, because it's somewhat hard to create a BigInteger with the same value correctly.

The author of the above issue tries taking the hex output of toString() and passing it to new BigInteger(hex, 16). This doesn't work for two reasons:

  1. The pairs of hex digits in the output actually represent the bytes of the hash code in the same order they're returned by asBytes() (which needs to be interpreted as little-endian when converting to a number). This means you need to create a new String by reversing the pairs of characters in the toString() output before passing them to the BigInteger constructor.
  2. Perhaps more importantly, the BigInteger constructor expects a - in front of the String if it is a negative number, rather than interpreting the hex as a sequence of bytes that should be interpreted as a two's complement binary number.

The correct way to create a BigInteger is to take the byte[] output of asBytes(), reverse its bytes (little-endian to big-endian) and then pass it to new BigInteger(byte[]), which does interpret the bytes as the two's complement binary representation of the number.