mkodekar / guava-libraries

Automatically exported from code.google.com/p/guava-libraries
Apache License 2.0
0 stars 0 forks source link

optimalNumOfHashFunctions() May not Return what you Expect #1781

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Was experimenting with JavaScript code when I noticed this.  If you call

    optimalNumOfHashFunctions(319, 3072)

It returns 6. A quick JavaScript version returned 7. The reason is because the 
division acts only on integers and returns an integer. In JavaScript everything 
is a float. That is, in Java, `3072 / 319` returns 9, but in JavaScript it 
returns 9.63. Maybe it was intentional to have an implicitly truncated value, 
but I'm guessing not. The fix would be something like:

--- a/guava/src/com/google/common/hash/BloomFilter.java
+++ b/guava/src/com/google/common/hash/BloomFilter.java
@@ -363,7 +363,7 @@ public final class BloomFilter<T> implements Predicate<T>, 
Serializable {
    */
   @VisibleForTesting
   static int optimalNumOfHashFunctions(long n, long m) {
-    return Math.max(1, (int) Math.round(m / n * Math.log(2)));
+    return Math.max(1, (int) Math.round((float)m / n * Math.log(2)));
   }

   /**

Original issue reported on code.google.com by justathe...@gmail.com on 13 Jun 2014 at 8:25

GoogleCodeExporter commented 9 years ago
Hey Dimitri,

Mind taking a look at this since you originally wrote that code?

Thanks,
-kak

Original comment by kak@google.com on 13 Jun 2014 at 8:30

GoogleCodeExporter commented 9 years ago
Sure. This looks valid; the current code is not a faithful rendering of the 
intended formula. An extra pair of parens around "n * Math.log(2)" should do. 
This can be fixed, since the number of hash functions is part of the serial 
form, so a different function for the create() method can't affect existing BFs.

The impact of this should be fairly low - at worst, the BF is using one less 
hash function than it would (5 instead of 6), and this doesn't affect the 
default fpp either.

Original comment by andr...@google.com on 16 Jun 2014 at 10:00

GoogleCodeExporter commented 9 years ago
@andreou: I don't think the extra pair of parens will solve it. Right now we're 
doing:
m / n * Math.log(2) // this is really (m / n) * Math.log(2)
Your proposal changes the formula:
m / (n * Math.log(2))

Original comment by kak@google.com on 16 Jun 2014 at 3:05

GoogleCodeExporter commented 9 years ago
Oops, yeah. That was too fast. :)

Original comment by andr...@google.com on 16 Jun 2014 at 3:07

GoogleCodeExporter commented 9 years ago
This issue has been migrated to GitHub.

It can be found at https://github.com/google/guava/issues/<issue id>

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:08

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:17

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 3 Nov 2014 at 9:07