rzel / concurrentlinkedhashmap

Automatically exported from code.google.com/p/concurrentlinkedhashmap
Apache License 2.0
0 stars 0 forks source link

Bloom filter example has mistakes in setAt: div and mod by Long.SIZE instead #35

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1.Take example class and instantiate a BloomFilter<String>(1000,1.0F/1000)
2.add more than about 13 strings to it.
3.spot other adds fail.  print out final state using toString

What is the expected output? What do you see instead?
All bits will show as 1, not many, 64 of them (no surprise)

What version of the product are you using? On what operating system?

Please provide any additional information below.
Need to fix setAt as follows:
    private boolean setAt(int index, long[] words) { 
        //int i = index/bits; 
        //int bitIndex = index % bits; 
        int i = index/Long.SIZE; // bits/length = Long.SIZE, see contructor, above, so use that here.
        int bitIndex = index % Long.SIZE;  // ditto.....

Commented out bad old lines.

Should now work.
  - kieron

Original issue reported on code.google.com by kieron.d...@gmail.com on 25 Jul 2012 at 4:23

GoogleCodeExporter commented 9 years ago
Thanks! Obviously the class was for fun and shouldn't be used in production. ;)

Please consider Guava's Bloomfilter if you need a production-quality 
implementation. Amusingly this toy example 
[https://groups.google.com/forum/?fromgroups#!searchin/guava-discuss/double$20ha
shing/guava-discuss/h08j7hG206k/DbztIwekTwQJ unblocked] the Guava team who were 
trying to use Random/SecureRandom (unaware of double hashing). I provided 
advise and code reviews for that implementation and is what I would use in 
practice.

Original comment by Ben.Manes@gmail.com on 25 Jul 2012 at 6:26

GoogleCodeExporter commented 9 years ago
No problem. I was actually more interested in the double hashing and just
noticed the error as I passed by. In practice I have my own stuff in
non-Java languages that I use more: C#, F#, Python and C/C++, but it was
nice to see a realistic example. I tend to use Golomb/Rice coded sets with
additional sub-domain indices for much of what I do, but the trade-offs are
different for different people.

Thanks for putting this out there. I'm glad others have profited from it
too.

   - kieron

Kieron Drake

Chief Technology Officer

Mercuria Energy Trading SA

Rue du Rhone 50

1204 Geneva

Switzerland

Phone:     +41 22 595 88 25

Mobile:    +41 79 401 00 50

Original comment by kieron.d...@gmail.com on 25 Jul 2012 at 6:43