richardstartin / richardstartin.github.io

12 stars 4 forks source link

posts/building-a-bloom-filter-from-scratch #23

Open utterances-bot opened 3 years ago

utterances-bot commented 3 years ago

Building a Bloom Filter from Scratch | Richard Startin’s Blog

The Bloom filter is an interesting data structure for modelling approximate sets of objects. It can tell you with certainty that an object is not in a collection, but may give false positives. If you write Java for a living, I would not suggest you implement your own, because there is a perfectly good implementation in Guava. It can be illuminating to write one from scratch, however.

https://richardstartin.github.io/posts/building-a-bloom-filter-from-scratch

isopropylcyanide commented 3 years ago

Nicely explained, Richard.

bilarsen commented 3 years ago

Great article! Much appreciated! However I can't come to terms with array[hash >>> 6]. Why do we need it? I mean, we already have hash & (size - 1) when we are calculating hash of a given argument which ensures that the index in the array will always be less than its size. So why do we then divide that index by 64? Don't we just lose space in the array like that and increase the probability of a collision?

richardstartin commented 3 years ago

@cypherman oh, it seems unnecessary because of a mistake in the example code. The size field is supposed to be the number of bits in the array, hence the logical right shift, but in the code sample it’s the actual size of the array, which is the mistake.

bilarsen commented 3 years ago

@richardstartin, thanks for such a quick response! In that case what should be the actual size of the array? And I still don't follow the logic of a right shift if we have already normalized hash in mapHash(int hash)...

richardstartin commented 3 years ago

@cypherman the array itself should be 64x smaller than the number of bits you want to be able to store (size / 64). I've updated the post, thanks for pointing out the problem.

bilarsen commented 3 years ago

@richardstartin I see now. Once again thank you!