jasondavies / bloomfilter.js

JavaScript bloom filter using FNV for fast hashing
http://www.jasondavies.com/bloomfilter/
BSD 3-Clause "New" or "Revised" License
759 stars 79 forks source link

Use optimal m/k given n/p option #1

Open Glench opened 12 years ago

Glench commented 12 years ago

I was just wondering why you don't have an option to use the optimal m and k parameters (see here) based on n and p.

There are a couple ways I imagine this going:

I could implement any of these, but I wanted to ask you first if there was any reason why you didn't do this.

jasondavies commented 12 years ago

Great idea! I originally didn't include n and p as I was inspired by https://github.com/willf/bloom, which includes an empirical false positive estimator rather than relying on theory. I'm not entirely sure why he does this, but it's probably because FNK isn't as good as a theoretically perfect hash function (although it is very fast and hence "good enough" for bloom filters).

I plan on making the hash function configurable anyway, so I'll add support for this at the same time.

The other idea I had was the ability to set a fixed false positive rate, and have the bit array automatically grow (perhaps by doubling each time) to keep the rate below a fixed value.

Glench commented 12 years ago

Ah, I see. Excellent!

Apparently this guy implemented the latter idea you mentioned. I haven't looked at his implementation, but maybe you can take inspiration/code from there :)

Phyks commented 9 years ago

I implemented such an approach on a fork of mine : https://github.com/Phyks/bloomfilter.js

This uses capacity and error_rate instead of m and k as in the current implementation.

@jasondavies It might not be well fitted for a direct PR, and I'm not sure about my code, but it does the job AFAIK, and passes my tests successfully. Thanks for writing the original code !