jaybaird / python-bloomfilter

Scalable Bloom Filter implemented in Python
MIT License

ratio in ScalableBloomFilter #21

Open ScottShao opened 8 years ago

ScottShao commented 8 years ago

Hi, I'm wondering why ratio is used in ScalableBloomFilter. It seems that the first filter has a different error rate from the rest of the filters: in the code, the first filter has an error rate of error_rate * (1 - ratio), while the remaining filters have an error rate of error_rate * ratio.

johnyf commented 8 years ago

It would be helpful if you provided links to specific lines in a specific commit of branch master.

ScottShao commented 8 years ago

https://github.com/jaybaird/python-bloomfilter/blob/master/pybloom/pybloom.py

On line 365 and line 372, where a new filter is added to the scalable filter, it seems that different error rates are used: for the first filter the error rate is error_rate * (1 - ratio), while for the rest of the filters it is error_rate * ratio.
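The arithmetic behind a split like this can be sketched as follows. This is a minimal illustration, not the pybloom source: it assumes the first filter gets error_rate * (1 - ratio) and each later filter gets the previous filter's rate multiplied by ratio (a geometric series), and the function name filter_error_rates is hypothetical. Under that assumption the per-filter rates sum to error_rate * (1 - ratio ** n), which stays below error_rate no matter how many filters are added.

```python
def filter_error_rates(error_rate, ratio, n_filters):
    """Per-filter error rates under an ASSUMED geometric tightening scheme.

    Hypothetical helper for illustration; not part of pybloom.
    """
    # First filter takes a (1 - ratio) share of the overall budget.
    rates = [error_rate * (1.0 - ratio)]
    # Each later filter is assumed to tighten the previous rate by `ratio`.
    for _ in range(n_filters - 1):
        rates.append(rates[-1] * ratio)
    return rates


error_rate, ratio, n_filters = 0.001, 0.9, 50
rates = filter_error_rates(error_rate, ratio, n_filters)

# The sum of the per-filter rates is a union bound on the compound
# false positive rate of the whole stack of filters.
total = sum(rates)
closed_form = error_rate * (1.0 - ratio ** n_filters)
print(total, closed_form, total < error_rate)
```

The (1 - ratio) factor on the first filter is what makes the series sum to at most error_rate; without it, the bound would be error_rate / (1 - ratio) instead.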

johnyf commented 8 years ago

Thanks for the additional information. When the master branch changes, the above link will point to different lines. The references can be made persistent with: https://github.com/jaybaird/python-bloomfilter/blob/70e25c653ab87fbc2273328e89544d4124f52065/pybloom/pybloom.py#L365 and: https://github.com/jaybaird/python-bloomfilter/blob/70e25c653ab87fbc2273328e89544d4124f52065/pybloom/pybloom.py#L372 which can also be written as line 365 and line 372.

Please note that I am not the author of pybloom.

ScottShao commented 8 years ago

Thank you for the tips.

joseph-fox commented 8 years ago

@ScottShao I think this version https://pypi.python.org/pypi/pybloom_live/2.1.0 addresses your concerns.

ScottShao commented 8 years ago

@joseph-fox Thank you for sharing the information. It seems that the only place ratio is used is when a new filter is created, where ratio * error_rate becomes the error rate of the new filter. If that is the case, I don't think the ratio parameter is needed any more.
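Whether ratio still matters depends on how the later filters' rates are derived. Here is a rough comparison of two hypothetical schemes, neither taken verbatim from pybloom or pybloom_live: in the "constant" scheme every filter after the first uses error_rate * ratio, and in the "geometric" scheme each filter uses the previous filter's rate times ratio. The function names are made up for this sketch.

```python
def union_bound_constant(error_rate, ratio, n_filters):
    """ASSUMED scheme A: first filter at error_rate * (1 - ratio),
    every later filter at the same rate error_rate * ratio."""
    first = error_rate * (1.0 - ratio)
    rest = error_rate * ratio
    return first + rest * (n_filters - 1)


def union_bound_geometric(error_rate, ratio, n_filters):
    """ASSUMED scheme B: rate of filter i is error_rate * (1 - ratio) * ratio**i,
    so the sum over n filters is error_rate * (1 - ratio**n)."""
    return error_rate * (1.0 - ratio ** n_filters)


error_rate, ratio = 0.001, 0.9
for n in (1, 5, 20):
    a = union_bound_constant(error_rate, ratio, n)
    b = union_bound_geometric(error_rate, ratio, n)
    # Scheme A grows linearly with the number of filters;
    # scheme B stays below error_rate for any n.
    print(n, a, b)
```

Under these assumptions, ratio is what keeps the total bounded in the geometric scheme, whereas in the constant scheme the bound exceeds error_rate as soon as a few filters have been added.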