mozilla / filter-cascade

A python filter cascade implementation
Mozilla Public License 2.0
6 stars 5 forks source link

Replaced size calculation with a more optimal version #1

Closed bowlinearl closed 5 years ago

bowlinearl commented 5 years ago

I'm not sure what the origin of the calc_size() formulation used in this implementation was, but it's a bit sub-optimal. Here are some plots showing the size of the existing "Moz" bloom filters versus the 1.44nlog_2(1/p) formulation from the CRLite paper:

results

All three plots show how filters using the two size formulas vary as the number of elements (n) and false positive rate (FPR, p) vary. The top graph is total filter sizes in bits, the middle graph is difference in size in bits, and the bottom graph is percentage increase in size.

Depending on how things work out with math.ceil(), the existing formulation produces filters that are 0-5% oversized.

I also added some additional, optional parameters to the filter cascade constructor, so downstream users can parameterize the free space overhead and minimum filter sizes.