silentbicycle / theft

property-based testing for C: generate input to find obscure bugs, then reduce to minimal failing input
ISC License
611 stars 31 forks source link

Make bloom filter resize automatically #10

Closed silentbicycle closed 7 years ago

silentbicycle commented 7 years ago

The current bloom filter setup works, but is probably over-allocated by default, yet can still get too full when there is a very wide and/or deep shrinking trie. It would be better to dynamically grow the bloom filter.

The dynamic blocked bloom filter described in Tim Kaler's Cache Efficient Bloom Filters for Shared Memory Machines seems like it would address these issues -- the details of the bloom filter are contained by the theft_bloom module, so it wouldn't need an API change. A bloom filter match before saving a new entry could trigger growing, if the block is sufficiently full.

If the automatic resizing works well, theft_bloom_recommendation could always return 0 (which means to use the default), and be removed in the next API breaking version.