Closed railsmechanic closed 10 years ago
Your "size" setting should be a function of acceptable error rate: http://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/
In effect, you're trading off the amount of memory vs the error rate, and that applies regardless of the backend you're using (Redis, native, etc.).
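To make the memory/error-rate trade-off concrete, here is a small sketch of the standard Bloom filter sizing formulas (these are the textbook formulas, not an API from this gem; the method names `optimal_bits` and `optimal_hashes` are my own):

```ruby
# Standard Bloom filter sizing: for n expected elements and a target
# false-positive rate p, the optimal number of bits m and hash functions k are
#   m = -n * ln(p) / (ln 2)^2
#   k = (m / n) * ln 2
def optimal_bits(n, error_rate)
  (-n * Math.log(error_rate) / (Math.log(2)**2)).ceil
end

def optimal_hashes(m, n)
  ((m.to_f / n) * Math.log(2)).round
end

m = optimal_bits(10_000, 0.01)   # bits needed for 10k items at 1% error
k = optimal_hashes(m, 10_000)    # number of hash functions to use
puts "size: #{m}, hashes: #{k}"
```

So for 10,000 items at a 1% error rate you'd want roughly 96k bits and 7 hashes, which lines up with the "multiple of n" rule of thumb.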
If I remember correctly, the size setting is required for bloom filters backed by fixed-size data structures, like an array. (Rule of thumb: if you need a bloom filter for n elements, set size to a multiple of n.)
But I wonder why Redis-backed bloom filters like CountingRedis need this setting. As far as I can see, size is only used to compute the modulo of the crc32 hash.
Is this modulo instruction just there to shorten the hash? (For a large size, the benefit would be negligible.) Shouldn't it be removed, so that a persisted bloom filter isn't corrupted when it is later opened with a different size setting?
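To illustrate the concern: a minimal sketch (my own, not this gem's code) of how an index derived via `crc32 % size` changes with the size setting, so a filter persisted in Redis under one size would be looked up at the wrong positions under another:

```ruby
require 'zlib'

# Hypothetical item key; the index a filter would touch depends on `size`.
item = "user:42"

index_small = Zlib.crc32(item) % 1_000       # filter created with size 1000
index_large = Zlib.crc32(item) % 1_000_000   # same filter reopened with size 1_000_000

puts "size 1000      -> index #{index_small}"
puts "size 1_000_000 -> index #{index_large}"
```

Because 1_000 divides 1_000_000, the small index is always `index_large % 1_000`, but in general the two indices differ, so reads against bits written under a different size give meaningless answers.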