Open bdupras opened 9 years ago
@bdupras thinks for ping me on this issue. I will be able to help if there is anything needed from my side. Question here, I guess for CuckooFilter
, it will implement delete
(or similar) as that is the differentiator of CuckooFilter from BloomFilter, right?
@ben-manes I think cuckoo filter shares the similar idea of TinySet in the way they both store fingerprints in a hash table for approximate set-membership tests (like BloomFilter). One key difference is TinySet is more like using traditional open-addressing hashtable, while cuckoo filter is built based on partial-key cuckoo hashing (see more in our paper Cuckoo Filter: Practically Better Than Bloom). My very biased view is, in the context of achieving really high-space efficiency and also good performance, partial-key cuckoo hashing is very easy to understand and implement correctly. :)
...
CuckooFilter
, it will implementdelete
(or similar)
Yes, absolutely. Perhaps as a separate interface, e.g. ...
public interface DeletableFilter<T> {
boolean delete(T object);
}
@apc999 I'll see if I can get our first attempt at an implementation out to a place where you can have a look. Nicely written paper, by the way - very easy to understand.
A few things of note about our implementation...
We chose a different strategy for calculating the alternate index. The paper's method using xor
has a side-effect of requiring the number of buckets (B) to be a power of 2. Our implementation calculates an offset from the current index that is always odd, and then applies that offset positive or negative depending on whether the current index is even or odd, wrapping around the ends of the bucket array as necessary. This requires only that the number of buckets be even, giving us more flexibility in the memory footprint than power of 2.
We also implemented rollback on insert such that if we exhaust the max_kicks, we rollback and leave the filter in a consistent state before throwing an insertion error.
@beala - did I forget anything?
Hi @apc999! Thanks for offering your help and thanks for your paper! It's been a big win for us.
Two other additions:
We are sympathetic to the need for an interface, but we think new types of probabilistic sets can be developed and published separately from Guava.
Thanks @kevinb9n. We'd certainly welcome an interface to implement against. Would you be interested in us providing a pull request to do that?
@kluever, we noticed in issue #1518 that you requested deletion support from the Bloom Filter. Are you still interested having a deletable filter implementation?
@bdupras I personally don't have a need for that feature (I filed the issue on behalf of a user). https://groups.google.com/forum/#!topic/guava-discuss/bwimfMksFWg
tl;dr: A Cuckoo Filter provides similar functionality to a Bloom Filter with lower hashing overhead and deletion support.
Add:
public interface ProbablisticFilter
Add:public class CuckooFilter implements ProbablisticFilter
Modify:public class BloomFilter
implements ProbablisticFilter
The Cuckoo Filter is described in this paper: Cuckoo Filter: Practically Better Than Bloom. Quoting from the abstract:
Possbile Interface:
We can contribute an implementation that follows the norms of the
BloomFilter
design if there's interest.@bdupras, @beala
/cc @binfan999 @apc999 @dave-andersen @mkaminsky & Michael Mitzenmacher (couldn't find his GitHub username)