Closed Vagabond closed 3 years ago
This is a good idea. I'll look into this during the weekend.
Considering implementing this. An api like:
EmptyFilter = xor8:empty_filter(),
OneElemFilter = xor8:add(EmptyFilter, [1]]),
ThreeElemFilter = xor8:add(OneElemFilter, [2, 3]]),
Filter = xor8:finalize(ThreeElemFilter),
would make sense? Add some guarding to prevent a pre-finialized filter from working with contain / free. It's a little unweidly to require finalizing, but then we wouldn't need to keep track of the int list and the filter during its whole life cycle. Any thoughts on this?
Also, are you using this lib? I see some traffic recently and am curious on how it is working...
Yes, we're using it, although not to its full extent yet. It seems to work quite well.
I don't object to the API proposed above, it's pretty similar to how some of the compression and cryptographic libraries work when you want to use them in a streaming mode.
This has been added. Will try to get around to supporting deduplication of input soon. If dups are added then finalize
will fail.
The library now auto-dedups input for the incremental api.
For constructing very large filters, it might be nice to have a way to allocate a 'value list' resource along the lines of
Then you could load keys into it a few at a time and once you were done you could populate the xor filter and free the resource. This would avoid storing all the unhashed/hashed keys in Erlang land and allow them to only be retained in the NIF (where they're most compact). If the value list gets too big you could use realloc() to grow it.
I don't have time to do it now, just trying to capture the thought.