pawandubey / cuckoo_filter

Pure Ruby Cuckoo Filter Implementation
Apache License 2.0
119 stars, 9 forks

Serialize / deserialize a filter #2

Closed: DanielHeath closed this issue 5 years ago

DanielHeath commented 5 years ago

I'm working on shrinking the 'Have I Been Pwned' password breach dataset (11 GB zipped) into a cuckoo filter, so that a web app can efficiently check whether a new password has previously been disclosed.

It would be useful to be able to serialize a filter, so I can 'pre-bake' a filter file.

DanielHeath commented 5 years ago

Never mind, Marshal.dump and Marshal.load work fine.
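For readers landing here, a minimal sketch of the Marshal round trip might look like the following. The `TinyFilter` class is a hypothetical stand-in made up for illustration; it is not this library's API, and the file name is arbitrary. The same two Marshal calls should apply to any object that holds only plain data (arrays, strings, integers).

```ruby
require "zlib"
require "tmpdir"

# Hypothetical stand-in for a filter object (NOT the cuckoo_filter gem's API).
# It only holds plain data, which is what Marshal needs to serialize it.
class TinyFilter
  def initialize(size)
    @buckets = Array.new(size)
  end

  def insert(item)
    @buckets[index_for(item)] = item
    self
  end

  def include?(item)
    @buckets[index_for(item)] == item
  end

  private

  # CRC32 is stable across processes, unlike String#hash.
  def index_for(item)
    Zlib.crc32(item) % @buckets.size
  end
end

filter = TinyFilter.new(64).insert("hunter2")

# "Pre-bake" the filter into a file, then load it back.
path = File.join(Dir.tmpdir, "tiny_filter.bin")
File.binwrite(path, Marshal.dump(filter))
restored = Marshal.load(File.binread(path))

puts restored.include?("hunter2")
```

Note that `Marshal.load` can instantiate arbitrary objects, so it should only be used on files you produced yourself or otherwise trust.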

pawandubey commented 5 years ago

@DanielHeath I'm assuming this issue can be closed. Feel free to reopen if required.

P.S. Did you get your project working?

DanielHeath commented 5 years ago

Ended up writing my own pure-Ruby Bloom filter (~40 LOC).

The array approach used in this library is much too slow for the kind of load I'm dealing with. Instead, I created a string with the ASCII-8BIT encoding and treated it as a bitfield.
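The code from this comment was not posted, so the following is only a sketch of the general idea: a Bloom filter whose bits live in a binary (ASCII-8BIT) String, read and written with `String#getbyte` and `String#setbyte`. The class name `StringBloomFilter`, its parameters, and the SHA-256 double-hashing scheme are all assumptions chosen for illustration.

```ruby
require "digest"

# Illustrative Bloom filter backed by a binary String used as a bitfield.
# Names, sizes, and the hashing scheme are assumptions, not the thread's code.
class StringBloomFilter
  attr_reader :bits

  def initialize(bit_count, hash_count)
    @bit_count = bit_count
    @hash_count = hash_count
    # One byte stores 8 bits; String#b returns an ASCII-8BIT copy.
    @bits = ("\0" * ((bit_count + 7) / 8)).b
  end

  def add(item)
    positions(item).each do |pos|
      byte = @bits.getbyte(pos >> 3)
      @bits.setbyte(pos >> 3, byte | (1 << (pos & 7)))
    end
    self
  end

  # false means "definitely absent"; true means "possibly present".
  def include?(item)
    positions(item).all? do |pos|
      (@bits.getbyte(pos >> 3) & (1 << (pos & 7))) != 0
    end
  end

  private

  # Derive hash_count bit positions from two 64-bit halves of one digest
  # (double hashing: h1 + i * h2).
  def positions(item)
    h1, h2 = Digest::SHA256.digest(item.to_s).unpack("Q<Q<")
    Array.new(@hash_count) { |i| (h1 + i * h2) % @bit_count }
  end
end

filter = StringBloomFilter.new(1024, 3)
filter.add("password123")
puts filter.include?("password123")
```

Keeping the bits in a single String avoids allocating one Ruby object per slot, which is presumably where the speed-up over an Array of buckets comes from.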

DanielHeath commented 5 years ago

Also, serializing an ASCII-8BIT string to a binary file is (obviously) trivial and has no encoding/decoding overhead.
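As a rough illustration of that point, a binary String can be written and read back byte for byte with `File.binwrite` and `File.binread`. The helper names `save_bits` and `load_bits` and the file name are hypothetical.

```ruby
require "tmpdir"

# Hypothetical helpers: write the raw bitfield bytes to disk and read them back.
def save_bits(path, bits)
  File.binwrite(path, bits)
end

def load_bits(path)
  # binread returns an ASCII-8BIT String, so no transcoding takes place.
  File.binread(path)
end

bits = ("\0" * 16).b
bits.setbyte(3, 0b0000_0101) # set two bits in the fourth byte

path = File.join(Dir.tmpdir, "filter_bits.bin")
save_bits(path, bits)
loaded = load_bits(path)

puts loaded == bits
```

The file on disk is exactly the bitfield, so its size equals the String's `bytesize`, with no per-object framing of the kind Marshal adds.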

pawandubey commented 5 years ago

That's a neat hack! Thanks for the update 🙌