prataprc / xorfilter

Rust library implementing xor-filters
Apache License 2.0

Serialize / Deserialize Xor8 type. #1

Open prataprc opened 4 years ago

prataprc commented 4 years ago

So that it can be persisted onto disk and retrieved later for membership checks.

Update 1: Serializing and deserializing the hasher returned by Xor8::build_hasher() is more challenging.

If RandomState is used as BuildHasher, the std documentation has this to say:

A particular instance RandomState will create the same instances of Hasher, but the hashers created by two different RandomState instances are unlikely to produce the same result for the same values.

If DefaultHasher is used (for example via BuildHasherDefault), the std documentation has this to say:

The internal algorithm is not specified, and so it and its hashes should not be relied upon over releases.

So unless we have a BuildHasher type that is stable across releases and across instances, we may not be able to provide a stable serialization and deserialization API.
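To make the requirement concrete, here is a minimal sketch of a BuildHasher that is stable across instances and releases because its algorithm (FNV-1a, 64-bit) is fully specified. The names `Fnv1a64`, `StableBuildHasher` and `hash_key` are hypothetical and not part of this crate; FNV-1a is only a stand-in for whatever stable hash would actually be chosen.

```rust
use std::hash::{BuildHasher, Hash, Hasher};

/// FNV-1a, 64-bit: a fully specified algorithm, so its output does not
/// depend on the Rust release or on per-process random keys.
pub struct Fnv1a64(u64);

impl Hasher for Fnv1a64 {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hypothetical stateless builder: every instance creates identical hashers,
/// so there is nothing to serialize besides the choice of algorithm.
#[derive(Clone, Copy, Default)]
pub struct StableBuildHasher;

impl BuildHasher for StableBuildHasher {
    type Hasher = Fnv1a64;

    fn build_hasher(&self) -> Fnv1a64 {
        Fnv1a64(0xcbf2_9ce4_8422_2325) // FNV offset basis
    }
}

/// Hash one key with the given builder.
pub fn hash_key<B: BuildHasher, K: Hash + ?Sized>(builder: &B, key: &K) -> u64 {
    let mut hasher = builder.build_hasher();
    key.hash(&mut hasher);
    hasher.finish()
}
```

One caveat with this approach: the `Hash` implementations of the key types also feed into the result. Integer keys, for instance, are written in native byte order by default, so a persisted filter would additionally need the keys to be hashed from a fixed byte representation.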

uijin commented 4 years ago

Hi, @prataprc,

I would like to write the persistence (SerDes) function :)

prataprc commented 4 years ago

There are many serialization formats. IMHO, SerDe aims to serialize any Rust type to any of those formats.

In this case, I think we only need binary serialization. So to begin with, we can implement a simple encode() / decode() API and add SerDe support at a later point?
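As a rough illustration of what such a binary encode() / decode() pair could look like, here is a sketch over a hypothetical `FilterState` struct. The field names and the byte layout are assumptions made for the example, not the crate's actual representation.

```rust
/// Hypothetical stand-in for the state of a filter that needs persisting;
/// the field names are assumptions, not the crate's actual layout.
#[derive(Debug, Clone, PartialEq)]
pub struct FilterState {
    pub seed: u64,
    pub block_length: u32,
    pub fingerprints: Vec<u8>,
}

/// Size of the fixed header: 8 bytes of seed plus 4 bytes of block length.
const HEADER_LEN: usize = 12;

impl FilterState {
    /// Layout: seed (8 bytes, LE) | block_length (4 bytes, LE) | fingerprints.
    pub fn encode(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(HEADER_LEN + self.fingerprints.len());
        out.extend_from_slice(&self.seed.to_le_bytes());
        out.extend_from_slice(&self.block_length.to_le_bytes());
        out.extend_from_slice(&self.fingerprints);
        out
    }

    /// Returns `None` when the buffer is too short to hold the header.
    pub fn decode(buf: &[u8]) -> Option<FilterState> {
        if buf.len() < HEADER_LEN {
            return None;
        }
        let mut seed = [0u8; 8];
        seed.copy_from_slice(&buf[0..8]);
        let mut block_length = [0u8; 4];
        block_length.copy_from_slice(&buf[8..12]);
        Some(FilterState {
            seed: u64::from_le_bytes(seed),
            block_length: u32::from_le_bytes(block_length),
            fingerprints: buf[HEADER_LEN..].to_vec(),
        })
    }
}
```

Fixing the byte order to little-endian keeps the encoded bytes portable across machines. A real format would probably also want a version tag in the header so the layout can evolve.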

And thanks for the offer.

prataprc commented 4 years ago

https://lemire.me/blog/2019/12/19/xor-filters-faster-and-smaller-than-bloom-filters/

^ This blog post gives some idea about serializing the filter.

ayazhafiz commented 4 years ago

FWIW, I have another impl of the xor filters in Rust with optional serialization/deserialization with serde behind a feature flag: https://github.com/ayazhafiz/xorf.

Feel free to use that implementation, or we can even merge these two libraries. Let me know what you think.

uijin commented 4 years ago

@ayazhafiz, as @prataprc says, we currently need binary serialization only. I will first develop a simple file persistence function. Your SerDe implementation is a useful reference.

prataprc commented 4 years ago

> Feel free to use that implementation, or we can even merge these two libraries. Let me know what you think.

@ayazhafiz thanks for the offer, will give a shout-out when the need arises. Cheers,

uijin commented 4 years ago

For the new filter data structure, I would add an upgraded version of the persistence function that can also save the new attributes (keys and hash_builder).

prataprc commented 4 years ago

IMHO, in the case of Xor8, serialization / deserialization is only applicable to the bitmap index and its associated fields. That is, we only need the fields required to execute the "contains()" API.
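To show why only a few fields are needed for a lookup, here is a sketch of a membership check along the lines of the published xor8 algorithm. It is not taken from this crate's source: the `Lookup` struct, its field names and the `probe` helper are hypothetical, and the key is assumed to be already hashed to a 64-bit digest.

```rust
/// Hypothetical read-only view of a deserialized filter. Assumes
/// `fingerprints.len() == 3 * block_length` and `block_length > 0`.
pub struct Lookup {
    pub seed: u64,
    pub block_length: u32,
    pub fingerprints: Vec<u8>,
}

/// 64-bit finalizer from MurmurHash3, used to scramble the seeded digest.
fn mix(mut h: u64) -> u64 {
    h ^= h >> 33;
    h = h.wrapping_mul(0xff51_afd7_ed55_8ccd);
    h ^= h >> 33;
    h = h.wrapping_mul(0xc4ce_b9fe_1a85_ec53);
    h ^ (h >> 33)
}

/// Map a 32-bit value into the range 0..n without a modulo.
fn reduce(hash: u32, n: u32) -> usize {
    ((u64::from(hash) * u64::from(n)) >> 32) as usize
}

impl Lookup {
    /// Returns the three slots probed for a digest (one in each third of
    /// the fingerprint array) and the fingerprint expected for that digest.
    pub fn probe(&self, digest: u64) -> ([usize; 3], u8) {
        let h = mix(digest.wrapping_add(self.seed));
        let n = self.block_length;
        let block = n as usize;
        let slots = [
            reduce(h as u32, n),
            reduce(h.rotate_left(21) as u32, n) + block,
            reduce(h.rotate_left(42) as u32, n) + 2 * block,
        ];
        (slots, (h ^ (h >> 32)) as u8)
    }

    /// A digest is reported as present when the XOR of its three slots
    /// equals its fingerprint.
    pub fn contains(&self, digest: u64) -> bool {
        let ([s0, s1, s2], fingerprint) = self.probe(digest);
        fingerprint == self.fingerprints[s0] ^ self.fingerprints[s1] ^ self.fingerprints[s2]
    }
}
```

Under these assumptions the lookup reads nothing beyond the seed, the block length and the fingerprint array, plus whatever hasher turns a key into the digest. The original keys and any construction-time scratch state do not have to be persisted.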

I have tried to scope the problem of handling a really large set of keys in #9.

uijin commented 4 years ago

@prataprc Thanks for the explanation, I agree with you.