linvon / cuckoo-filter

Cuckoo Filter implementation in Go, better than Bloom Filter, configurable and space optimized
MIT License
294 stars 27 forks

Introduce DecodeFrom and EncodeReader #3

Closed EladGabay closed 2 years ago

EladGabay commented 2 years ago

The buckets byte slice is the biggest part of the memory used by the filter, and might be several GBs.

A common use of a filter is in an environment whose RAM is sized to the filter: the filter is loaded into memory on startup and dumped to disk on teardown. Currently the Encode and Decode methods duplicate the byte slice, so memory usage while loading and dumping is at least twice the filter size.

This commit introduces a new method for dumping the filter through a reader over the internal byte slice, and a method for loading the filter from already-fetched encoded bytes (from disk or the network) that uses them internally instead of making a copy.

linvon commented 2 years ago

Nice idea by the way

EladGabay commented 2 years ago

> Nice idea by the way

Would you like to merge it :)?

linvon commented 2 years ago

> Nice idea by the way
>
> Would you like to merge it :)?

I think metaDataSize should be removed from SizeInBytes. Can you fix it?

EladGabay commented 2 years ago

SizeInBytes should reflect the size of the encoded filter (metadata + data). This way the user can prepare the required memory for encoding/decoding, and this is the actual size in bytes used by the filter. In addition, it must be the exact number of bytes in order to create the byte slice in Encode before ReadFull; otherwise we'll need to use ReadAll and pay with re-allocations and copies.

Added a new commit that makes it aligned in the filter object.

linvon commented 2 years ago

I'd like to merge the Reader part; we can discuss the Size part in the future. Can you split this into two MRs?

EladGabay commented 2 years ago

I suggest keeping SizeInBytes without the metadata part and introducing an EncodedSizeInBytes method that returns the size including the metadata. Sounds good?
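A rough sketch of how the two methods could relate, assuming a fixed-size metadata header. `toyFilter` and the value of `metadataSize` are illustrative only and are not taken from the library.

```go
package main

import "fmt"

// metadataSize is a hypothetical header size, not the library's actual value.
const metadataSize = 20

// toyFilter is a hypothetical stand-in that holds only the bucket data.
type toyFilter struct {
	buckets []byte
}

// SizeInBytes reports only the bucket data.
func (f *toyFilter) SizeInBytes() int { return len(f.buckets) }

// EncodedSizeInBytes reports what a serialized filter occupies:
// metadata plus bucket data.
func (f *toyFilter) EncodedSizeInBytes() int { return metadataSize + f.SizeInBytes() }

func main() {
	f := &toyFilter{buckets: make([]byte, 1024)}
	fmt.Println(f.SizeInBytes(), f.EncodedSizeInBytes()) // prints "1024 1044"
}
```

Keeping the two apart lets existing callers of SizeInBytes see unchanged values, while code that allocates encode or decode buffers uses the larger figure.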

linvon commented 2 years ago

> EncodedSizeInBytes

this is okay too

EladGabay commented 2 years ago

The reader now also returns the size, so we can prepare the memory in advance.

EladGabay commented 2 years ago

Would you like to create a new tag?

linvon commented 2 years ago

> Would you like to create a new tag?

sure