linvon / cuckoo-filter

Cuckoo Filter implementation in Go: better than a Bloom Filter, with configurable filter parameters and space optimization
MIT License
294 stars 27 forks

Enhance Support for Larger Datasets and Buckets #10

Closed EladGabay closed 1 year ago

EladGabay commented 1 year ago

This commit improves the encoding so that it can handle item and bucket counts exceeding max(uint32). Formerly, the encoding used uint32 for counts, even though the filter structure already supported larger values using uint. Until now, the filter only partially supported larger datasets: not all of the buckets were utilized (note the changes in generateIndexTagHash, altIndex and indexHash).
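To illustrate how buckets can go unused, here is a minimal sketch. It assumes the bucket index was derived from a value narrowed to 32 bits; `legacyIndex` and `wideIndex` are hypothetical helpers written for this comment, not the library's functions.

```go
package main

import "fmt"

// legacyIndex is a hypothetical stand-in for a 32-bit index computation:
// the hash is truncated to uint32 before it is reduced to a bucket index,
// so the result can never reach a bucket at position 2^32 or above.
func legacyIndex(hash uint64, numBuckets uint64) uint64 {
	return uint64(uint32(hash)) % numBuckets
}

// wideIndex keeps all 64 bits of the hash, so every bucket is reachable.
func wideIndex(hash uint64, numBuckets uint64) uint64 {
	return hash % numBuckets
}

func main() {
	numBuckets := uint64(1) << 33 // more buckets than uint32 can address
	hash := uint64(0x123456789)   // a hash whose value needs more than 32 bits

	fmt.Println("legacy index:", legacyIndex(hash, numBuckets))
	fmt.Println("wide index:  ", wideIndex(hash, numBuckets))
}
```

With 2^33 buckets, the truncated variant only ever lands in the lower half of the table, which matches the "partially supported" behaviour described above.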

Now, all references to bucket indices and item counts explicitly use uint64. A new encoding format accommodates larger filters. To distinguish between the legacy format (up to max(uint32) items) and the new format, a prefix marker is introduced.

Decoding seamlessly supports both formats. The encode method takes a legacy boolean parameter for gradual adoption.
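As a rough sketch of the idea, not the actual byte layout used in this PR: the example below assumes a legacy header of two little-endian uint32 counts whose first field (the bucket count) is never zero, so a zero uint32 can act as the prefix marker for the wide format. The `header` type, `wideMarker`, `encodeHeader` and `decodeHeader` are all hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// header is a hypothetical stand-in for the counts a filter serializes.
type header struct {
	NumBuckets uint64
	NumItems   uint64
}

// wideMarker is an assumed prefix. A legacy header starts with a non-zero
// uint32 bucket count in this sketch, so a zero word cannot begin one.
const wideMarker uint32 = 0

// encodeHeader writes the legacy 8-byte header when legacy is true,
// otherwise the marker followed by two uint64 counts (20 bytes).
func encodeHeader(h header, legacy bool) ([]byte, error) {
	if legacy {
		if h.NumBuckets == 0 || h.NumBuckets > math.MaxUint32 || h.NumItems > math.MaxUint32 {
			return nil, errors.New("counts do not fit the legacy format")
		}
		buf := make([]byte, 8)
		binary.LittleEndian.PutUint32(buf[0:], uint32(h.NumBuckets))
		binary.LittleEndian.PutUint32(buf[4:], uint32(h.NumItems))
		return buf, nil
	}
	buf := make([]byte, 20)
	binary.LittleEndian.PutUint32(buf[0:], wideMarker)
	binary.LittleEndian.PutUint64(buf[4:], h.NumBuckets)
	binary.LittleEndian.PutUint64(buf[12:], h.NumItems)
	return buf, nil
}

// decodeHeader inspects the first word to decide which format it is reading.
func decodeHeader(buf []byte) (header, error) {
	if len(buf) < 8 {
		return header{}, errors.New("buffer too short")
	}
	first := binary.LittleEndian.Uint32(buf[0:])
	if first != wideMarker {
		// Legacy format: both counts are 32-bit values.
		items := binary.LittleEndian.Uint32(buf[4:])
		return header{NumBuckets: uint64(first), NumItems: uint64(items)}, nil
	}
	if len(buf) < 20 {
		return header{}, errors.New("buffer too short for wide header")
	}
	return header{
		NumBuckets: binary.LittleEndian.Uint64(buf[4:]),
		NumItems:   binary.LittleEndian.Uint64(buf[12:]),
	}, nil
}

func main() {
	big := header{NumBuckets: 1 << 33, NumItems: 5000000000}
	buf, err := encodeHeader(big, false)
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	decoded, err := decodeHeader(buf)
	fmt.Println(decoded, err)

	// Requesting the legacy format for counts above max(uint32) is rejected.
	_, err = encodeHeader(big, true)
	fmt.Println(err)
}
```

Existing consumers can keep passing `legacy = true` until every reader understands the marker, then switch to the wide format.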

EladGabay commented 1 year ago

@linvon it's ready for review. Thanks!