vla / BloomFilter.NetCore

A bloom filter implementation
MIT License

Need support for larger than 2 * MaxInt bits #12

Closed by GUZZ07 3 weeks ago

GUZZ07 commented 2 months ago

Hello! I am using BloomFilter.NetCore for text deduplication. An ArgumentOutOfRangeException was thrown when I tried to use a filter with expectedElements = 800000000.

Unhandled exception. System.ArgumentOutOfRangeException: Index was out of range. Must be non-negative and less than the size of the collection. (Parameter 'index')
Actual value was 466135261.
   at System.Collections.BitArray.ThrowArgumentOutOfRangeException(Int32 index)
   at BloomFilter.FilterMemory.Get(Int64 index)
   at BloomFilter.FilterMemory.Contains(ReadOnlySpan`1 element)
   at BloomFilter.BloomFilterExtensions.Contains(IBloomFilter bloomFilter, String data)
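
For a sense of scale, the standard Bloom filter sizing formula m = -n * ln(p) / (ln 2)^2 can be evaluated for this element count. The sketch below assumes a false-positive rate of 0.01, which the report does not state, and the helper name `OptimalBits` is hypothetical rather than part of BloomFilter.NetCore. Under that assumption the filter needs roughly 7.7 billion bits, more than the 2 * int.MaxValue bits (about 4.29 billion) that two BitArray instances can hold, since BitArray is indexed by Int32.

```csharp
using System;

static class BloomSizing
{
    // Standard sizing formula: m = -n * ln(p) / (ln 2)^2.
    // Hypothetical helper for illustration, not the library's API.
    public static long OptimalBits(long expectedElements, double errorRate)
    {
        double ln2 = Math.Log(2);
        double bits = -expectedElements * Math.Log(errorRate) / (ln2 * ln2);
        return (long)Math.Ceiling(bits);
    }

    static void Main()
    {
        // 0.01 is an assumed error rate; the issue only gives expectedElements.
        long needed = OptimalBits(800_000_000, 0.01);

        // Two BitArray instances can address at most 2 * int.MaxValue bits.
        long twoArrays = 2L * int.MaxValue;

        Console.WriteLine($"bits needed:        {needed}");
        Console.WriteLine($"two BitArrays hold: {twoArrays}");
        Console.WriteLine($"fits:               {needed <= twoArrays}");
    }
}
```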

I then downloaded this repo and rewrote parts of FilterMemory.cs: I replaced _hashBits1 and _hashBits2, along with their associated methods, with a single BitArray[] _hashBits. This works, and my program now runs as expected. Could you make the library support larger values of expectedElements?
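
The kind of change described above might look roughly like the following. This is a minimal sketch, not the reporter's actual patch or the library's code: the class name `ChunkedBitArray`, the default chunk size, and the method shapes are all assumptions made for illustration. It maps a long bit index onto an array of BitArray chunks by dividing for the chunk number and taking the remainder for the offset inside the chunk.

```csharp
using System;
using System.Collections;

// Hypothetical sketch: a bit set addressed by a long index,
// backed by several BitArray chunks of at most chunkSize bits each.
sealed class ChunkedBitArray
{
    private readonly BitArray[] _chunks;
    private readonly int _chunkSize;

    public long Length { get; }

    public ChunkedBitArray(long length, int chunkSize = 1 << 30)
    {
        if (length <= 0) throw new ArgumentOutOfRangeException(nameof(length));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        Length = length;
        _chunkSize = chunkSize;

        // Ceiling division: number of chunks needed to cover all bits.
        int chunkCount = checked((int)((length + chunkSize - 1) / chunkSize));
        _chunks = new BitArray[chunkCount];
        for (int i = 0; i < chunkCount; i++)
        {
            // The last chunk only holds the remaining bits.
            long remaining = length - (long)i * chunkSize;
            _chunks[i] = new BitArray((int)Math.Min(remaining, chunkSize));
        }
    }

    public bool Get(long index)
    {
        CheckIndex(index);
        return _chunks[(int)(index / _chunkSize)][(int)(index % _chunkSize)];
    }

    public void Set(long index, bool value)
    {
        CheckIndex(index);
        _chunks[(int)(index / _chunkSize)][(int)(index % _chunkSize)] = value;
    }

    private void CheckIndex(long index)
    {
        if (index < 0 || index >= Length)
            throw new ArgumentOutOfRangeException(nameof(index));
    }
}

static class Program
{
    static void Main()
    {
        // A tiny chunk size keeps the demo cheap; real use would keep the default.
        var bits = new ChunkedBitArray(length: 100, chunkSize: 32);
        bits.Set(70, true); // lands in chunk 2, offset 6
        Console.WriteLine(bits.Get(70));
        Console.WriteLine(bits.Get(69));
    }
}
```

A fixed chunk size keeps the index arithmetic simple, at the cost of one extra array lookup per bit access compared with a single BitArray.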

vla commented 2 months ago
  1. When using the in-memory filter, it is recommended to split the data across multiple filter instances.
  2. Alternatively, use Redis with multiple bitmaps.

(follow-up) Attempt to allocate multiple BitArrays based on the expected number of elements.