seiflotfy / cuckoofilter

Cuckoo Filter: Practically Better Than Bloom
MIT License
1.11k stars 107 forks source link

Element in cuckoo filter may lose when inserting after the filter is full #42

Open wtysos11 opened 3 years ago

wtysos11 commented 3 years ago

The Insert method in cuckoofilter behave unexpected when the filter is full. In general, insert method should return true when it can store the element, and return false when it can't store. But when the cuckoo filter is full, insert method will store the element, and withdraw a random one inside filter while returning false.

I make a simple test about this:

func TestExhausted(t *testing.T){
    ckflter := seiflotfy_filter.NewFilter(10000)
    var cached [maxNumForBenchmark]bool
    elementList := make([]int,0)
    var lastElement int

    isFinish := false
    for !isFinish{
        randNum := rand.Intn(maxNumForBenchmark)
        for cached[randNum]{
            randNum = rand.Intn(maxNumForBenchmark)
        }

        finish := ckflter.Insert([]byte(strconv.Itoa(randNum)))
        if !finish{
            t.Logf("Last element is %v",randNum)
            lastElement = randNum
            isFinish = true
            break
        }else{
            cached[randNum] = true
            elementList = append(elementList,randNum)
        }
    }

    for i:=0;i<len(elementList);i++{
        isInside := ckflter.Lookup([]byte(strconv.Itoa(elementList[i])))
        if !isInside{
            t.Errorf("%v should inside but not",elementList[i])
        }
    }
    isInside := ckflter.Lookup([]byte(strconv.Itoa(lastElement)))
    t.Logf("%v should not inside but got %v",lastElement,isInside)
}

The output of code above is

=== RUN   TestExhausted
--- FAIL: TestExhausted (0.00s)
    standardCuckooFilter_test.go:308: Last element is 1776
    standardCuckooFilter_test.go:321: 16055 should inside but not
    standardCuckooFilter_test.go:325: 1776 should not inside but got true
FAIL

Process finished with exit code 1

My solution of this is adding a backup array for all buckets in cuckoo filter, and recover when insertion failure. But this costs a lot when insertion doesn't fail.