Would it be better to support concurrent insert?

seiflotfy / cuckoofilter

Cuckoo Filter: Practically Better Than Bloom

MIT License

1.13k stars 109 forks source link

Would it be better to support concurrent insert? #18

Open seaguest opened 6 years ago

seaguest commented 6 years ago

Hello,

If we have huge data to process, we may need a lot insert, but I didn't see any mutex in code, would it be better to support concurrent insert, or is there any other reason for not?

seiflotfy commented 6 years ago

I'd have to set a mutex into the insertion call, it might as well be done by the caller... WDYT?

seaguest commented 5 years ago

I think it would be better to integrate the mutex inside the filter to make it a complete module, with which people can use without dealing with concurrent write problem. Anyway, it's just a matter of choice.

mholt commented 5 years ago

It'd be interesting to see what the performance tradeoff is; some applications may not need thread safety, and a mutex might slow them down? I dunno.

Anyway, here's my solution; it's very simple and elegant:

type concurrentCuckoo struct {
    *sync.Mutex
    *cuckoo.Filter
}

cf := concurrentCuckoo{
    Filter: cuckoo.NewFilter(1000000)
    Mutex:  new(sync.Mutex)
}

cf.Lock()
cf.InsertUnique([]byte("hello!"))
cf.Unlock()

Side-note: It would be nice to reduce typing repetition in this package a bit. I'll submit another issue.

mark-kubacki commented 5 years ago

FYI, a mutex will be a congestion point on HCC machines (like, 12 cores and up).

I'd leave the choice of guard to the data structure user, and not integrate it.

cben commented 3 years ago

Is it sufficient to do mutex only on writes? Are reads safe while insert is potentially shuffling things? Or does this need a read-write lock to allow concurrent reads when not doing writes?

(if leaving locking to callers, this is a point that's good to document)