grandecola / bigqueue

Embedded, Fast and Persistent bigqueue implementation
MIT License

Allow flushing periodically #41

Closed mangalaman93 closed 5 years ago

mangalaman93 commented 5 years ago

We should expose the flush function as part of the bigqueue interface. Additionally, we should not rely on the OS's periodic syncing; instead, we should flush periodically, either on a timer or after a given amount of data has changed, with configuration parameters to choose the period.

mangalaman93 commented 5 years ago

Reference: https://www.reddit.com/r/golang/comments/9vyu65/simple_embedded_persistent_fifo_queue_for_go/

rohansuri commented 5 years ago

Hi Aman, I'd like to take this up. I've been thinking of the implementation.

Flushing after a fixed number of enqueues is straightforward and can be done on the enqueue code path itself, by calling flush on all arenas.
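A minimal sketch of that count-based strategy, assuming a hypothetical `countFlushQueue` type and a `flushEvery` parameter (neither is part of the bigqueue API). `fakeArena` stands in for a memory-mapped arena so the example runs without mmap; a real `Flush` would wrap msync.

```go
package main

import "fmt"

// flusher is a hypothetical stand-in for an arena; a real Flush would wrap msync.
type flusher interface {
	Flush() error
}

// fakeArena only counts flush calls so the sketch runs without mmap.
type fakeArena struct{ flushes int }

func (a *fakeArena) Flush() error { a.flushes++; return nil }

// countFlushQueue is a hypothetical queue that flushes every flushEvery enqueues.
type countFlushQueue struct {
	arenas     []flusher
	flushEvery int // assumed configuration parameter; 0 disables count-based flushing
	pending    int // enqueues since the last flush
}

// Enqueue would write msg into an arena (omitted here), then flush if due.
func (q *countFlushQueue) Enqueue(msg []byte) error {
	q.pending++
	if q.flushEvery > 0 && q.pending >= q.flushEvery {
		return q.Flush()
	}
	return nil
}

// Flush syncs every arena in turn and resets the enqueue counter.
func (q *countFlushQueue) Flush() error {
	for _, a := range q.arenas {
		if err := a.Flush(); err != nil {
			return err
		}
	}
	q.pending = 0
	return nil
}

func main() {
	a := &fakeArena{}
	q := &countFlushQueue{arenas: []flusher{a}, flushEvery: 3}
	for i := 0; i < 7; i++ {
		if err := q.Enqueue([]byte("x")); err != nil {
			panic(err)
		}
	}
	fmt.Println("flushes:", a.flushes) // flushes: 2
}
```

With `flushEvery` set to 3, seven enqueues trigger a flush after the third and sixth, leaving one message unflushed until the next threshold or an explicit `Flush`.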

However, for the other flush strategy, since it would involve a concurrent time.Ticker, I think the concern is whether the sys_msync call is thread safe w.r.t. the single concurrent writer.

The man page says nothing along those lines. Do you know if it is thread safe?

If it is, we could even fire goroutines to flush all arenas concurrently as well.
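A rough sketch of what that concurrent flush could look like, under the assumption (the open question above) that flushing distinct arenas in parallel is safe. `syncArena` and `flushAllConcurrently` are hypothetical names, and `Flush` here only sets a flag instead of calling msync.

```go
package main

import (
	"fmt"
	"sync"
)

// syncArena is a hypothetical arena; a real Flush would call msync on its mapping.
type syncArena struct{ flushed bool }

func (a *syncArena) Flush() error { a.flushed = true; return nil }

// flushAllConcurrently starts one goroutine per arena and waits for all of them.
// Each goroutine touches only its own arena and its own slot in errs.
func flushAllConcurrently(arenas []*syncArena) error {
	errs := make([]error, len(arenas))
	var wg sync.WaitGroup
	for i, a := range arenas {
		wg.Add(1)
		go func(i int, a *syncArena) {
			defer wg.Done()
			errs[i] = a.Flush()
		}(i, a)
	}
	wg.Wait()
	// Report the first error, if any.
	for _, err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	arenas := []*syncArena{{}, {}, {}}
	if err := flushAllConcurrently(arenas); err != nil {
		panic(err)
	}
	fmt.Println(arenas[0].flushed, arenas[1].flushed, arenas[2].flushed) // true true true
}
```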

Thanks

mangalaman93 commented 5 years ago

Sounds good. For now, given that bigqueue is not thread safe, it would be okay to use the same Enqueue (maybe Dequeue as well) call to check for the timer completion event.

mangalaman93 commented 5 years ago

We might want to keep a dirty flag for each arena to identify which ones to flush. Flushing them concurrently might be fine too, though I wonder whether the cost of creating goroutines would be more than that of calling the flush syscall.

rohansuri commented 5 years ago

Nice idea. Although msync would only sync the changes, having an in-app dirty flag would avoid the system call altogether.
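A small sketch of the dirty-flag idea, assuming a hypothetical `trackedArena` type: writes mark the arena dirty, and `Flush` returns early for clean arenas so no system call is made for them. The `syncCalls` counter stands in for the real msync call.

```go
package main

import "fmt"

// trackedArena is a hypothetical arena that remembers whether it has unsynced writes.
type trackedArena struct {
	dirty     bool
	syncCalls int // counts simulated msync calls
}

// Write would copy p into the mapped region (omitted) and marks the arena dirty.
func (a *trackedArena) Write(p []byte) {
	a.dirty = true
}

// Flush skips the (simulated) system call entirely when nothing has changed.
func (a *trackedArena) Flush() error {
	if !a.dirty {
		return nil
	}
	a.syncCalls++ // a real implementation would call msync here
	a.dirty = false
	return nil
}

// flushDirty flushes every arena; clean arenas return immediately.
func flushDirty(arenas []*trackedArena) error {
	for _, a := range arenas {
		if err := a.Flush(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	arenas := []*trackedArena{{}, {}, {}}
	arenas[1].Write([]byte("x"))
	// The second pass finds no dirty arenas and makes no further sync calls.
	for i := 0; i < 2; i++ {
		if err := flushDirty(arenas); err != nil {
			panic(err)
		}
	}
	fmt.Println(arenas[0].syncCalls, arenas[1].syncCalls, arenas[2].syncCalls) // 0 1 0
}
```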

You're right, if the Go scheduler creates an OS thread for each of those blocking flushes, then we might want to pool them. I'll write a benchmark and see, but let's keep this issue simple by doing everything on the enqueue/dequeue code path.

I'll get a PR ready.

ashish-goswami commented 5 years ago

Another Reference: https://github.com/dgraph-io/badger/issues/526