Open eric opened 13 years ago
Yeah, that might be true. I think the only reason it happens synchronously right now is for simplicity. In theory, each "get" in read-behind mode leads to approximately one item being read back in from disk, but there's no reason to delay client responses for it.
It could work like the journal-packer thread, just receiving work for refilling queues, and would have the benefit of ensuring that only one queue is refilled at a time.
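A dedicated worker like that could be sketched with a single-threaded executor; refill requests are queued to it and dequeues return immediately. This is only an illustration of the idea, not Kestrel's actual code — `ReadBehindWorker` and `requestRefill` are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ReadBehindWorker {
    // A single worker thread guarantees only one queue is refilled at a time,
    // mirroring the journal-packer thread's design.
    private final ExecutorService worker =
        Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "read-behind");
            t.setDaemon(true);
            return t;
        });

    // The Runnable stands in for a queue's fillReadBehind work.
    public void requestRefill(Runnable fillReadBehind) {
        // Client responses no longer wait on this IO.
        worker.submit(fillReadBehind);
    }

    public void shutdown() throws InterruptedException {
        worker.shutdown();
        worker.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ReadBehindWorker w = new ReadBehindWorker();
        AtomicInteger refills = new AtomicInteger();
        for (int i = 0; i < 3; i++) {
            w.requestRefill(refills::incrementAndGet);
        }
        w.shutdown();
        System.out.println("refills=" + refills.get());
    }
}
```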
Cool. I've recently been bitten by how RabbitMQ handles a queue getting very far behind, so I'm very concerned about performance in this degraded case. I really appreciate the way the hard memory limit works, but I want to make sure dequeues don't get slower once things are backlogged (otherwise it may become impossible to dig your way out again).
Have you experimented with using a `BufferedInputStream` for reading the journals, to reduce the number of read syscalls happening in `readJournalEntry()`? For sequential reading like this it seems like it could be a big win.
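To illustrate the point: a `BufferedInputStream` pulls a large chunk per underlying read, so parsing many small entries doesn't cost one syscall each. The length-prefixed entry format below is hypothetical, just to stand in for the journal parsing.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class BufferedJournalRead {
    // With a 64 KB buffer, each underlying read() fetches a big chunk instead
    // of the stream issuing one small read per journal entry.
    static final int BUFFER_SIZE = 64 * 1024;

    // Hypothetical entry format for illustration: a length-prefixed payload.
    public static byte[] readJournalEntry(DataInputStream in) throws IOException {
        int len = in.readInt();
        byte[] payload = new byte[len];
        in.readFully(payload);
        return payload;
    }

    public static void main(String[] args) throws IOException {
        // Fake in-memory "journal" containing two entries.
        java.io.ByteArrayOutputStream bos = new java.io.ByteArrayOutputStream();
        java.io.DataOutputStream out = new java.io.DataOutputStream(bos);
        out.writeInt(3); out.write(new byte[]{1, 2, 3});
        out.writeInt(2); out.write(new byte[]{4, 5});

        DataInputStream in = new DataInputStream(
            new BufferedInputStream(new ByteArrayInputStream(bos.toByteArray()), BUFFER_SIZE));
        System.out.println(readJournalEntry(in).length); // 3
        System.out.println(readJournalEntry(in).length); // 2
    }
}
```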
I just realized that `BufferedInputStream` does not have a `getChannel()` method, so it wouldn't be possible to use it with the channel operations...
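One way around that limitation would be to do the buffering at the channel level: read large chunks from the channel into a reusable `ByteBuffer` and parse entries out of the buffer, compacting and refilling when it runs dry. This is only a sketch (it assumes a blocking channel and the same hypothetical length-prefixed entry format), not a claim about how Kestrel does it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

public class ChannelBufferedRead {
    private final ReadableByteChannel channel;
    private final ByteBuffer buf;

    public ChannelBufferedRead(ReadableByteChannel channel, int bufferSize) {
        this.channel = channel;
        this.buf = ByteBuffer.allocate(bufferSize);
        this.buf.limit(0); // start with an empty buffer
    }

    // Ensure at least n readable bytes are buffered; returns false at EOF.
    // Assumes a blocking channel (read() never returns 0) and n <= capacity.
    private boolean ensure(int n) throws IOException {
        if (buf.remaining() >= n) return true;
        buf.compact(); // keep any leftover bytes, open space for more
        while (buf.position() < n) {
            if (channel.read(buf) < 0) { buf.flip(); return false; }
        }
        buf.flip();
        return true;
    }

    // Hypothetical length-prefixed entry, standing in for readJournalEntry().
    public byte[] nextEntry() throws IOException {
        if (!ensure(4)) return null;
        int len = buf.getInt();
        if (!ensure(len)) return null;
        byte[] payload = new byte[len];
        buf.get(payload);
        return payload;
    }
}
```

The point of the design is that the `FileChannel` stays available for `position()`-style seeks while syscalls are still amortized over many entries.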
I noticed that `PersistentQueue._remove()` is synchronously calling `fillReadBehind()`, which means the caller fetching a message will not return until the IO has been done to refill the slot of the message being dequeued. Is there a reason why this is not done in the background?
I would expect that doing this separately would allow the messages in memory to be drained faster, and would enable more bulk IO operations, which the OS should be able to handle more efficiently.