twitter-archive / kestrel

simple, distributed message queue system (inactive)
http://twitter.github.io/kestrel
Other
2.78k stars 313 forks source link

Archived Journal Files Do Not Compact as Documented #132

Open DavidFishman opened 10 years ago

DavidFishman commented 10 years ago

Reference line 204 of PersistentQueue.scala if ((journal.size >= config.defaultJournalSize.inBytes && queueLength == 0))

Looking at the implementation of Journal, I see that journal.size is just the size of the current journal file - not including the size of archived journal files.

As a result, the journal may be rotated enough times that totalJournalSize > defaultJournalSize, yet by the time queueLength = 0 the currentJournalFileSize is actually less than the defaultJournalSize, and no compacting occurs despite the the journal size having exceeded the maxJournalSize.

This is inconsistent with the documentation, stating

The journal file is compacted if the queue is empty and the journal is larger than defaultJournalSize.

This creates a situation where archived journal files may never get cleaned up until the total journal size (journal.size + journal.archivedSize) exceeds maxJournalSize, or defaultJournalSize = 0.

This may have also been what caused issue 124 (https://github.com/robey/kestrel/issues/124).

I believe the fix is to change the logic in PersistentQueue.checkRotateJournal to be if (((journal.size + journal.archivedSize) >= config.defaultJournalSize.inBytes && queueLength == 0))

I'm happy to submit a pull request if you agree.

Thanks, David

DavidFishman commented 10 years ago

To further illustrate the issue (and explain why it may not have been caught given the default configuration):

With a default of maxMemorySize=128mb and defaultJournalSize=16mb, we have a low risk of hitting this case as we're likely to have a current journal file size that is greater than 16mb by the time the queue length = 0 (since rotation occurs when the current journal file size is greater than the maxMemorySize). However, as the difference between these numbers decreases, we become more likely to to have a current journal file that is less than the defaultJournalSize when the queue length = 0.

More so, if defaultJournalSize>maxMemorySize the journal file will never be compacted when the queue length = 0, (since rotation would occur at a file size less than the defaultJournalSize, making it impossible for the current journal file size to ever be greater than the defaultJournalSize).

eric commented 10 years ago

:+1: Thanks for tracking this down! I've observed the behavior before but haven't had a chance to dig into why.