twitter-archive / kestrel

simple, distributed message queue system (inactive)
http://twitter.github.io/kestrel

Sync Issue in PersistentQueue on I/O Exception? #21

Closed by ebarlas 12 years ago

ebarlas commented 14 years ago

Upon close inspection of the PersistentQueue class, it occurred to me that if an I/O Exception is raised at certain points, the in-memory queue may become out of sync with the journal. For example, this can occur in add if an I/O Exception occurs on journal.add after the item has been added to the in-memory queue. Similar behavior exists in remove. Is this an accurate reading of the code? If so, what is the reasoning behind it? Thanks.

robey commented 14 years ago

it looks like an i/o exception would bounce out to the handler and possibly disconnect the client. i guess we should catch exceptions when writing the journal, and kill the server if they happen, so that queues don't get into this weird state if the disk fills. does that sound okay?

ebarlas commented 14 years ago

Hmm, possibly. The best approach, I suppose, would be to roll back journal operations, but it seems to me that simply isn't possible with the current system. Another approach is to place I/O operations ahead of in-memory data structure operations, so that an I/O exception is raised before the queue, transaction table, or other PersistentQueue data is modified. That should keep the PersistentQueue in a consistent state. Yet another approach is to close and reopen the queue on I/O exceptions, though this could result in a huge number of journal reads as the journals are replayed. Perhaps this is just something to be aware of and need not be addressed?
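
To make the "I/O first" ordering concrete, here is a minimal sketch in Java. It is not Kestrel's actual code: `Journal` and `JournalFirstQueue` are hypothetical stand-ins, and the real PersistentQueue also tracks transactions and sizes that are omitted here. The only point is that the journal call comes before any in-memory change.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for the journal; not Kestrel's real API.
interface Journal {
    void add(byte[] item) throws IOException;
    void remove() throws IOException;
}

// Hypothetical, simplified queue that journals before touching memory.
class JournalFirstQueue {
    private final Journal journal;
    private final Deque<byte[]> items = new ArrayDeque<>();

    JournalFirstQueue(Journal journal) {
        this.journal = journal;
    }

    void add(byte[] item) throws IOException {
        journal.add(item);      // may throw; in-memory state is still untouched
        items.addLast(item);    // reached only if the journal write succeeded
    }

    byte[] remove() throws IOException {
        if (items.isEmpty()) {
            return null;        // nothing to remove, nothing to journal
        }
        journal.remove();       // may throw; the head item stays in memory
        return items.pollFirst();
    }

    int size() {
        return items.size();
    }
}
```

This ordering only protects the in-memory structures. If the exception fires after a partial write, the journal file itself may still hold a truncated record, so it does not replace a rollback mechanism.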

ebarlas commented 14 years ago

Thoughts?

robey commented 14 years ago

i think you're right that it shouldn't try to continue as if nothing happened.

i'm leaning toward catching i/o exceptions inside the journal code, and writing a fatal log message and calling system.exit. it would be an unambiguous signal that something has gone wrong with the machine, and i think if the machine is hosed, kestrel shouldn't try to paste over it.

ebarlas commented 14 years ago

Okay, that does seem reasonable. One problem is that it might adversely affect folks using Kestrel as a library, since the proposed fix would shut down the JVM.
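
One way to reconcile the two concerns might be to make the failure action pluggable: the standalone server exits, while an embedding application supplies its own handler. The sketch below is an assumption, not anything proposed in the thread or present in Kestrel; `JournalWriter`, `GuardedJournal`, and `exitingOnFailure` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical low-level writer; not Kestrel's real API.
interface JournalWriter {
    void write(byte[] record) throws IOException;
}

// Hypothetical wrapper that catches journal I/O errors in one place.
class GuardedJournal {
    private final JournalWriter writer;
    private final Consumer<IOException> onFatal;

    GuardedJournal(JournalWriter writer, Consumer<IOException> onFatal) {
        this.writer = writer;
        this.onFatal = onFatal;
    }

    // Server-style default: log a fatal message and stop the JVM.
    static GuardedJournal exitingOnFailure(JournalWriter writer) {
        return new GuardedJournal(writer, e -> {
            System.err.println("FATAL: journal write failed: " + e);
            System.exit(1);
        });
    }

    void add(byte[] record) {
        try {
            writer.write(record);
        } catch (IOException e) {
            onFatal.accept(e);  // the server handler never returns from here
            // Library callers get an unchecked exception instead of a dead JVM.
            throw new UncheckedIOException("journal unusable", e);
        }
    }
}
```

A library user would construct `GuardedJournal` with a handler that, say, marks the queue as failed and alerts the host application, while the server would call `exitingOnFailure`.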