twitter-archive / kestrel

simple, distributed message queue system (inactive)
http://twitter.github.io/kestrel
Other
2.78k stars 313 forks source link

Reliable Writes #138

Closed matthiasg closed 10 years ago

matthiasg commented 10 years ago

I am curious how kestrel supports reliable writes.

In the wiki it is said that:

Writing something into a queue is pretty reliable. The client does a “set” operation, and if it worked, kestrel responds “STORED”. Naturally, it only sends that response after the item has been written into the queue’s journal file. The “STORED” response means kestrel has taken responsibility for the item.

But since there is no correlation to the actual item written due to the async nature of talking to a tcp socket I am not sure that is sufficient without e.g. at least including a sequential number or an identifier.

I was looking forward to using the node-kestrel client but like many clients there isnt much they can do to guarantuee. The only thing they could possible do is impose a restriction and stall writing until they receive the STORED message back.

Example

time message or response of STORED message content
1 write a
2 write b
3 write c
4 write d
5 response STORED
6 write e
7 write f
8 write g
9 response STORED
10 response STORED
11 response STORED
12 response STORED

... continuous stream ...

at any point in time e.g. 12 its possible that the messages [a..e] have been written, but isnt it also possible that [a,c,d,e,g] have been written ? is it really ensured that the message send via e.g socket.write actually reached the tcp out queue and thus have an ensured order ?

usually i would expect a unique number/id/key/gap detection or twitter storm like ack mechanism to ensure delivery...

now obviously as mentioned before one could send a batch of x items and then stall until x STORED responses come back. If not all of them come back in the timeout period you could and would have to resend all items since you dont know which one failed.

Whats the best thing to do here ? Should this be done OOB ? I could implement an acking framework of course with a ackid inside the message with the consumer acking the messages directly to the consumer but the whole reason for the queue is NOT to keep the items in the producer until the consumers can catch up.

matthiasg commented 10 years ago

after some more investigation, i would say that kestrel only supports reliable writes when waiting for the ack before sending new messages. i.e. wait one hit to the disk, thus one should use batching when sending from one sender only. in the case of multiple senders kestrel might combine flushes internally, though i haven't checked that it does.