Closed colin-scott closed 9 years ago
Great catch as always, thanks @colin-scott! I might actually want to fix this right away, should be easily testable in isolation.
This is actually a bit weird, so allow me to re-cap the exact case the fuzz test found. Do you mean that the same Leader sends those messages (same Term)?
I don't think there is re-delivery of [1] implemented, and at-worst the [1,2] could be re-delivered...
Or are we talking about 2 different leaders sending those 2 messages? Then it seems like the "longest log wins" should be applied, no? I'll dig into it (need to finally get back to hakking on this project, so that's a good reason to get me remember the impl and algorithm ;))
You're absolutely right that this would only be triggered if you moved to UDP instead of TCP -- TCP guarentees in-order delivery, and this bug depends on a delayed message falling behind a later message between the same sender and receiver.
It's possible that you could trigger this with 2 different leaders? Not sure. But my fuzz test only involved a single leader sending the AppendEntries messages.
*Is it Akka that guarantees in-order delivery, or is it TCP?
Interesting, akka does appear to guarentee in-order delivery:
I find that to be an interesting design choice. If you ever support UDP akka-remoting, that guarentee comes with a non-trivial performance cost.
Ah, and in fact, there is a discussion on that same page about possibly removing that guarentee in the future:
Well, my fuzzer does support an in-order delivery constraint, so I guess I'll switch to that for the time being.
Correct - currently remoting is over TCP, and the ordering guarantee of "direct point to point communication" holds there. We are not currently looking into supporting UDP directly (however are very tempted by https://github.com/real-logic/Aeron which does works over UDP however is viewed as a log as well, and that log is ordered).
For reporting problems in akka-raft let's use the in-order delivery mode then, which also means we can close this specific ticket right? :)
Hi @colin-scott, we talked about the docs and current implementation in the 2.3 and 2.4 series of Akka. The doc is slightly out of date - we'll update it soon https://github.com/akka/akka/issues/18209
The guarantee holds mostly because of TCP, however since there can be re-established connections Akka does actually have to to a bit more than just rely on TCP (if a new connection comes in, while the old one is still not closed (i.e. "stale")) we ignore messages from the stale connection. Yes, it wouldn't hold for UDP.
Yeah, feel free to close this issue
Yup, done :)
Found through fuzz testing:
Suppose a follower with an empty log receives an AppendEntries containing 2 entries. The follower appends these to its log.
Then the follower subsequently receives an AppendEntries containing only the first of the previous 2 entries. [This message was delayed].
Currently, the follower will inadvertently delete the 2nd entry from its log.
This is not just a performance issue: it can cause raft to violate the "Leader Completeness" safety property: the leader was under the impression that the follower had 2 entries in its log, and may have decided to commit both entries. If another leader then overtakes it, it will not necessarily have both committed entries in its log.