Server crashes when trying to produce large packets because of stack overflow

From @pfons on April 13, 2016 0:11

We found a bug that causes a stack overflow on the leader when a lagging follower tries to recover. The overflow seems to occur within the recursive function “restore_from_log” (Shim.ml) while a very large packet is being constructed, before the leader actually tries to send it.
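
For illustration only, here is a minimal OCaml sketch of the kind of pattern that could produce this crash, assuming the overflow comes from non-tail recursion over a very long list. We have not confirmed that this matches the actual code in Shim.ml; `take_naive` and `take_tailrec` are hypothetical names, and the entry counts are taken from the log output in this report.

```ocaml
(* Hypothetical illustration; these are NOT the functions in Shim.ml. *)

(* Non-tail-recursive: the cons is applied after the recursive call
   returns, so every element taken occupies one stack frame. With
   hundreds of thousands of entries this can exhaust the stack. *)
let rec take_naive n xs =
  match xs with
  | x :: rest when n > 0 -> x :: take_naive (n - 1) rest
  | _ -> []

(* Tail-recursive: accumulate the prefix in reverse, then reverse it
   once at the end. Stack usage stays constant regardless of n. *)
let take_tailrec n xs =
  let rec go n acc xs =
    match xs with
    | x :: rest when n > 0 -> go (n - 1) (x :: acc) rest
    | _ -> List.rev acc
  in
  go n [] xs

let () =
  (* Sizes borrowed from the crash log: 521932 entries held by the
     leader, 521881 of them to be sent to the lagging follower. *)
  let entries = List.init 521_932 (fun i -> i) in
  let batch = take_tailrec 521_881 entries in
  Printf.printf "batch has %d entries\n" (List.length batch)
```

Whether `take_naive` actually overflows depends on the OCaml version and the configured stack limit, so the sketch only exercises the accumulator-based variant at full size.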

This problem can be reproduced as follows: a) start 3 servers; b) execute one client request; c) stop a follower server; d) execute many client requests (in our tests, at least 521,932 requests); e) restart the server that was stopped.

Here’s a sample output produced by the leader when it crashes:

   [Term 1] Sending 50 entries to 2 (currently have 521932 entries), commitIndex=521882
   [Term 1] Sending 521881 entries to 3 (currently have 521932 entries), commitIndex=521882
   [Term 1] Received AppendEntriesReply 50 entries true, commitIndex 521883
   Fatal error: exception Stack overflow

Copied from original issue: uwplse/verdi#37
