vmware-archive / haret

A strongly consistent distributed coordination system, built using proven protocols & implemented in Rust.

Garbage collect the log #95

Closed andrewjstone closed 7 years ago

andrewjstone commented 7 years ago

Each actor maintains a complete log in memory along with the vertree (the output of the state machine). This wastes memory, so the log needs to be garbage collected. GC affects the protocol, but it is well understood and documented in the paper. If a replica falls behind the GC point, it can no longer catch up by replaying log entries, since the earlier ones are no longer available. State transfer will instead need to send it the actual vertree, followed by any later uncommitted log entries.
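To make the catch-up decision concrete, here is a minimal sketch of a log that can discard a committed prefix. It is not haret's actual code: `Log`, `CatchUp`, `commit_to`, `gc` and `catch_up` are hypothetical names, and the sketch assumes the in-memory tree reflects every committed entry, so a snapshot of it covers everything below the commit point.

```rust
/// Hypothetical in-memory replica log whose committed prefix can be discarded.
#[derive(Debug)]
pub struct Log<T> {
    gc_point: u64,   // entries with index < gc_point have been discarded
    commit: u64,     // entries with index < commit are applied to the tree
    entries: Vec<T>, // entries[i] holds log index gc_point + i
}

/// What a primary would send to a lagging replica (illustrative only).
#[derive(Debug, PartialEq)]
pub enum CatchUp<'a, T> {
    /// Every entry the replica needs is still in the log: replay these.
    Replay(&'a [T]),
    /// Needed entries were collected: send a tree snapshot covering all
    /// entries below `applied`, then the uncommitted tail.
    StateTransfer { applied: u64, tail: &'a [T] },
}

impl<T> Log<T> {
    pub fn new() -> Self {
        Log { gc_point: 0, commit: 0, entries: Vec::new() }
    }

    /// Append an entry and return its log index.
    pub fn append(&mut self, entry: T) -> u64 {
        self.entries.push(entry);
        self.gc_point + self.entries.len() as u64 - 1
    }

    /// Mark entries below `n` as committed (never moves backwards).
    pub fn commit_to(&mut self, n: u64) {
        let end = self.gc_point + self.entries.len() as u64;
        self.commit = n.min(end).max(self.commit);
    }

    /// Discard entries below `point`, clamped to the commit point so that
    /// uncommitted entries are never collected.
    pub fn gc(&mut self, point: u64) {
        let point = point.min(self.commit);
        if point <= self.gc_point {
            return;
        }
        let n = (point - self.gc_point) as usize;
        self.entries.drain(..n);
        self.gc_point = point;
    }

    /// Decide how to bring a replica whose next missing index is
    /// `next_needed` up to date.
    pub fn catch_up(&self, next_needed: u64) -> CatchUp<'_, T> {
        if next_needed >= self.gc_point {
            let start = ((next_needed - self.gc_point) as usize).min(self.entries.len());
            CatchUp::Replay(&self.entries[start..])
        } else {
            let start = (self.commit - self.gc_point) as usize;
            CatchUp::StateTransfer { applied: self.commit, tail: &self.entries[start..] }
        }
    }
}
```

With five entries appended, four committed and three collected, a replica that needs index 3 can still replay, while one that needs index 1 falls back to state transfer and receives only the single uncommitted entry after the snapshot.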

This implies that there needs to be an efficient way to transfer the vertree. Right now it would all have to occur in a single message. For large trees, however, it will need to be chunked and sent as a stream. Luckily, the design of vertree allows reading and building up trees incrementally, so the whole thing won't have to be buffered in memory. The chunking part is not yet written.
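One possible shape for the chunking, as a rough sketch only: the sender lazily groups tree nodes into bounded chunks, and the receiver applies each chunk as it arrives. Nothing here is vertree's real API. `Node`, `Chunk`, `chunks` and `Receiver` are hypothetical, the tree is assumed to flatten into `(path, value)` pairs, and a `BTreeMap` stands in for the rebuilt tree.

```rust
use std::collections::BTreeMap;

/// Hypothetical flattened tree node: a path and its serialized value.
pub type Node = (String, Vec<u8>);

/// One message of a chunked state transfer.
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub seq: u64,   // position of this chunk in the stream
    pub last: bool, // true on the final chunk
    pub nodes: Vec<Node>,
}

/// Lazily group `nodes` into chunks of at most `max_nodes` each, so the
/// sender only ever holds one chunk in memory. An empty input produces a
/// single empty chunk marked `last`.
pub fn chunks<I>(nodes: I, max_nodes: usize) -> impl Iterator<Item = Chunk>
where
    I: IntoIterator<Item = Node>,
{
    assert!(max_nodes > 0, "chunks must hold at least one node");
    let mut iter = nodes.into_iter().peekable();
    let mut seq = 0u64;
    let mut done = false;
    std::iter::from_fn(move || {
        if done {
            return None;
        }
        let nodes: Vec<Node> = iter.by_ref().take(max_nodes).collect();
        let last = iter.peek().is_none();
        done = last;
        let chunk = Chunk { seq, last, nodes };
        seq += 1;
        Some(chunk)
    })
}

/// Receiving side: applies chunks in order as they arrive.
#[derive(Default)]
pub struct Receiver {
    next_seq: u64,
    complete: bool,
    tree: BTreeMap<String, Vec<u8>>,
}

impl Receiver {
    /// Apply one chunk, rejecting out-of-order or trailing chunks.
    pub fn apply(&mut self, chunk: Chunk) -> Result<(), String> {
        if self.complete || chunk.seq != self.next_seq {
            return Err(format!("unexpected chunk {}", chunk.seq));
        }
        self.tree.extend(chunk.nodes);
        self.next_seq += 1;
        self.complete = chunk.last;
        Ok(())
    }

    /// Return the rebuilt tree once the final chunk has been applied.
    pub fn finish(self) -> Option<BTreeMap<String, Vec<u8>>> {
        if self.complete { Some(self.tree) } else { None }
    }
}
```

Bounding chunks by node count keeps the sketch short; a real implementation would more likely bound them by serialized byte size and would need to handle retransmission and a primary change in the middle of a transfer.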