Hot Reload - Githubissues

benbromhead commented 3 years ago

We want to support the ability to perform a hot reload of shotover (similar to the way HAProxy and Envoy proxy do), where we can "restart" the executable such that:

Existing TCP connections from clients are not dropped.
Existing TCP connections to upstream servers are not dropped.
Configuration changes are reflected.
Can be used to upgrade between binary minor versions (e.g. no breaking API / implementation changes).

To achieve this, we would like shotover to be able to start in "handover" mode, where it receives the existing TCP socket file descriptors from a currently running shotover process and gradually takes them over.

This method of socket handover between processes is best shown in the following repo: https://github.com/benbromhead/hot-reload-example. This repo also contains links to prior art and some good blog posts.

This will likely require several changes to shotover, such as:

The ability to map transform chains and associated connections between config changes.
The need for transforms to pick up a tcp connection that has already been authenticated etc.
When an FD is transferred to the new shotover process, no inflight messages are lost (no data loss).
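
As a rough illustration of the last step of a handover (turning a received descriptor back into a usable socket), here is a minimal sketch using only the Rust standard library on Unix. The function names `release_listener`, `adopt_listener` and `handover_roundtrip` are made up for this example and are not shotover APIs. Actually moving the descriptor between two processes (for example with `SCM_RIGHTS` over a Unix domain socket, or by inheriting it across `exec`) is not shown; the sketch stays within one process.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::os::unix::io::{FromRawFd, IntoRawFd, RawFd};

/// Give up Rust-side ownership of a listener without closing the socket.
/// A real handover would now send this descriptor to the new process;
/// that transfer is deliberately left out of this sketch.
fn release_listener(listener: TcpListener) -> RawFd {
    listener.into_raw_fd()
}

/// Rebuild a listener from a descriptor the caller has received.
///
/// # Safety
/// `fd` must be an open, listening TCP socket that nothing else owns.
unsafe fn adopt_listener(fd: RawFd) -> TcpListener {
    unsafe { TcpListener::from_raw_fd(fd) }
}

/// Single-process round trip: bind, release, adopt, then check that a
/// client can still connect to the same address and exchange data.
fn handover_roundtrip() -> std::io::Result<Vec<u8>> {
    let old = TcpListener::bind("127.0.0.1:0")?;
    let addr = old.local_addr()?;

    // The socket stays open in the kernel while no Rust value owns it.
    let fd = release_listener(old);
    let new = unsafe { adopt_listener(fd) };

    let mut client = TcpStream::connect(addr)?;
    let (mut server_side, _) = new.accept()?;
    client.write_all(b"ping")?;

    let mut buf = [0u8; 4];
    server_side.read_exact(&mut buf)?;
    Ok(buf.to_vec())
}
```

Calling `handover_roundtrip()` should return the four bytes the client wrote, which shows the listening socket survived the change of owner. The same `into_raw_fd` / `from_raw_fd` pair applies to `TcpStream` for already-accepted connections.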

benbromhead commented 3 years ago

A transform chain describes a set of behavior for each connection. It determines how to handle each frame received from a tcp socket. Shotover will clone a brand new transform chain per tcp socket created. Why do we do this? Each transform can maintain its own state on a per connection basis.

So a connection will generally look like this:

TCP SOCKET -> TRANSFORM CHAIN -> UPSTREAM

For 100 connections to shotover, you would have:

TCP SOCKET_0 -> TRANSFORM CHAIN_0 -> UPSTREAM_SOCKET_0
TCP SOCKET_1 -> TRANSFORM CHAIN_1 -> UPSTREAM_SOCKET_1
...
TCP SOCKET_100 -> TRANSFORM CHAIN_100 -> UPSTREAM_SOCKET_100
...
TCP SOCKET_N -> TRANSFORM CHAIN_N -> UPSTREAM_SOCKET_N

Each socket might belong to a different host, the upstream socket might connect to a different service etc. So we need to maintain the mapping of each socket to a transform chain identity.

This invariant should hold: for a transform chain that maintains strict connection identity (i.e. it doesn't mix its messages with messages from a different socket), a client connected to shotover on socket 0 should still be paired with upstream socket 0 after connections are handed off to another shotover process.
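
One way to picture this invariant is as a check over two connection tables, one from before the handover and one from after. The sketch below is only an illustration: `ConnId`, `Pair`, `pairing_preserved` and the `fd_map` translation table are hypothetical names, and it assumes descriptor numbers may change when they arrive in the new process, so the pairing is compared through that translation rather than by raw number.

```rust
use std::collections::HashMap;

/// Hypothetical identifier for one client connection (and its chain).
type ConnId = u32;

/// The two sockets that belong to one transform chain instance.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Pair {
    client_fd: i32,
    upstream_fd: i32,
}

/// Returns true if every connection keeps the same client/upstream pairing
/// after handover. `fd_map` translates a descriptor number in the old
/// process to the number it received in the new process.
fn pairing_preserved(
    old: &HashMap<ConnId, Pair>,
    new: &HashMap<ConnId, Pair>,
    fd_map: &HashMap<i32, i32>,
) -> bool {
    old.len() == new.len()
        && old.iter().all(|(id, before)| match new.get(id) {
            Some(after) => {
                fd_map.get(&before.client_fd) == Some(&after.client_fd)
                    && fd_map.get(&before.upstream_fd) == Some(&after.upstream_fd)
            }
            // A connection that disappeared during handover breaks the invariant.
            None => false,
        })
}
```

If two connections had their upstream sockets swapped during the handover, the check fails even though no socket was lost, which is the failure mode the invariant is meant to rule out.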

rukai commented 5 months ago

I think the best way to do this would be to:

shotover flushes all pending requests/responses while halting the processing of new requests.
- This clears any in-flight messages out of the transforms, allowing us to destroy them without data loss.
shotover stores each source's socket fd and byte buffer to disk.
shotover is restarted.
shotover recreates the old chain instances by reading the fds and byte buffers back from disk.
- The client's outgoing connections are maintained, and shotover will recreate all of its own outgoing connections as needed.
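
The "store to disk, then reload" part of the steps above could look roughly like the following sketch. It is not shotover's format: `SavedSource`, `encode_sources` and `decode_sources` are hypothetical names, and the length-prefixed little-endian layout is just one simple choice. The resulting bytes would be written to a file before the restart and read back afterwards; keeping the descriptors themselves open across the restart is a separate concern that this sketch does not cover.

```rust
/// What a source would persist for one client connection (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
struct SavedSource {
    /// Descriptor number of the client socket.
    fd: i32,
    /// Bytes already read from the socket but not yet parsed into a request.
    pending: Vec<u8>,
}

/// Layout: u32 count, then per source: i32 fd, u32 length, raw bytes.
fn encode_sources(sources: &[SavedSource]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(sources.len() as u32).to_le_bytes());
    for source in sources {
        out.extend_from_slice(&source.fd.to_le_bytes());
        out.extend_from_slice(&(source.pending.len() as u32).to_le_bytes());
        out.extend_from_slice(&source.pending);
    }
    out
}

/// Split `n` bytes off the front of `bytes`, or return None if too short.
fn take<'a>(bytes: &mut &'a [u8], n: usize) -> Option<&'a [u8]> {
    let current: &'a [u8] = *bytes;
    if current.len() < n {
        return None;
    }
    let (head, tail) = current.split_at(n);
    *bytes = tail;
    Some(head)
}

/// Read exactly four bytes from the front of `bytes`.
fn take_4(bytes: &mut &[u8]) -> Option<[u8; 4]> {
    let mut raw = [0u8; 4];
    raw.copy_from_slice(take(bytes, 4)?);
    Some(raw)
}

/// Inverse of `encode_sources`; returns None on truncated or trailing data.
fn decode_sources(mut bytes: &[u8]) -> Option<Vec<SavedSource>> {
    let count = u32::from_le_bytes(take_4(&mut bytes)?);
    let mut sources = Vec::new();
    for _ in 0..count {
        let fd = i32::from_le_bytes(take_4(&mut bytes)?);
        let len = u32::from_le_bytes(take_4(&mut bytes)?) as usize;
        let pending = take(&mut bytes, len)?.to_vec();
        sources.push(SavedSource { fd, pending });
    }
    if bytes.is_empty() {
        Some(sources)
    } else {
        None
    }
}
```

Because only the fd and the unparsed bytes are stored, nothing about a transform's internal representation ends up on disk, which is what keeps the format small and stable.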

This way the entire restart is hidden from the client, except for a (hopefully small) jump in latency. At the same time, this greatly reduces the complexity of shotover compared to full hot-reloading. Requiring every transform to be capable of hot-reloading adds a lot of complexity, and the required flexibility would slightly reduce performance in some areas.

This approach also makes compatibility between shotover versions possible. If a transform's entire state had to be stored to disk, a new version of shotover that changes its internal representation of the transform could not reload state written by an old version without extensive "upgrade procedures" for the on-disk representation.

Many DBs have some form of statefulness in their connections. Usually we keep track of this in the source.

This per-protocol statefulness will also need to be serialized to disk.
If there is any statefulness outside of the source, we will need to move it into the source.
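
To make "per-protocol statefulness" concrete, here is a small sketch of how a source might serialize it alongside the fd and byte buffer. The `ProtocolState` enum and its variants are illustrative assumptions only (a keyspace selected on a Cassandra connection, a database index selected on a Redis connection); they are not taken from shotover's code, and the tag-byte encoding is just one possible layout.

```rust
/// Hypothetical per-connection state a source might need to persist.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ProtocolState {
    /// The protocol keeps no connection-level state.
    Stateless,
    /// Keyspace previously selected on a Cassandra connection.
    CassandraKeyspace(String),
    /// Database index previously selected on a Redis connection.
    RedisDatabase(u32),
}

/// Encode as one tag byte followed by a variant-specific payload.
fn encode_state(state: &ProtocolState) -> Vec<u8> {
    match state {
        ProtocolState::Stateless => vec![0],
        ProtocolState::CassandraKeyspace(name) => {
            let mut out = vec![1];
            out.extend_from_slice(name.as_bytes());
            out
        }
        ProtocolState::RedisDatabase(index) => {
            let mut out = vec![2];
            out.extend_from_slice(&index.to_le_bytes());
            out
        }
    }
}

/// Inverse of `encode_state`; unknown tags or malformed payloads give None,
/// so a newer binary can reject state it does not understand.
fn decode_state(bytes: &[u8]) -> Option<ProtocolState> {
    let (tag, rest) = bytes.split_first()?;
    match *tag {
        0 if rest.is_empty() => Some(ProtocolState::Stateless),
        1 => String::from_utf8(rest.to_vec())
            .ok()
            .map(ProtocolState::CassandraKeyspace),
        2 if rest.len() == 4 => {
            let mut raw = [0u8; 4];
            raw.copy_from_slice(rest);
            Some(ProtocolState::RedisDatabase(u32::from_le_bytes(raw)))
        }
        _ => None,
    }
}
```

Keeping this state as a small, explicit enum owned by the source is one way to keep the on-disk representation independent of any transform's internals.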

shotover / shotover-proxy

Hot Reload #130