shotover / shotover-proxy

L7 data-layer proxy
https://docs.shotover.io
Apache License 2.0

Hot Reload #130

Open benbromhead opened 3 years ago

benbromhead commented 3 years ago

We want to support the ability to perform a hot reload of shotover (similar to the way HAProxy and Envoy proxy do), where we can "restart" the executable such that:

To achieve this, we would like shotover to be able to start in "handover" mode, where it receives the existing TCP socket file descriptors from a currently running shotover process and gradually takes them over.

This method of socket handover between processes is best shown in the following repo: https://github.com/benbromhead/hot-reload-example. This repo also contains links to prior art and some good blog posts.

This will likely require a few changes to shotover, such as:

benbromhead commented 3 years ago

A transform chain describes a set of behaviors for each connection. It determines how to handle each frame received from a TCP socket. Shotover clones a brand new transform chain for each TCP socket created. Why do we do this? So that each transform can maintain its own state on a per-connection basis.
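The per-connection cloning described above can be sketched as follows. This is a minimal illustration, not shotover's actual code: `CountingTransform`, `TransformChain` and `chain_for_new_connection` are hypothetical names, and the "state" is just a frame counter.

```rust
/// Hypothetical transform that keeps per-connection state (a frame counter).
#[derive(Clone, Debug, Default)]
pub struct CountingTransform {
    pub frames_seen: u64,
}

impl CountingTransform {
    /// Records that a frame passed through and forwards it unchanged.
    pub fn process(&mut self, frame: &[u8]) -> Vec<u8> {
        self.frames_seen += 1;
        frame.to_vec()
    }
}

/// Hypothetical chain: an ordered list of transforms applied to each frame.
#[derive(Clone, Debug, Default)]
pub struct TransformChain {
    pub transforms: Vec<CountingTransform>,
}

impl TransformChain {
    /// Builds a template chain with `len` fresh transforms.
    pub fn new(len: usize) -> Self {
        TransformChain {
            transforms: vec![CountingTransform::default(); len],
        }
    }

    /// Passes a frame through every transform in order.
    pub fn process(&mut self, frame: &[u8]) -> Vec<u8> {
        let mut current = frame.to_vec();
        for transform in &mut self.transforms {
            current = transform.process(&current);
        }
        current
    }
}

/// Called once per accepted socket: each connection gets its own copy of the
/// template, so state never leaks between connections.
pub fn chain_for_new_connection(template: &TransformChain) -> TransformChain {
    template.clone()
}
```

Two chains cloned from the same template count their frames independently, and the template itself is never mutated.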

So a connection will generally look like this:

TCP SOCKET -> TRANSFORM CHAIN -> UPSTREAM

For 100 connections to shotover, you would have:

TCP SOCKET_0 -> TRANSFORM CHAIN_0 -> UPSTREAM_SOCKET_0
TCP SOCKET_1 -> TRANSFORM CHAIN_1 -> UPSTREAM_SOCKET_1
…
TCP SOCKET_100 -> TRANSFORM CHAIN_100 -> UPSTREAM_SOCKET_100
...
TCP SOCKET_N -> TRANSFORM CHAIN_N -> UPSTREAM_SOCKET_N

Each socket might belong to a different host, and each upstream socket might connect to a different service, etc. So we need to maintain the mapping of each socket to a transform chain identity.

This invariant should hold: for a transform chain that maintains strict connection identity (e.g. doesn't mix messages with messages from a different socket), a client connected to shotover on socket 0 should still have upstream socket 0 after connections are handed off to another shotover process.
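One way to make this invariant concrete is a check over the socket mapping before and after a handoff. The sketch below is only an illustration under assumed names: `ClientId`, `UpstreamId`, `ConnectionMapping` and `handoff_preserves_identity` are hypothetical and do not exist in shotover.

```rust
use std::collections::HashMap;

/// Hypothetical identifiers for client-facing and upstream sockets.
pub type ClientId = u32;
pub type UpstreamId = u32;

/// Hypothetical record of what a single client connection is wired to.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ConnectionMapping {
    /// True when the chain never mixes messages from different sockets.
    pub strict_identity: bool,
    /// The upstream socket this client is paired with.
    pub upstream: UpstreamId,
}

/// Returns true when every strict-identity connection present before the
/// handoff is still paired with the same upstream socket afterwards.
/// Connections without strict identity are not constrained by this check.
pub fn handoff_preserves_identity(
    before: &HashMap<ClientId, ConnectionMapping>,
    after: &HashMap<ClientId, ConnectionMapping>,
) -> bool {
    before
        .iter()
        .filter(|(_, mapping)| mapping.strict_identity)
        .all(|(client, mapping)| {
            after
                .get(client)
                .map_or(false, |new| new.upstream == mapping.upstream)
        })
}
```

A check like this could run as a debug assertion or in an integration test around the handoff. It says nothing about how the mapping is transferred between processes.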

rukai commented 3 months ago

I think the best way to do this would be to:

  1. shotover flushes all pending requests/responses while halting the processing of new requests
    • This clears out any messages from the transforms allowing us to destroy them without data loss
  2. shotover stores each source's socket fd + byte buffer to disk
  3. restart shotover
  4. shotover reloads old chain instances by reading the fd + byte buffers from disk.
    • The clients' outgoing connections are maintained, and shotover will recreate all of its own outgoing connections as needed.
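Steps 2 and 4 above could be sketched as a small round trip between in-memory state and an on-disk byte format. Everything here is an assumption for illustration: the `HandoffRecord` type, the `encode`/`decode` functions and the little-endian layout are hypothetical, not an existing shotover format. The fd is stored as a plain `i32`; a stored fd number only means something to the new process if the descriptor itself survives the restart (for example, by being inherited across `exec` or passed over a Unix socket), which this sketch does not handle.

```rust
/// Hypothetical per-connection state saved across a restart: the client
/// socket's fd number plus any bytes read from it but not yet parsed.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct HandoffRecord {
    pub fd: i32,
    pub pending: Vec<u8>,
}

/// Assumed layout (little-endian): record count as u32, then for each record
/// the fd as i32, the buffer length as u32, and the buffer bytes.
pub fn encode(records: &[HandoffRecord]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(records.len() as u32).to_le_bytes());
    for record in records {
        out.extend_from_slice(&record.fd.to_le_bytes());
        out.extend_from_slice(&(record.pending.len() as u32).to_le_bytes());
        out.extend_from_slice(&record.pending);
    }
    out
}

/// Reads four bytes at `pos` and advances it, or returns None if truncated.
fn read_4(bytes: &[u8], pos: &mut usize) -> Option<[u8; 4]> {
    let end = pos.checked_add(4)?;
    let chunk = bytes.get(*pos..end)?;
    *pos = end;
    Some([chunk[0], chunk[1], chunk[2], chunk[3]])
}

/// Inverse of `encode`. Returns None on truncated input or trailing bytes.
pub fn decode(bytes: &[u8]) -> Option<Vec<HandoffRecord>> {
    let mut pos = 0usize;
    let count = u32::from_le_bytes(read_4(bytes, &mut pos)?) as usize;
    let mut records = Vec::new();
    for _ in 0..count {
        let fd = i32::from_le_bytes(read_4(bytes, &mut pos)?);
        let len = u32::from_le_bytes(read_4(bytes, &mut pos)?) as usize;
        let end = pos.checked_add(len)?;
        let pending = bytes.get(pos..end)?.to_vec();
        pos = end;
        records.push(HandoffRecord { fd, pending });
    }
    if pos == bytes.len() {
        Some(records)
    } else {
        None
    }
}
```

The old process would write the output of `encode` to a file with `std::fs::write`, and the new one would read it back with `std::fs::read` and call `decode`. Because only the fd and the raw byte buffer are stored, the format does not depend on any transform's internal representation.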

This way the entire restart is hidden from the client, except for a (hopefully small) jump in latency. At the same time, this greatly reduces the complexity of shotover compared to full hot-reloading. Requiring every transform to be capable of hot-reloading adds a lot of complexity, and the required flexibility would slightly reduce performance in some areas.

This approach also makes compatibility between shotover versions possible. If a transform's entire state had to be stored to disk, it would be impossible for a new version of shotover that changes its internal representation of the transform to reload from an old version without extensive "upgrade procedures" for the on-disk representation.

Many DBs have some degree of statefulness in their connections; we usually keep track of this in the source.