Mechanism to drain Envoy connections

If users need to take a proxy or gateway out of commission, there is currently no good way to do so gracefully.

The primary method for gracefully shutting down Envoy is to use the /healthcheck/fail admin endpoint (see ENVOY-1990).

Calling this endpoint will lead Envoy to:

1. Send failed health check responses to downstream proxies
2. Start drain closing connections
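
For reference, a minimal sketch of calling this endpoint from Python. The admin address below is an assumption (use whatever address the proxy's admin interface is actually bound to), and the helper names are hypothetical.

```python
import urllib.request

# Assumption: the Envoy admin interface is reachable at this address.
# Replace it with the admin bind address your proxy was started with.
ENVOY_ADMIN = "http://127.0.0.1:19000"


def build_healthcheck_fail_request(admin_base: str = ENVOY_ADMIN) -> urllib.request.Request:
    """Build the POST request that asks Envoy to start failing health checks."""
    url = admin_base.rstrip("/") + "/healthcheck/fail"
    # State-changing admin endpoints are called with POST.
    return urllib.request.Request(url, data=b"", method="POST")


def fail_healthcheck(admin_base: str = ENVOY_ADMIN, timeout: float = 5.0) -> int:
    """Send the request and return the HTTP status code of the response."""
    request = build_healthcheck_fail_request(admin_base)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `fail_healthcheck()` against a running proxy should trigger the two behaviors listed above; building the request separately makes it possible to inspect the URL and method without a live Envoy.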

However, we do not use Envoy's active health checking at the moment, so step 1 would do nothing. Connect proxies determine the health of upstream proxies through Consul, not through direct checks.

Consul should provide a way to drain connections gracefully. Here are a few options:

1. Modify Consul checks on Envoy proxies to be HTTP checks rather than TCP. Listeners would have an HTTP check filter in no pass through mode. Then, whenever a user calls /healthcheck/fail, a 503 would be returned whenever Consul probes the proxy. (Not sure if stacking this HTTP check filter on a TCP listener would work)
2. Add a new Consul check type that is only set to passing/failing via a manual toggle. This would be similar to a TTL check, except without an interval. This check type would be used for auto-registered Connect proxies and setting it to critical would prevent Connect traffic from being sent to that proxy. Users or Consul could then send a request to /drain_listeners so Envoy sends Connection: close on HTTP request completions.
3. Add HTTP endpoints to /agent/checks/ that enable users to force proxy TCP checks to become critical or passing. Making a check critical would disable the check runner, and making it passing would re-enable it. The listener draining side-effect from option 2 is also an option here.
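
To make the drain sequence shared by options 2 and 3 concrete, here is a rough Python sketch: first mark the proxy's check critical in Consul, then ask Envoy to drain its listeners. Neither the manually toggled check type nor the new /agent/checks/ endpoints exist yet, so this sketch stands in the existing TTL-check endpoint (`/v1/agent/check/fail/:check_id`) as an assumption. The addresses, the check ID, and the helper names are likewise hypothetical.

```python
import urllib.parse
import urllib.request

# Assumptions: local Consul agent HTTP API and Envoy admin addresses.
CONSUL_AGENT = "http://127.0.0.1:8500"
ENVOY_ADMIN = "http://127.0.0.1:19000"


def build_drain_requests(check_id: str,
                         consul_base: str = CONSUL_AGENT,
                         envoy_base: str = ENVOY_ADMIN) -> list:
    """Return the two requests of the drain sequence, in order."""
    quoted_id = urllib.parse.quote(check_id, safe="")
    # Step 1: force the proxy's check to critical so Connect stops routing
    # traffic to it. Stand-in: the existing TTL-check "fail" endpoint; the
    # proposed check type or endpoint could differ.
    mark_critical = urllib.request.Request(
        consul_base.rstrip("/") + "/v1/agent/check/fail/" + quoted_id,
        data=b"", method="PUT")
    # Step 2: ask Envoy to drain its listeners.
    drain_listeners = urllib.request.Request(
        envoy_base.rstrip("/") + "/drain_listeners",
        data=b"", method="POST")
    return [mark_critical, drain_listeners]


def drain_proxy(check_id: str, timeout: float = 5.0) -> list:
    """Send both requests in order and return their HTTP status codes."""
    statuses = []
    for request in build_drain_requests(check_id):
        with urllib.request.urlopen(request, timeout=timeout) as response:
            statuses.append(response.status)
    return statuses
```

In practice there would probably need to be a pause between the two steps so that the critical status can propagate to other proxies before listeners start draining; how long that should be is left open here.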

hashicorp/consul #8304