hyperium / tonic

A native gRPC client & server implementation with async/await support.
https://docs.rs/tonic
MIT License

`tonic-health`: add `NOT_SERVING` override for all registered services #1693

Open kriswuollett opened 2 months ago

kriswuollett commented 2 months ago

Feature Request

Add the ability to report all registered services as NOT_SERVING, even if an individual service's own state is SERVING.

Crates

tonic-health

Motivation

The HealthReporter implementation does not easily support "draining a server", that is, informing a load balancer that it should stop sending requests to a particular server instance during the shutdown grace period. It only provides functions that change the state of one service at a time. For context for anyone not familiar with the issue, see for example the troubleshooting guide for Google's GCP load balancer.

So basically, make it easy to say that everything is not serving. For example, in Kubernetes a load balancer could route to multiple gRPC Services that all happen to point to a single tonic Deployment. When the Deployment's Pod is being restarted, all services need to drain at the same time.

Proposal

Refactor HealthReporter so that it also holds an RW-locked not-serving flag which, when set, overrides the reported status of every registered service.

Basically usage would be like the following, but I have no strong opinion about the actual function names:

health_reporter
    .override_all_not_serving()
    .await;

health_reporter
    .clear_override_all_not_serving()
    .await;
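To make the idea concrete, here is a minimal sketch of how such an override flag could sit alongside the per-service statuses. It is not tonic-health's actual implementation: `SketchReporter`, `set_status`, and `check` are hypothetical names, the `ServingStatus` enum is a local stand-in, and the sketch uses the synchronous `std::sync::RwLock` instead of an async lock to stay self-contained.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Local stand-in for the two statuses relevant to this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ServingStatus {
    Serving,
    NotServing,
}

/// Hypothetical, simplified model of a health reporter (not tonic's type).
#[derive(Clone, Default)]
pub struct SketchReporter {
    /// Per-service status, as set individually.
    statuses: Arc<RwLock<HashMap<String, ServingStatus>>>,
    /// When true, every registered service is reported as NotServing.
    override_not_serving: Arc<RwLock<bool>>,
}

impl SketchReporter {
    /// Registers a service or updates its individual status.
    pub fn set_status(&self, service: &str, status: ServingStatus) {
        self.statuses
            .write()
            .unwrap()
            .insert(service.to_string(), status);
    }

    /// Masks all registered services as NotServing.
    pub fn override_all_not_serving(&self) {
        *self.override_not_serving.write().unwrap() = true;
    }

    /// Removes the mask; the individual statuses become visible again.
    pub fn clear_override_all_not_serving(&self) {
        *self.override_not_serving.write().unwrap() = false;
    }

    /// Returns the effective status, or None for an unregistered service.
    pub fn check(&self, service: &str) -> Option<ServingStatus> {
        let stored = self.statuses.read().unwrap().get(service).copied()?;
        if *self.override_not_serving.read().unwrap() {
            Some(ServingStatus::NotServing)
        } else {
            Some(stored)
        }
    }
}
```

Because the override is stored separately and only applied when a status is read, clearing it restores whatever each service had reported individually, without having to remember and replay those states.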

Alternatives