Open samsp-msft opened 2 years ago
It may be interesting to allow different implementations of the Config Service or Health Check service.
For example, we're planning on using Orleans to run health checks in the cluster (with only one instance of each health check) without needing to manage extra services. We haven't implemented it yet, but we could add further segmentation by availability zone, or another failure zone, by including it in the Orleans grain id for the health check.
Ex. all us-west2-a LBs hit the <route>/us-west2-a/HealthCheck grain, and similarly us-west2-b LBs hit <route>/us-west2-b/HealthCheck. If the segmentation (AZ in this example) is flexible, then any segmentation can be provided for a particular LB.
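The idea above can be sketched as follows. This is not the Orleans API, just an illustration of how a flexible segmentation key could be folded into the grain id so that each failure zone gets exactly one logical health-check instance; the "HealthCheck" suffix and zone names are assumptions taken from the example.

```python
# Sketch only: compose a per-segment health-check grain id, so each
# failure zone (AZ here) is served by exactly one logical instance.
def grain_id(route: str, segment: str) -> str:
    """Build the id for the health-check grain serving one segment."""
    return f"{route}/{segment}/HealthCheck"

# Each load balancer addresses the grain for its own availability zone,
# so checks are segmented without extra services to manage.
print(grain_id("shop", "us-west2-a"))  # shop/us-west2-a/HealthCheck
print(grain_id("shop", "us-west2-b"))  # shop/us-west2-b/HealthCheck
```

Because the segment is just a string in the id, any partitioning scheme (AZ, region, rack) works without changing the health-check code.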
An Orleans based implementation may not be provided out of the box, but we'd like to be able to utilize a similar implementation behind an abstraction if possible.
One of the reasons we have integrated with Orleans is that it is also useful for things like the Config Server (have a single point of computation for merged routes/clusters from multiple k8s clusters) and we also use it for rate limiting across a collection of load balancers. If you're running proxies at scale, you have to solve the distributed system problems somehow; Orleans is how we're solving it without needing to implement the whole distributed foundation from scratch.
Consider #267, where destinations have scheduled downtime; the health check system could push that data out to the proxies.
Feedback from a 1P team: for HTTP/2, the health checks ensure that there are warm connections to all destinations.
Edit: that's not HTTP/2-specific; it also helps for HTTP/1.x.
What should we add or change to make your life better?
YARP can do active health checks against backend servers to make sure that they are able to respond successfully to requests. With a number of YARP proxy instances and a large number of backends, each YARP instance will need to ping each backend for health checks. The total number of checks grows with the product of proxies and destinations, so it climbs quickly as either side scales out.
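The fan-out is multiplicative: N proxies each probing M destinations issue N × M probes per check interval. A tiny sketch with illustrative numbers (not measurements from any real deployment):

```python
# Illustrative numbers only: the probe count is proxies x destinations,
# so it grows multiplicatively as either side scales out.
def probes_per_interval(proxies: int, destinations: int) -> int:
    return proxies * destinations

print(probes_per_interval(10, 200))    # 2000 probes per interval
print(probes_per_interval(100, 2000))  # 200000: 10x on each axis => 100x probes
```

Consolidating the checks onto a few dedicated instances reduces this to roughly M probes per interval, independent of the proxy count.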
For scale-out scenarios, YARP should have the ability to run the health check as a separate service. That service should be runnable on a limited number of servers, which will perform the health checks and then provide the resulting data to the other YARP instances.
Concept
YARP includes a consolidated health check service which can be configured to run on a server or two. This service will talk to the configuration server #1710 to learn the cluster and destination definitions, and will perform the active health checks against the destinations based on the URL definitions in config.
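The core loop of such a consolidated checker could look like the sketch below: pull cluster/destination definitions, probe each destination once, and record a health state. The field names ("health_path", "destinations") and the "Healthy"/"Unhealthy" strings are assumptions, not YARP's actual config schema, and the probe is stubbed so the example is self-contained.

```python
# A minimal sketch, assuming a config feed of clusters/destinations and a
# pluggable probe function; the real YARP health-policy names differ.
def run_health_checks(clusters, probe):
    """Probe every destination once; return {destination: "Healthy"/"Unhealthy"}."""
    results = {}
    for cluster in clusters:
        path = cluster.get("health_path", "/healthz")  # per-cluster, from config
        for dest in cluster["destinations"]:
            ok = probe(dest + path)
            results[dest] = "Healthy" if ok else "Unhealthy"
    return results

# Stub probe so the sketch runs anywhere; a real checker would issue an
# HTTP request and apply the cluster's active health-check policy.
clusters = [{"health_path": "/healthz",
             "destinations": ["http://10.0.0.1", "http://10.0.0.2"]}]
report = run_health_checks(clusters, probe=lambda url: url.endswith(".1/healthz"))
print(report)  # {'http://10.0.0.1': 'Healthy', 'http://10.0.0.2': 'Unhealthy'}
```

The resulting report is what the service would then publish back through the configuration server for the proxies to consume.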
The configuration service will act as a broker, enabling the instances to discover each other.
Proposal
Consolidated health checks will depend on having a configuration server. This feature's value is mostly in scale-out scenarios where there are multiple YARP instances. The configuration server provides the orchestration: YARP instances learn about the health check server, and the health check server learns the configuration of the clusters and destinations.
The configuration server will include health check configuration as part of the configuration data exposed via its REST endpoints, along with notifications in either direction about a specific destination's health.
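The broker relationship might work like the sketch below: the health check server reports a destination's state change to the configuration server, which fans it out to every subscribed proxy. Class and method names here are hypothetical, not an API from YARP or #1710.

```python
# Sketch of the broker fan-out: one health-state change reaches all proxies.
class ConfigBroker:
    def __init__(self):
        self.subscribers = []          # proxies that asked for health updates

    def subscribe(self, proxy):
        self.subscribers.append(proxy)

    def publish_health(self, destination, state):
        # e.g. the health check server reports one destination as Unhealthy
        for proxy in self.subscribers:
            proxy.on_health_changed(destination, state)

class Proxy:
    def __init__(self):
        self.health = {}

    def on_health_changed(self, destination, state):
        self.health[destination] = state   # update the local routing view

broker = ConfigBroker()
p1, p2 = Proxy(), Proxy()
broker.subscribe(p1)
broker.subscribe(p2)
broker.publish_health("http://10.0.0.2", "Unhealthy")
print(p1.health)  # {'http://10.0.0.2': 'Unhealthy'}; p2 sees the same
```

Pushing state through the broker keeps all proxies converging on one view of destination health, instead of each proxy probing independently.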