SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
Currently, the controller node is a singleton, which makes it a single point of failure.
Besides its management role, the controller node also proxies user requests to the backend serving nodes, so if it fails, caller applications can no longer reach the backend services.
Running multiple controller nodes would be useful.
With multiple controller nodes, even if they are independent of each other, users can achieve high availability by, for example, placing a load balancer in front of them or using DNS round-robin.
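
To illustrate how a caller could benefit from several independent controllers, here is a minimal client-side sketch of round-robin with failover. It is not SkyPilot code: the class name `RoundRobinFailover`, the `send` callback, and the endpoint URLs in the usage note are all hypothetical, and it assumes an unreachable controller surfaces as a `ConnectionError`.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class RoundRobinFailover:
    """Rotate across independent controller endpoints, skipping failed ones.

    Hypothetical helper for illustration; not part of SkyPilot.
    """

    def __init__(self, endpoints: Sequence[str]) -> None:
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._endpoints = list(endpoints)
        self._next = 0  # index of the endpoint to try first on the next call

    def call(self, send: Callable[[str], T]) -> T:
        """Try each endpoint at most once, starting at the round-robin position."""
        n = len(self._endpoints)
        start = self._next
        self._next = (start + 1) % n  # advance so successive calls spread load
        last_error: Optional[Exception] = None
        for offset in range(n):
            endpoint = self._endpoints[(start + offset) % n]
            try:
                return send(endpoint)
            except ConnectionError as exc:
                # Assumption: an unreachable controller raises ConnectionError.
                last_error = exc
        raise ConnectionError("all controller endpoints failed") from last_error
```

A caller would construct it with the controller addresses, for example `RoundRobinFailover(["http://controller-a.example:30001", "http://controller-b.example:30001"])`, and pass a `send` function that issues the actual request to the given endpoint. A load balancer or DNS round-robin achieves a similar effect without changing the client.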