microsoft / Yams

YAMS (Yet Another Microservices Solution) is a library that can be used to deploy and host microservices in the cloud (e.g. Azure) or on premises

Support for load balancing #43

Open jkonecki opened 7 years ago

jkonecki commented 7 years ago

This issue encompasses the work necessary for the YAMS host to support load balancing during application deployment.

The current behaviour is as follows: once a configuration change is detected, the YAMS host processes each affected application in parallel, updating one Update Domain of each application at a time. There is no synchronization between applications, so it is likely that App 1 is being updated in Update Domain 1 at the same time as App 2 is being updated in Update Domain 0. This guarantees the fastest possible deployment of all applications, but it also means that many or all Update Domains may be affected at the same time.

Currently there is no support for load balancing, so while App 1 is being updated in Update Domain 1, the nodes in this UD are still considered online and continue to receive requests.

The desired behaviour is for the YAMS host to notify the Load Balancer that a certain Update Domain is being updated and to take the relevant nodes offline, so that the Load Balancer can route traffic to the remaining nodes in other Update Domains. In the case of Azure Load Balancer this can be achieved by implementing a custom probe (https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-custom-probe-overview).
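A minimal sketch of what the probe side could look like. Azure Load Balancer's HTTP probe treats a 200 response as healthy and anything else (or a timeout) as unhealthy, so the host only has to change the status it returns while an update is in progress. `ProbeEndpoint`, the `isOnline` callback and the example prefix are hypothetical names used for illustration; none of them exist in YAMS.

```csharp
using System;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of YAMS: answers load balancer health probes.
public sealed class ProbeEndpoint
{
    private readonly Func<bool> _isOnline;

    // isOnline is supplied by the host and reports whether this node should receive traffic.
    public ProbeEndpoint(Func<bool> isOnline)
    {
        _isOnline = isOnline ?? throw new ArgumentNullException(nameof(isOnline));
    }

    // 200 keeps the node in rotation; any other status fails the probe.
    public int CurrentStatusCode => _isOnline() ? 200 : 503;

    // Serves probe requests until the token is cancelled.
    // The prefix (for example "http://localhost:8085/probe/") is an assumption.
    public async Task RunAsync(string prefix, CancellationToken token)
    {
        using var listener = new HttpListener();
        listener.Prefixes.Add(prefix);
        listener.Start();
        using var registration = token.Register(listener.Stop);

        while (!token.IsCancellationRequested)
        {
            HttpListenerContext context;
            try
            {
                context = await listener.GetContextAsync();
            }
            catch (Exception) when (token.IsCancellationRequested)
            {
                break; // the listener was stopped by cancellation
            }

            context.Response.StatusCode = CurrentStatusCode;
            context.Response.Close();
        }
    }
}
```

The load balancer only reacts after it has seen failed probes, so a node is not removed from rotation the instant the status changes.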

There is, however, one issue that may affect cluster availability: the current deployment strategy allows multiple Update Domains to be updated at the same time, so there is a risk that several or even all Update Domains are taken offline at once. To avoid this, I propose the following deployment strategy:

  1. Nodes in Update Domain 0 are taken offline from the Load Balancer
  2. The YAMS host upgrades all applications in Update Domain 0
  3. If the applications support the monitored deployment feature, the host waits for application initialization to complete
  4. Nodes in Update Domain 0 are brought back online
  5. Steps 1-4 are repeated for the remaining Update Domains, one at a time
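The ordering of these steps can be sketched as a plain sequential loop. This only illustrates the sequence; the four delegates are hypothetical placeholders for work the real host would do, and the sketch says nothing about how hosts on different nodes would coordinate with each other.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Illustrative only: the delegates stand in for work done by the real host.
public static class RollingUpgrade
{
    public static async Task RunAsync(
        IEnumerable<int> updateDomains,
        Func<int, Task> takeOffline,            // step 1
        Func<int, Task> upgradeApplications,    // step 2
        Func<int, Task> waitForInitialization,  // step 3 (may complete immediately
                                                // if monitored deployment is not used)
        Func<int, Task> bringOnline)            // step 4
    {
        // Step 5: domains are processed strictly one after another,
        // so at most one domain is offline at any moment.
        foreach (int domain in updateDomains)
        {
            await takeOffline(domain);
            await upgradeApplications(domain);
            await waitForInitialization(domain);
            await bringOnline(domain);
        }
    }
}
```

Because each domain is brought back online before the next one is taken offline, the number of offline domains never exceeds one.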

The above strategy guarantees that only the nodes in one Update Domain are offline at a time. The only side effect is that, when hosting multiple applications with different deployment times, the time to update the whole cluster is the deployment time of the slowest application multiplied by the number of Update Domains.
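To make this trade-off concrete, assume (the issue does not state this) that the applications within one Update Domain are upgraded in parallel and that load balancer transitions take negligible time. With N Update Domains and k applications whose per-domain deployment times are t_1, ..., t_k, the cluster update time is roughly:

```latex
T_{\text{cluster}} \approx N \cdot \max_{1 \le i \le k} t_i
```

For example, with N = 5 and two applications that take 2 and 10 minutes per domain, the cluster takes about 5 x 10 = 50 minutes. The 2-minute application, which would be done in about 5 x 2 = 10 minutes under the current strategy, is not fully updated until the last domain is reached, roughly 4 x 10 + 2 = 42 minutes in.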

I would like to receive feedback from anyone who would like to keep the existing deployment strategy (not using a Load Balancer and allowing traffic to be sent to unavailable nodes). If anyone is interested we may keep supporting it, but as this will require additional work, some convincing will be required.

The design of the solution

A new interface ILoadBalancer will be introduced.

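using System.Threading.Tasks;

public interface ILoadBalancer
{
    // Asks the load balancer to stop routing traffic to the node(s) being updated.
    Task TakeOffline();

    // Asks the load balancer to resume routing traffic once the update is done.
    Task BringOnline();
}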

Two implementations of ILoadBalancer will be provided: NoLoadBalancer and AzureLoadBalancer. The former will return completed tasks and can be used if load balancing is not needed. The latter will use custom probes to notify Azure Load Balancer of node availability.
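A rough sketch of how the two implementations might look. `NoLoadBalancer` follows the description above directly. `ProbeBasedLoadBalancer` is a hypothetical stand-in for the proposed `AzureLoadBalancer`: it only flips a flag that a probe handler would read, and the drain delay is an assumption based on the fact that the load balancer needs to observe failed probes before it stops sending traffic.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public interface ILoadBalancer
{
    Task TakeOffline();
    Task BringOnline();
}

// No-op implementation for deployments without a load balancer.
public sealed class NoLoadBalancer : ILoadBalancer
{
    public Task TakeOffline() => Task.CompletedTask;
    public Task BringOnline() => Task.CompletedTask;
}

// Hypothetical sketch of a probe-based implementation (not the actual AzureLoadBalancer).
public sealed class ProbeBasedLoadBalancer : ILoadBalancer
{
    private readonly TimeSpan _drainDelay;
    private int _online = 1; // 1 = online, 0 = offline

    // drainDelay (assumption): how long to wait for the load balancer
    // to notice the failing probe before the update starts.
    public ProbeBasedLoadBalancer(TimeSpan drainDelay)
    {
        _drainDelay = drainDelay;
    }

    // Read by whatever component answers the health probe.
    public bool IsOnline => Volatile.Read(ref _online) == 1;

    public async Task TakeOffline()
    {
        Volatile.Write(ref _online, 0);
        await Task.Delay(_drainDelay);
    }

    public Task BringOnline()
    {
        Volatile.Write(ref _online, 1);
        return Task.CompletedTask;
    }
}
```

A suitable drain delay would depend on the probe interval and failure threshold configured on the load balancer.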

The ApplicationInstaller will be extended to take ILoadBalancer as a dependency.
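A sketch of what the dependency could look like in use. The members of the real `ApplicationInstaller` are not shown in this issue, so `ApplicationInstallerSketch`, `UpdateAsync` and the `updateApplications` delegate are hypothetical stand-ins; `RecordingLoadBalancer` exists only to show the call order.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public interface ILoadBalancer
{
    Task TakeOffline();
    Task BringOnline();
}

// Hypothetical stand-in for ApplicationInstaller; the real class has other members.
public sealed class ApplicationInstallerSketch
{
    private readonly ILoadBalancer _loadBalancer;

    public ApplicationInstallerSketch(ILoadBalancer loadBalancer)
    {
        _loadBalancer = loadBalancer ?? throw new ArgumentNullException(nameof(loadBalancer));
    }

    // updateApplications is a placeholder for the existing update logic.
    public async Task UpdateAsync(Func<Task> updateApplications)
    {
        await _loadBalancer.TakeOffline();
        await updateApplications();
        // Reached only if the update succeeded; in this sketch a failed
        // update leaves the node offline.
        await _loadBalancer.BringOnline();
    }
}

// Demonstration-only implementation that records the calls it receives.
public sealed class RecordingLoadBalancer : ILoadBalancer
{
    public List<string> Calls { get; } = new List<string>();

    public Task TakeOffline() { Calls.Add("offline"); return Task.CompletedTask; }
    public Task BringOnline() { Calls.Add("online"); return Task.CompletedTask; }
}
```

Whether a node should be brought back online after a failed update is a policy question this issue does not address; the sketch simply leaves it offline.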

remusrg commented 7 years ago

A possible alternative to updating all apps in an update domain at once would be for Yams to update an app in any update domain that is not currently updating an app, while keeping a minimum number of update domains available. That minimum would be configured for the entire cluster (presumably you know how many update domains you have and how many you need at a minimum). Custom probes would still be used for Azure LB, as you suggest. That way, updating an app is not held up by slower updates of other apps; it depends only on how many apps are being updated at the same time. Fairness. Thoughts?
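The admission rule in this alternative could be sketched roughly as below. `UpdateDomainGate` is a hypothetical name, and the state is kept in memory purely for illustration; how hosts on different nodes would share it is not covered here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical admission check: a domain may start an update only if it is
// not already updating and enough other domains stay available.
public sealed class UpdateDomainGate
{
    private readonly int _totalDomains;
    private readonly int _minAvailable;
    private readonly HashSet<int> _updating = new HashSet<int>();
    private readonly object _sync = new object();

    public UpdateDomainGate(int totalDomains, int minAvailable)
    {
        if (minAvailable < 0 || minAvailable >= totalDomains)
            throw new ArgumentOutOfRangeException(nameof(minAvailable));
        _totalDomains = totalDomains;
        _minAvailable = minAvailable;
    }

    // Returns true if an app update may start in the given domain now.
    public bool TryBegin(int domain)
    {
        lock (_sync)
        {
            if (_updating.Contains(domain))
                return false; // this domain is already updating an app

            int availableAfter = _totalDomains - _updating.Count - 1;
            if (availableAfter < _minAvailable)
                return false; // would drop below the configured minimum

            _updating.Add(domain);
            return true;
        }
    }

    // Marks the domain as available again once its update has finished.
    public void End(int domain)
    {
        lock (_sync)
        {
            _updating.Remove(domain);
        }
    }
}
```

With 4 domains and a minimum of 2, two domains can update concurrently and a third request is refused until one of them finishes.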