Open jkonecki opened 7 years ago
A possible alternative to updating all apps in an update domain would be for YAMS to update an app in any update domain that is not currently updating an app, while keeping a minimum number of update domains available. That minimum would be configured for the entire cluster (presumably you know how many update domains you have and how many you need at a minimum). Still use custom probes for Azure LB as you suggest. That way updating an app is not held up by slower updates of other apps; it depends only on how many apps are being updated at the same time. Fairness. Thoughts?
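To make the suggestion concrete, here is a rough Python sketch of the bookkeeping it would need. The class name `UpdateDomainGate` and its methods are hypothetical and not part of YAMS; the sketch only assumes that each app update asks for permission before touching an update domain.

```python
class UpdateDomainGate:
    """Tracks which update domains (UDs) are currently offline for an update."""

    def __init__(self, total_domains, min_available):
        if not 0 <= min_available <= total_domains:
            raise ValueError("min_available must be between 0 and total_domains")
        self.total_domains = total_domains
        self.min_available = min_available
        self.updating = set()

    def try_acquire(self, domain):
        """Reserve `domain` for one app update; False means the caller must wait."""
        if domain in self.updating:
            return False  # another app is already being updated in this UD
        if self.total_domains - len(self.updating) - 1 < self.min_available:
            return False  # taking this UD offline would breach the minimum
        self.updating.add(domain)
        return True

    def release(self, domain):
        """Mark `domain` as available again once the app update has finished."""
        self.updating.discard(domain)
```

With three update domains and a minimum of two available, only one domain can be reserved at a time; with a minimum of one, two app updates could proceed in different domains concurrently.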
This issue encompasses the work necessary for the YAMS host to support load balancing during application deployment.
The current behaviour is as follows: once a configuration change is detected, the YAMS host processes each affected application in parallel, updating one Update Domain of each application at a time. There is no synchronization between applications, so it is likely that App 1 is being updated in Update Domain 1 at the same time as App 2 is being updated in Update Domain 0. This guarantees the fastest possible deployment of all applications but also means that many or all Update Domains may be affected at the same time.
Currently there is no support for load balancing, so while App 1 is being updated in Update Domain 1, the nodes in this UD are considered online and will continue to receive requests.
The desired behaviour is for the YAMS host to notify the Load Balancer that a certain Update Domain is being updated and take the relevant nodes offline, so the Load Balancer can route traffic to the remaining nodes in other Update Domains. In the case of Azure LB this can be achieved by implementing a custom probe (https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-custom-probe-overview).
There is, however, one issue that may affect cluster availability: with the current deployment strategy, which allows multiple Update Domains to be updated at the same time, there is a risk that multiple or even all Update Domains are actually brought offline. In order to avoid it, I propose that the following deployment strategy is implemented:
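As an illustration of the custom probe idea, the following Python sketch exposes an HTTP endpoint that answers 200 while the node is online and 503 while it is being updated. An Azure Load Balancer HTTP probe treats a 200 response as healthy and anything else as unhealthy. The names `NodeAvailability` and `make_probe_server` are made up for this example; this is not the YAMS implementation.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class NodeAvailability:
    """Thread-safe flag that the host would flip around an update."""

    def __init__(self):
        self._online = threading.Event()
        self._online.set()

    def take_offline(self):
        self._online.clear()

    def bring_online(self):
        self._online.set()

    def is_online(self):
        return self._online.is_set()


def make_probe_server(availability, port=0):
    """Build an HTTP server that answers the probe; port 0 picks a free port."""

    class ProbeHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # 200 keeps the node in rotation; 503 tells the balancer to skip it.
            self.send_response(200 if availability.is_online() else 503)
            self.send_header("Content-Length", "0")
            self.end_headers()

        def log_message(self, format, *args):
            pass  # keep the sketch quiet

    return HTTPServer(("127.0.0.1", port), ProbeHandler)
```

Note that the load balancer only notices the change after its probe has failed, so a host using this approach would presumably need to wait for at least the configured probe interval and failure threshold before starting the update.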
The above strategy guarantees that only nodes in one Update Domain are offline at a time. The only side effect is that, when hosting multiple applications with different deployment times, the time needed to update the whole cluster is the deployment time of the slowest application multiplied by the number of Update Domains.
I would like to receive feedback from anyone who would like to keep the existing deployment strategy (not using a Load Balancer and allowing traffic to be sent to unavailable nodes). If anyone is interested we may keep supporting it, but as this will require additional work some convincing will be required.
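A small worked example of the timing trade-off, as a Python sketch. The durations are made up, the function names are hypothetical, and the model assumes that every Update Domain takes the same time per app and that the apps within one Update Domain are updated in parallel.

```python
def independent_finish_times(deploy_seconds, update_domains):
    """Current behaviour: each app walks through the UDs on its own schedule."""
    return {app: t * update_domains for app, t in deploy_seconds.items()}


def synchronized_finish_time(deploy_seconds, update_domains):
    """One UD at a time: every round lasts as long as the slowest app."""
    return max(deploy_seconds.values()) * update_domains


deploy_seconds = {"App 1": 30, "App 2": 120}  # made-up per-UD durations
print(independent_finish_times(deploy_seconds, 3))  # {'App 1': 90, 'App 2': 360}
print(synchronized_finish_time(deploy_seconds, 3))  # 360
```

In this model the fast application pays the price: App 1 would have finished after 90 seconds on its own, but with one Update Domain at a time it completes only during the final round.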
The design of the solution
A new interface ILoadBalancer will be introduced. Two implementations of ILoadBalancer will be provided: NoLoadBalancer and AzureLoadBalancer. The former will return completed tasks and can be used if load balancing is not needed. The latter will use custom probes to notify Azure Load Balancer of node availability. The ApplicationInstaller will be extended to take ILoadBalancer as a dependency.
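The shape of this design can be sketched as follows. This is a Python stand-in rather than the actual YAMS code: the method names `take_offline` and `bring_online`, the `update` method and its `install` parameter are assumptions made for illustration, since the issue does not specify the interface members.

```python
import asyncio
from typing import Protocol


class LoadBalancer(Protocol):
    """Stand-in for the proposed ILoadBalancer; method names are hypothetical."""

    async def take_offline(self, update_domain: int) -> None: ...

    async def bring_online(self, update_domain: int) -> None: ...


class NoLoadBalancer:
    """No-op implementation: both calls complete immediately."""

    async def take_offline(self, update_domain: int) -> None:
        return None

    async def bring_online(self, update_domain: int) -> None:
        return None


class ApplicationInstaller:
    """Takes the load balancer as a dependency, as the design proposes."""

    def __init__(self, load_balancer: LoadBalancer):
        self._load_balancer = load_balancer

    async def update(self, update_domain: int, install) -> None:
        """Take the UD offline, await the `install` coroutine function, restore traffic."""
        await self._load_balancer.take_offline(update_domain)
        # If install() raises, the nodes stay offline in this sketch; whether
        # that is the desired failure behaviour is left open here.
        await install()
        await self._load_balancer.bring_online(update_domain)
```

Passing `NoLoadBalancer()` to the installer keeps the flow identical to a deployment without load balancing, which is why a no-op implementation is convenient: the installer needs no special case for clusters that do not use a load balancer.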