m-lab / k8s-support

Setup for the kubernetes systems to control and run all the MLab nodes around the world
Apache License 2.0

Implement production canary releases #268

Closed nkinkade closed 5 years ago

nkinkade commented 5 years ago

We currently only have sandbox -> staging -> production. Staging, for now, only gets 0.001 of production traffic, which isn't really enough to compare a staging release against an existing production node and be sure that a deployment is working as expected. We need a true production canary release cycle, in which changes made to staging are deployed to some subset of production machines for a period.

nkinkade commented 5 years ago

It turns out that implementing this with versioned DaemonSets is going to require a good deal more thought. Implementing versioned DaemonSets is pretty easy, as PR #276 demonstrates, but that PR does not take deployment details into account. Specifically, these questions need to be answered, and perhaps others:

nkinkade commented 5 years ago

The thing to do in the short term may be to implement something along the lines of our original idea of just having two DaemonSets for NDT, a production one and a canary one. But even this needs more consideration to avoid some of the same pitfalls as above.
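To make the two-DaemonSet idea concrete, here is a minimal sketch in Python that builds a production manifest and a canary manifest as plain dicts. Everything specific in it is an assumption for illustration, not what the repository actually uses: the helper `ndt_daemonset`, the node label key `example.org/ndt-release`, the DaemonSet names, and the image tags are all hypothetical.

```python
import json


def ndt_daemonset(name, image, release):
    """Return a minimal apps/v1 DaemonSet manifest as a plain dict.

    `release` is the value of a hypothetical node label; only nodes
    carrying that value will run this DaemonSet's pods.
    """
    labels = {"workload": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    # Hypothetical label key; each node gets exactly one value.
                    "nodeSelector": {"example.org/ndt-release": release},
                    "containers": [{"name": "ndt", "image": image}],
                },
            },
        },
    }


# Hypothetical names and image tags, for illustration only.
production = ndt_daemonset("ndt", "example/ndt:v1", "production")
canary = ndt_daemonset("ndt-canary", "example/ndt:v2", "canary")

if __name__ == "__main__":
    print(json.dumps(canary, indent=2))
```

Under this sketch, a node would move between the canary and production sets by changing its label value, and because each node carries a single value the two DaemonSets never schedule onto the same machine. How nodes get labeled, and who promotes a canary image to production, are exactly the deployment questions that still need answers.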

nkinkade commented 5 years ago

For now, it has been decided to implement this by directing more production traffic to staging, manually adjusting the ReverseProxyProbability in mlab-ns after a merge to master. Once we are confident that the release is fine, we will reset ReverseProxyProbability and then release to production.
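As a rough illustration of why raising the probability helps, the sketch below models ReverseProxyProbability as the fraction of requests sent to staging. This is not mlab-ns code: the function names `pick_pool` and `staging_share` are hypothetical, and the real mechanism in mlab-ns may differ.

```python
import random


def pick_pool(reverse_proxy_probability, rng=random):
    """Route one request: 'staging' with the given probability, else 'production'."""
    if not 0.0 <= reverse_proxy_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # rng.random() returns a float in [0.0, 1.0).
    if rng.random() < reverse_proxy_probability:
        return "staging"
    return "production"


def staging_share(reverse_proxy_probability, requests, seed=0):
    """Simulate `requests` routing decisions and return the observed staging fraction."""
    rng = random.Random(seed)
    hits = sum(
        pick_pool(reverse_proxy_probability, rng) == "staging"
        for _ in range(requests)
    )
    return hits / requests


if __name__ == "__main__":
    # Compare the default 0.001 with a temporarily raised, made-up value.
    for p in (0.001, 0.05):
        print(p, staging_share(p, 100_000))
```

At 0.001, staging sees roughly one request in a thousand, so a temporarily higher value gives staging enough traffic to compare against production before the setting is reset.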