sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com

Disable node auto-upgrades for sourcegraph.com #4779

Open beyang opened 5 years ago

beyang commented 5 years ago

Disable node auto-upgrades for sourcegraph.com and implement a manual upgrade schedule. Auto-upgrades result in downtime (most recently: https://sourcegraph.slack.com/archives/C0J618TTM/p1562100477015600, which resulted in 7 minutes of downtime). This is disruptive to the person on-call and results in a poor experience for users.

@ggilmore can we disable auto-upgrades and move to a manual update schedule?
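
For reference, a minimal sketch of what disabling auto-upgrade on a single node pool could look like. It only assembles the `gcloud container node-pools update ... --no-enable-autoupgrade` command and prints it. The helper name `disable_autoupgrade_cmd` is hypothetical, and the cluster, node pool, and zone values are placeholders, not the real sourcegraph.com settings.

```python
import shlex


def disable_autoupgrade_cmd(cluster: str, node_pool: str, zone: str) -> list[str]:
    """Build the gcloud argv that turns off node auto-upgrade for one node pool."""
    return [
        "gcloud", "container", "node-pools", "update", node_pool,
        "--cluster", cluster,
        "--zone", zone,
        "--no-enable-autoupgrade",
    ]


if __name__ == "__main__":
    # Placeholder names (assumptions); substitute the real cluster, pool, and zone.
    cmd = disable_autoupgrade_cmd("example-cluster", "default-pool", "us-central1-f")
    # Print the command for review instead of executing it.
    print(shlex.join(cmd))
```

Printing the command rather than running it keeps the sketch safe to try; each node pool in the cluster would need the same treatment.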

slimsag commented 5 years ago

https://cloud.google.com/kubernetes-engine/docs/how-to/maintenance-window

We get to choose a maintenance window, and @ggilmore currently has it set in the SF timezone, so the on-call person is likely awake during that time.

These upgrades are not something we should take lightly: they contain important security fixes. If we disable them, what is our plan to ensure we stay on top of upgrades and roll them out in a similar timeframe (<24hr)?

beyang commented 5 years ago

The maintenance window is daily, which might be too frequent for us. What do you think about doing it weekly or even monthly? Critical updates will still be applied automatically:

GKE reserves the right to roll out unplanned, emergency upgrades outside of maintenance windows. Additionally, mandatory upgrades to upgrade from deprecated or outdated software might automatically occur outside of maintenance windows.

Whatever we do, the on-call person must be made aware of the maintenance window (whether manual or automatic), so they are not caught off-guard.
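
One way to keep on-call informed, sketched under assumptions: if we settle on a weekly window, a small script could compute the next window start and produce a notice to post wherever on-call looks. The function names, the Saturday 10:00 schedule, and the use of UTC below are all hypothetical choices for illustration, not an agreed plan.

```python
from datetime import datetime, time, timedelta, timezone


def next_weekly_window(now: datetime, weekday: int, start: time) -> datetime:
    """Return the next window start at or after `now`.

    `weekday` follows datetime's convention: Monday is 0, Sunday is 6.
    """
    days_ahead = (weekday - now.weekday()) % 7
    candidate = datetime.combine(
        now.date() + timedelta(days=days_ahead), start, tzinfo=now.tzinfo
    )
    if candidate < now:
        # This week's window has already started; use next week's.
        candidate += timedelta(days=7)
    return candidate


def oncall_notice(now: datetime, weekday: int, start: time) -> str:
    """Format a short heads-up message for the on-call person."""
    window = next_weekly_window(now, weekday, start)
    return f"Next node maintenance window starts {window:%Y-%m-%d %H:%M %Z}"


if __name__ == "__main__":
    # Hypothetical schedule: Saturdays at 10:00 UTC.
    print(oncall_notice(datetime.now(timezone.utc), weekday=5, start=time(10, 0)))
```

This only covers the scheduled window; the emergency and mandatory upgrades GKE mentions above can still happen outside it, so the notice would not be a guarantee.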