jupyterhub / team-compass

A repository for team interaction, syncing, and handling meeting notes across the JupyterHub ecosystem.
http://jupyterhub-team-compass.readthedocs.io

Split GKE federation member into new GCP project #512

Open minrk opened 2 years ago

minrk commented 2 years ago

The pressure on GKE funding is off a bit for now (thanks @choldgraf!), but to avoid similar problems in the future, I think we should split the 'hard to turn off' services running on GCP (federation-redirect, matomo, events archive, central grafana) from the GKE "user" (and staging?) clusters. The user clusters drive most of the cost, and they are easier to turn off once we have a healthy federation.

This will make it easier to turn the GKE cluster off completely while leaving us with a small, manageable bill for 'core' functionality, covered by donations, stop-gap funding, etc. Cost estimates suggest that bill should be in the much more manageable range of hundreds of dollars per month rather than thousands.

For migration purposes, I think it's best to keep the current binderhub project as the long-term one (this avoids annoying migrations of matomo, the events archive, and mybinder.org DNS) and to start a new project for the federation member. The intention is that the current project never stops running and has sustainable funding via donations, etc.
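To make the proposed split concrete, here is a minimal Python sketch that assigns each service named above to a project and checks that everything 'hard to turn off' stays in the long-term project. It assumes the existing project's ID is `binderhub` (the issue calls it "the current binderhub project" but does not give its exact ID) and uses `mybinder-gke`, the example name suggested in the steps below, for the new one. The service labels and the placement of prometheus/jupyterhub/binderhub in the member project are illustrative assumptions, not an inventory of the real deployment.

```python
from dataclasses import dataclass

# Assumed project IDs: "binderhub" stands in for the existing long-term
# project; "mybinder-gke" is the example name proposed for the new one.
CORE_PROJECT = "binderhub"
MEMBER_PROJECT = "mybinder-gke"


@dataclass(frozen=True)
class Service:
    name: str               # illustrative label, not a real resource name
    project: str            # project the service is planned to run in
    hard_to_turn_off: bool  # True for the 'core' services listed in the issue


SERVICES = [
    Service("federation-redirect", CORE_PROJECT, True),
    Service("matomo", CORE_PROJECT, True),
    Service("events-archive", CORE_PROJECT, True),
    Service("central-grafana", CORE_PROJECT, True),
    # The federation member's own stack, which can be switched off as a unit.
    Service("binderhub", MEMBER_PROJECT, False),
    Service("jupyterhub", MEMBER_PROJECT, False),
    Service("prometheus", MEMBER_PROJECT, False),
]


def misplaced_core_services(services):
    """Names of 'hard to turn off' services that are NOT in the core project."""
    return [s.name for s in services
            if s.hard_to_turn_off and s.project != CORE_PROJECT]


def still_running(services, disabled_projects):
    """Names of services left running after disabling whole projects."""
    return [s.name for s in services if s.project not in disabled_projects]
```

Calling `still_running(SERVICES, {MEMBER_PROJECT})` models "turn the GKE member off": only the core services remain, which is exactly the property the split is meant to guarantee.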

Steps:

  1. [ ] create a new project (e.g. mybinder-gke)
  2. [ ] hook up the new project to the same billing account
  3. [ ] launch new cluster(s) on the new project (depending on whether staging should go here or not)
  4. [ ] turn off binderhub on the main 'gke-prod' cluster
  5. [ ] (if migrating staging) turn off staging cluster
  6. [ ] recalculate node pools on 'gke-prod' for the reduced load
  7. [ ] delete no-longer-used images from GCR
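The gcloud-driven steps of this checklist (1, 2, 3, and 7) could be scripted roughly as below. This is a hedged dry-run sketch: only `mybinder-gke` comes from the issue, while the billing account ID, zone, cluster name, and image references are placeholders. Steps 4 to 6 depend on how the deployment's Helm releases and node pools are managed, so they are left out rather than guessed.

```python
import shlex
import subprocess

# Placeholders: only "mybinder-gke" is suggested in the issue.
NEW_PROJECT = "mybinder-gke"
BILLING_ACCOUNT = "000000-000000-000000"  # placeholder, not a real account ID
ZONE = "us-central1-a"                    # placeholder zone
NEW_CLUSTER = "prod"                      # placeholder cluster name


def migration_plan(new_project, billing_account, zone, cluster, stale_images):
    """Return gcloud argv lists for checklist steps 1, 2, 3 and 7."""
    plan = [
        # Step 1: create the new project.
        ["gcloud", "projects", "create", new_project],
        # Step 2: link it to the same billing account as the existing project.
        ["gcloud", "billing", "projects", "link", new_project,
         f"--billing-account={billing_account}"],
        # Step 3: launch a cluster in the new project.
        ["gcloud", "container", "clusters", "create", cluster,
         f"--project={new_project}", f"--zone={zone}"],
    ]
    # Step 7: delete images that are no longer used, including their tags.
    for image in stale_images:
        plan.append(["gcloud", "container", "images", "delete", image,
                     "--force-delete-tags", "--quiet"])
    return plan


def run(plan, execute=False):
    """Print each command; only run it when execute=True."""
    for argv in plan:
        print(shlex.join(argv))
        if execute:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run: prints the commands without touching GCP.
    run(migration_plan(NEW_PROJECT, BILLING_ACCOUNT, ZONE, NEW_CLUSTER, []))
```

Keeping the plan as plain argv lists and defaulting to a dry run makes it easy to review every command before anything is created or deleted.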

Note: because this involves an additional cluster, our overall GKE bill will likely go up. It shouldn't go up by much, because the 'core' cluster will have minimal requirements, but the new prod cluster will likely not have appreciably reduced requirements (Prometheus, JupyterHub, and BinderHub are ~everything).

Two smaller alternatives:

  1. keep the project, "just" split the clusters, or
  2. keep it as-is, but make it easier to turn off binderhub

These are slightly simpler from a management perspective. The only real downsides to them that I can see are:

  1. it takes a long time to delete images from GCR, and storing them is a nontrivial cost (it's also just annoying compared to instantly disabling a whole GCP project)
  2. it's harder to see what our "turn off BinderHub" costs will actually be, and thus plan for if/when we need to turn off the GKE federation member
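One practical benefit of separate projects for the second point is that the "turn off BinderHub" cost becomes a simple per-project sum over billing line items. The sketch below assumes simplified rows with a `project_id` and a `cost` field, loosely modeled on a billing export; the field names and the `binderhub`/`mybinder-gke` project IDs are assumptions for illustration, and no real cost figures are implied.

```python
from collections import defaultdict


def cost_by_project(rows):
    """Sum billing line items per project ID.

    Each row is assumed to be a dict with "project_id" and "cost" keys.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row["project_id"]] += row["cost"]
    return dict(totals)


def cost_after_turning_off(rows, member_project):
    """Total cost of every project except the (disabled) federation member."""
    return sum(cost for project, cost in cost_by_project(rows).items()
               if project != member_project)
```

With everything in one project, the same question would need per-resource labels or manual attribution, which is the planning difficulty the second point describes.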

choldgraf commented 2 years ago

I like the idea of a clean project-level separation between the gke.mybinder.org BinderHub and the "core federation router" infrastructure. +1 on this proposal from me.

manics commented 2 years ago

Sounds good to me, and it will also make it easier to test and develop just the core infra/redirector locally.