jupyterhub / binderhub

Run your code in the cloud, with technology so advanced, it feels like magic!
https://binderhub.readthedocs.io
BSD 3-Clause "New" or "Revised" License

enforce overall pod quota on attempts to launch #1441

Closed. minrk closed this 2 years ago

minrk commented 2 years ago

Otherwise (without enforcement at launch time), the quota field in the health check can only be used by outside consumers of the health endpoint (e.g. the mybinder.org federation redirector).

Without this, there's no way for a BinderHub to limit its own capacity, despite the appearance of such configuration.
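A rough sketch of the kind of check this is asking for, not BinderHub's actual code: `check_pod_quota` and `LaunchQuotaExceeded` are hypothetical names, and the inputs are assumed to be the same pod count and quota that the health check reports.

```python
from typing import Optional


class LaunchQuotaExceeded(Exception):
    """Raised when a launch is refused because the hub is full (hypothetical name)."""


def check_pod_quota(total_pods: int, pod_quota: Optional[int]) -> None:
    """Refuse a launch when the hub is already at or over its overall pod quota.

    Assumption for this sketch: pod_quota=None means "no limit configured".
    """
    if pod_quota is None:
        return
    if total_pods >= pod_quota:
        raise LaunchQuotaExceeded(
            f"Too many pods running: {total_pods}/{pod_quota}"
        )
```

Calling something like this at the start of a launch attempt would let a standalone BinderHub turn the request away itself, instead of relying on whoever reads the health endpoint.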

minrk commented 2 years ago

Risk of doing this for mybinder.org: a race condition where the federation-redirector assigns a launch to a BinderHub because that hub is just below quota, but the launch arrives after the hub has passed its quota, so it fails rather than being sent to another member.

A way to avoid this would be to use two quotas - one soft for 'health' and one hard for rejecting launches. We could implement this only in the federation-redirector by having a 'headroom' of e.g. 5-10 pods where we consider the quota full before it's all the way full.
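A minimal sketch of the headroom idea on the redirector side, under stated assumptions: `has_capacity` and `eligible_members` are hypothetical helpers, and each member's health data is reduced to a `(total_pods, quota)` pair. It is not the federation-redirector's real code.

```python
from typing import Dict, List, Optional, Tuple


def has_capacity(total_pods: int, quota: Optional[int], headroom: int = 5) -> bool:
    """Treat a member as full once it is within `headroom` pods of its quota.

    Assumption for this sketch: quota=None means the member reports no limit.
    """
    if quota is None:
        return True
    return total_pods < quota - headroom


def eligible_members(
    health: Dict[str, Tuple[int, Optional[int]]], headroom: int = 5
) -> List[str]:
    """Return the members the redirector may still send launches to."""
    return [
        name
        for name, (total_pods, quota) in health.items()
        if has_capacity(total_pods, quota, headroom)
    ]
```

With `headroom=5`, a member at 96 of 100 pods is skipped while one at 50 of 100 stays eligible, so a launch that is already in flight still has a few pods of slack before the member's own hard limit rejects it.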

betatim commented 2 years ago

I like the idea of moving the "headroom" decision (and related config complexity) to the redirector. With two quotas we'd have to explain to the admin of a simple, single-instance BinderHub why there are two, why they are different, how to set them, etc. With your idea of moving this decision to the federation-redirector we keep all that complexity hidden from people who don't run a federation.