Closed by minrk 1 year ago
Do you think it's worth switching to a stateless culling process that runs separately from BinderHub, and terminates build pods older than x hours?
I'm not sure there's a benefit to it being a process vs a coroutine, but a stateless check is probably a good idea. Looks like we already have that, though. Looking at the implementation, the BinderHub.build_max_age configuration appears to be entirely unused, and the cleaner class is never passed configuration, so it can't be configured.
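For illustration, the core of a stateless age check could look roughly like the sketch below. It is not BinderHub's actual cleaner: the function name `find_expired_builds` and the plain-dict pod records are hypothetical, chosen so the logic can be read and tested without a cluster. In a real deployment the records would presumably come from listing build pods through the Kubernetes API, and the returned names would be deleted.

```python
from datetime import datetime, timedelta, timezone


def find_expired_builds(pods, max_age, now=None):
    """Return the names of build pods that started more than max_age ago.

    `pods` is an iterable of dicts with "name" and "start_time" keys
    (a hypothetical shape, not the Kubernetes client's pod objects).
    The check is stateless: it relies only on each pod's recorded start
    time, so it gives the same answer after a restart of the checker.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    expired = []
    for pod in pods:
        started = pod.get("start_time")
        if started is None:
            # Pod has not started yet; there is no age to compare.
            continue
        if now - started > max_age:
            expired.append(pod["name"])
    return expired


if __name__ == "__main__":
    now = datetime(2020, 1, 2, 12, 0, tzinfo=timezone.utc)
    pods = [
        {"name": "build-old", "start_time": now - timedelta(hours=30)},
        {"name": "build-new", "start_time": now - timedelta(minutes=5)},
        {"name": "build-pending", "start_time": None},
    ]
    print(find_expired_builds(pods, timedelta(hours=4), now=now))
```

Because the decision depends only on timestamps stored on the pods themselves, it should not matter whether the check runs as a coroutine inside BinderHub or as a separate process.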
We have some builds on mybinder.org that never terminate, and are left running for several days. One example accidentally launched a notebook server during the build stage.
These should have been terminated by the max build age, so we should check why that's not happening (possibly lost watchers after a BinderHub restart).
We should probably also have a shorter "idle timeout" shutdown for builds, e.g. Travis stopped CI if there was no output for 10 minutes. We could be more lenient and give it, say, 30 minutes, but I think it's a feature we should have on builds.
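A minimal sketch of what such an idle timeout could look like, assuming the build's log stream is something we can hook into. The `IdleWatchdog` class and its methods are hypothetical names, not existing BinderHub code, and the 30-minute default is just the figure floated above.

```python
import time


class IdleWatchdog:
    """Track how long a build has gone without producing output.

    Hypothetical helper: call touch() for every log line received,
    and poll is_idle() periodically to decide whether to stop the build.
    """

    def __init__(self, idle_timeout=30 * 60, clock=time.monotonic):
        self.idle_timeout = idle_timeout  # seconds without output allowed
        self._clock = clock  # injectable so the logic can be tested
        self._last_output = clock()

    def touch(self):
        """Record that the build just produced output."""
        self._last_output = self._clock()

    def idle_seconds(self):
        """Seconds elapsed since the last recorded output."""
        return self._clock() - self._last_output

    def is_idle(self):
        """True once the build has been silent for idle_timeout seconds."""
        return self.idle_seconds() >= self.idle_timeout
```

Unlike the max build age, this only fires when a build stops emitting output entirely, so a long but chatty build would not be cut off, while a build stuck in a silent foreground process would be.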