Closed: @costrouc closed this issue 1 year ago
Hi @costrouc, do they want a per-user config structure, or are they happy to have it set in the qhub-config?
@viniciusdc and @costrouc, we would be happy to set this in the qhub-config.
One of the worst aspects of the timeout being so short is that any terminal sessions disappear.
Thanks for taking a look!
Folks, what would it take to enable this?
This is the top complaint I've heard from ESIP Qhub users.
Even if it wasn't configurable and just made longer by qhub devs, that would be wonderful. Right now it must be 5 minutes, right?
It would be great if dask clusters spun down in 30 min, and notebooks spun down in 90 min or 3 hours.
Just for comparison, AWS SageMaker Studio Lab, the free notebook offering from AWS, times out after 4 hours for a GPU, 12 hours for a CPU.
Hi @rsignell-usgs, I will make sure this issue is prioritized for our next sprint (which starts next week). I can't promise it will be configurable from the qhub-config.yaml, but I will work with the team to come up with a workable solution ASAP. Thanks again for the reminder!!
Okay, thanks @iameskild. The users will definitely appreciate any improvement in the situation, even if not configurable!
@iameskild , I remember you showed me how to (temporarily) override the short culler settings by connecting to some pod and editing a config file, right? After the upgrade from 0.4.3 to 0.4.4, the users are screaming again about the too-short timeout for their servers.
Hey @rsignell-usgs, for now you can manually edit the `etc-jupyter` configmap if you want to make changes to the timeout settings.
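For reference, the same edit can be sketched with plain `kubectl` instead of k9s. The namespace name (`dev`) and the `component=hub` pod label are assumptions about a typical qhub deployment; adjust them to match yours:

```shell
# Open the etc-jupyter configmap in your default editor
# (namespace "dev" is an assumption; check with `kubectl get ns`)
kubectl edit configmap etc-jupyter -n dev

# Restart the hub pod so the new settings take effect
# (the component=hub label is an assumption about the chart's labels)
kubectl delete pod -n dev -l component=hub
```

The deployment will recreate the hub pod automatically after the delete.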
I still have to circle back to this when I have more time, but as a quick update: I was looking into using Terraform's `templatefile` to make these values more easily configurable.
This can also be achieved using overrides in the JupyterHub configuration to change the idle-culling variable values. Right now, the values that can be changed are those here:

```yaml
jupyterhub:
  overrides:
    cull:
      users: true
```

Some values come from the idle-culler extension; as of now, those can only be updated using the method above.
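As a sketch of what a fuller overrides block could look like in `qhub-config.yaml`, here are the `cull` values exposed by the JupyterHub Helm chart. The specific numbers are illustrative, not defaults, and assume the overrides block is passed through to the chart unchanged:

```yaml
jupyterhub:
  overrides:
    cull:
      enabled: true   # turn idle culling on or off
      users: false    # whether to also delete the user, not just stop the server
      timeout: 5400   # seconds of inactivity before a server is culled (90 min)
      every: 600      # how often (in seconds) the culler checks for idle servers
```

Note these chart values control the hub-side culler; the kernel and terminal timeouts shown later in this thread live in the `etc-jupyter` configmap instead.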
To change these, can I use k9s to ssh into the `hub-**` pod and then just edit them?
@rsignell-usgs yep, just edit the file. You may need to kill the hub pod for the changes to take effect.
What is the filename once I've ssh'ed into the hub pod?
Here's the workaround recipe that should modify the cull settings (at least until the next qhub/nebari version is deployed):
- open the `etc-jupyter` configmap and find the key to edit (make the changes below), then "esc"
- kill the `hub-xx` pod so the changes take effect
Just for the record, I set everything to 30 minutes:
```python
# The interval (in seconds) on which to check for terminals exceeding the
# inactive timeout value.
c.TerminalManager.cull_interval = 30 * 60

# cull_idle_timeout: timeout (in seconds) after which an idle kernel is
# considered ready to be culled
c.MappingKernelManager.cull_idle_timeout = 30 * 60

# cull_interval: the interval (in seconds) on which to check for idle
# kernels exceeding the cull timeout value
c.MappingKernelManager.cull_interval = 30 * 60

# cull_connected: whether to consider culling kernels which have one
# or more connections
c.MappingKernelManager.cull_connected = True

# cull_busy: whether to consider culling kernels which are currently
# busy running some code
c.MappingKernelManager.cull_busy = False

# Shut down the server after N seconds with no kernels or terminals
# running and no activity.
c.NotebookApp.shutdown_no_activity_timeout = 30 * 60
```
Feature description
Currently, much of the idle culler configuration is hard-coded. @rsignell-usgs brought this up as an issue he was concerned about: the current timeout is too short in some cases.
Value and/or benefit
The default idle timeout does not work for everyone.
Anything else?
No response