jupyterhub / batchspawner

Custom Spawner for Jupyterhub to start servers in batch scheduled systems
BSD 3-Clause "New" or "Revised" License

Message when not enough resources are available #121

Open miguelmarco opened 6 years ago

miguelmarco commented 6 years ago

We have had some problems: when users tried to spawn a session, Slurm didn't have enough resources to allocate to them.

Would it be possible to show a message stating that this is the case? Ideally, the user would receive the message and the job would not be kept waiting in the queue.

Do you think it makes sense to include this?

rkdarst commented 6 years ago

da5a2f9 added some slightly better messages for when a job disappears from the queue with no warning. However, there could be another message for when JupyterHub cancels the job itself because no resources became available in time.

I think what you are suggesting is a message that could be displayed if there were not enough resources available, after which batchspawner could cancel the job itself, or at least warn that the wait will be long and that you will probably time out waiting to spawn. This would be nice; we'd need a hook where the spawner classes can return some message while waiting. We should add this to our to-do list. For proper resolution, #86 should be solved first, which might require dropping Python 3.4 support unless someone more clever than me can fix it.
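As a rough illustration of what such a hook could look like, here is a minimal sketch of an async generator that yields status messages while a job is pending. The event shape (a dict with `progress` and `message` keys) follows the convention JupyterHub uses for spawner progress events; `pending_progress`, `poll_state`, `interval`, and `max_polls` are hypothetical names, not part of batchspawner. Note that async generators need a newer Python than 3.4.

```python
import asyncio


async def pending_progress(poll_state, interval=0.0, max_polls=3):
    """Yield progress-style events while a batch job waits in the queue.

    poll_state is a hypothetical async callable returning (state, reason),
    e.g. ("PENDING", "Resources"). It is an assumption for this sketch.
    """
    for _ in range(max_polls):
        state, reason = await poll_state()
        if state == "RUNNING":
            # The job got its resources; the server still has to start.
            yield {"progress": 50, "message": "Job is running, starting server."}
            return
        yield {
            "progress": 0,
            "message": f"Job is {state.lower()} ({reason}); the wait may be long.",
        }
        await asyncio.sleep(interval)
    # Still not running after max_polls checks: warn about a likely timeout.
    yield {
        "progress": 0,
        "message": "Still waiting for resources; the spawn will probably time out.",
    }
```

A spawner could forward these events to the user while waiting, and cancel the job once the final warning has been emitted.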

miguelmarco commented 6 years ago

Thanks,

in general, I think it could also be useful to show which resources are available on the machine at the moment the user chooses a profile. Or even better: profiles that don't have enough resources to run at that moment would not be displayed at all (although that might not be accurate if some job is started right after the user has logged in but before they have spawned their server).
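A minimal sketch of the profile-filtering idea, assuming the idle CPU count is read from the output of `sinfo -h -o '%C'`, which Slurm prints as `allocated/idle/other/total`. The profile dicts and the function names `idle_cpus` and `runnable_profiles` are hypothetical, and only the first output line is used, so this assumes a single partition (or a query restricted to one). As noted above, the result can be stale by the time the user actually spawns.

```python
def idle_cpus(sinfo_output):
    """Return the idle CPU count from `sinfo -h -o '%C'` style output.

    The expected format is 'allocated/idle/other/total'; only the first
    non-empty line is considered (a simplifying assumption).
    """
    first_line = sinfo_output.strip().splitlines()[0]
    allocated, idle, other, total = (int(x) for x in first_line.split("/"))
    return idle


def runnable_profiles(profiles, sinfo_output):
    """Keep only the profiles whose CPU request fits the idle CPUs.

    Each profile is a hypothetical dict such as {"name": "small", "cpus": 2}.
    """
    free = idle_cpus(sinfo_output)
    return [p for p in profiles if p["cpus"] <= free]
```

For example, with profiles requesting 2 and 8 CPUs and an `sinfo` output of `12/4/0/16`, only the 2-CPU profile would be offered.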

rkdarst commented 6 years ago

Also to do (just came up for me): When there is a reservation (e.g. for maintenance) and a job can't finish before the maintenance starts, it won't run at all, and JH will cancel the job once spawn_timeout is exceeded. This should be captured and a special message should appear instead of the default "spawn failed". It could give some hints about what the problem might be.
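One way to produce such hints, sketched under assumptions: ask Slurm why the job is pending (for example with `squeue -h -j <jobid> -o '%T|%r'`, which prints the job state and the pending reason) and map the reason to a friendlier message. The function `explain_pending` and the `HINTS` table are hypothetical, and the reason strings listed are only a few common ones; the exact wording of reasons can vary between Slurm versions.

```python
# Illustrative mapping from Slurm pending-reason prefixes to user-facing hints.
HINTS = {
    "Resources": "The cluster does not currently have enough free resources.",
    "Priority": "Other jobs with higher priority are ahead in the queue.",
    "ReqNodeNotAvail": (
        "The requested nodes are not available, possibly because of a "
        "reservation (e.g. maintenance)."
    ),
}


def explain_pending(squeue_output):
    """Turn 'STATE|REASON' output from squeue into a message for the user."""
    line = squeue_output.strip()
    if not line:
        # squeue prints nothing once the job has left the queue.
        return "The job is no longer in the queue."
    state, _, reason = line.partition("|")
    if state != "PENDING":
        return f"The job is in state {state}."
    for prefix, hint in HINTS.items():
        if reason.startswith(prefix):
            return hint
    # Unknown reason: pass it through rather than hiding it.
    return f"The job is pending (reason: {reason})."
```

The returned string could replace the generic "spawn failed" text when the spawn times out while the job is still pending.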

jd-daniels commented 5 years ago

+1 on this. Our users see the timeout and think something is wrong with the system, not that resources are unavailable. Based on the linked PR, it looks like this is pending on dropping support for Python 3.4? If there is something I can take a look at, let me know.