jupyterhub / batchspawner

Custom Spawner for Jupyterhub to start servers in batch scheduled systems
BSD 3-Clause "New" or "Revised" License

Interaction between batchspawner.api and profilespawner #127

Closed dstndstn closed 5 years ago

dstndstn commented 6 years ago

Hi,

I'm not sure if this should be a batchspawner issue or a profilespawner issue. I'll file an issue on profilespawner also.

I found that the batchspawner API handler was being called by the Slurm-launched worker, but batchspawner's poll() was not noticing it.

Digging into the API handler, I saw that the user.spawner object is a:

Spawner: <wrapspawner.wrapspawner.ProfilesSpawner object at 0x1555498bf978>

and therefore setting its .current_port attribute does not put the port number where batchspawner's poll() looks for it.

I added the following at https://github.com/jupyterhub/batchspawner/blob/master/batchspawner/api.py#L12 and it works (but is ugly):

    spawner = user.spawner
    try:
        from wrapspawner import WrapSpawner
        # If the user's spawner is a wrapper (e.g. ProfilesSpawner),
        # unwrap it so the port lands on the actual BatchSpawner.
        if isinstance(spawner, WrapSpawner):
            spawner = spawner.child_spawner
    except ImportError:
        # wrapspawner is not installed, so there is nothing to unwrap.
        pass
    spawner.current_port = port

Now, it seems like a cleaner solution might be for WrapSpawner to propagate some setattr requests to its child_spawner?
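
Something like this sketch, perhaps (just an illustration, not existing wrapspawner behaviour; the whitelist idea and the hard-coded current_port entry are my own assumptions):

    from wrapspawner import WrapSpawner

    class ForwardingWrapSpawner(WrapSpawner):
        # Hypothetical whitelist of attributes to mirror onto the child spawner.
        _FORWARDED_ATTRS = {'current_port'}

        def __setattr__(self, name, value):
            # Set the attribute on the wrapper as usual.
            super().__setattr__(name, value)
            # Forward selected attributes to the wrapped spawner, if one exists.
            if name in self._FORWARDED_ATTRS and getattr(self, 'child_spawner', None) is not None:
                setattr(self.child_spawner, name, value)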

Open to suggestions for a clean fix. Thanks!

dstndstn commented 6 years ago

here is the wrapspawner issue I filed: https://github.com/jupyterhub/wrapspawner/issues/24

PS, I should have added: I have a (Slurm) BatchSpawner wrapped in a ProfilesSpawner, for a typical academic compute cluster use case: users can either start their notebook processes on the head node, or on a compute node via Slurm.
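
For reference, my setup is configured roughly along these lines in jupyterhub_config.py (the profile names and Slurm resource requests here are placeholders, not my exact config):

    # Illustrative sketch of a ProfilesSpawner-wraps-SlurmSpawner setup.
    c.JupyterHub.spawner_class = 'wrapspawner.ProfilesSpawner'
    c.ProfilesSpawner.profiles = [
        # (display name, key, Spawner class, dict of trait overrides)
        ('Head node', 'local', 'jupyterhub.spawner.LocalProcessSpawner', {}),
        ('Slurm compute node', 'slurm', 'batchspawner.SlurmSpawner',
         dict(req_partition='compute', req_runtime='8:00:00')),
    ]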

mbmilligan commented 6 years ago

Thanks @dstndstn for the feedback. In case you weren't aware, the functionality you are testing was only merged into master a couple of weeks ago, so this testing is very valuable to us! That said, I don't have an immediate answer for why you are seeing this interaction, or for the issue you reported in #126. I will need to look at whether this is something WrapSpawner should be propagating, or whether another mechanism is failing. Thanks in advance for your patience, and do note that if you don't need the latest features, the current release on PyPI is well tested.

rkdarst commented 5 years ago

I think that this can be closed now, because a fix has been added. There may be more remaining issues, but they can be dealt with later.

Jon-Lillis commented 3 years ago

Hi all, I'm currently in the process of installing JupyterHub with the goal of better utilising Jupyter on a Slurm cluster, but I'm encountering the same problem documented both here and in https://github.com/jupyterhub/wrapspawner/issues/24. For context, I am also using batchspawner with ProfilesSpawner.

The port sent by the single-user server when it spawns is received correctly by the batchspawner API endpoint, but the value is set on user.spawner (an instance of ProfilesSpawner) rather than on spawner.child_spawner (the BatchSpawner instance). This leaves batchspawner waiting for a port that will never arrive, and the launch times out.

By applying @dstndstn's patch to the API endpoint, so that it sets the port attribute on spawner.child_spawner rather than on the spawner itself, the single-user session spawns correctly and the user is redirected to the lab interface as expected.
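
For anyone hitting the same thing, the unwrapping logic I used amounts to a helper like this (a sketch of the workaround; unwrap_spawner is my own name, not part of the batchspawner API):

    def unwrap_spawner(spawner):
        # Return the innermost spawner, unwrapping any WrapSpawner layers.
        try:
            from wrapspawner import WrapSpawner
        except ImportError:
            # wrapspawner is not installed; nothing to unwrap.
            return spawner
        while isinstance(spawner, WrapSpawner) and spawner.child_spawner is not None:
            spawner = spawner.child_spawner
        return spawner

    # In the API handler, set the port on the unwrapped spawner:
    # unwrap_spawner(user.spawner).current_port = port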

I've attached two config files: a basic configuration without the patch, which reproduces the problem, and the same configuration with the patched API endpoint, which solves it. I've also attached the logs generated when JupyterHub is run with the patch (JupyterHub log, Single-user log) and without it (JupyterHub log, Single-user log), plus an export of my Conda environment for reference.

Can anyone suggest how this can be solved without a temporary patch to the API handler? Could this simply be a misconfiguration on my part, or would it require more in-depth modifications to batchspawner itself?

Also, with the temporary patch in place and the server spawning correctly, the error ‘RuntimeError: No child spawner yet exists - can not get progress yet’ is still present in the JupyterHub output immediately after submitting a spawn job. Is it safe to assume that this is harmless, or should this be addressed too?

Thanks in advance.