harvard / cloudJHub

An implementation of JupyterHub within the Amazon cloud, with automatic scaling up and down
BSD 3-Clause "New" or "Revised" License

Resolve issues with setup_user() process in spawner #13

Status: Closed (arthurian closed this 6 years ago)

arthurian commented 6 years ago

This PR resolves issues with the setup_user() process in the spawner.

If the user setup process fails to complete successfully, the spawner can never start the remote notebook server, even though it keeps trying and failing. After the initial launch (i.e. when the worker is new), the spawner never checks whether the user was set up successfully. As a result the problem is never fixed, and the user keeps seeing 500 errors because their notebook server cannot start.

The solution proposed here does two things:

  1. Implements an is_user_setup() method that checks whether the user setup process completed successfully: whether the user account was created, whether the home directory was created, and whether the permissions on the home directory are valid.
  2. Refactors the setup_user() method to upload a shell script to the worker instance and then execute it with the required parameters (e.g. the username and the volume to mount). One important property of the shell script is that it is idempotent: it only makes the changes needed to reach the desired end state, so running it multiple times yields the same result as running it once successfully.
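The check-then-repair flow from the two steps above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the code in this PR: the `run` and `upload` callables stand in for however the spawner talks to the worker (e.g. over SSH), and `check_command`, `ensure_user_setup`, `SCRIPT_PATH` and the script's argument order are invented for the example. It assumes a Linux worker with `useradd`, `mountpoint` and GNU `stat`, a volume that already has a filesystem, the volume mounted at the user's home directory, and "valid permissions" meaning owned by the user with mode 700.

```python
import shlex
from typing import Callable

# Hypothetical stand-ins (not from the PR): run a command on the worker and
# return its exit status; upload text to a file path on the worker.
RunRemote = Callable[[str], int]
Upload = Callable[[str, str], None]

# Idempotent setup script. Args: $1 = username, $2 = block device,
# $3 = home directory (also used as the mount point, an assumption).
# Each step is guarded or naturally idempotent, so re-running is harmless.
SETUP_SCRIPT = """#!/bin/bash
set -euo pipefail
user="$1"; device="$2"; home="$3"
id -u "$user" >/dev/null 2>&1 || useradd --no-create-home --home-dir "$home" "$user"
mkdir -p "$home"
mountpoint -q "$home" || mount "$device" "$home"
chown "$user:" "$home"
chmod 700 "$home"
"""

SCRIPT_PATH = "/tmp/setup_user.sh"  # hypothetical upload location


def check_command(username: str, home: str) -> str:
    """Shell command that exits 0 only if the account exists, the home
    directory exists, and it is owned by the user with mode 700."""
    u, h = shlex.quote(username), shlex.quote(home)
    expected = shlex.quote(f"{username} 700")
    return (
        f"id -u {u} >/dev/null 2>&1"
        f" && test -d {h}"
        f" && test \"$(stat -c '%U %a' {h})\" = {expected}"
    )


def is_user_setup(run: RunRemote, username: str, home: str) -> bool:
    return run(check_command(username, home)) == 0


def setup_user(run: RunRemote, upload: Upload,
               username: str, device: str, home: str) -> None:
    # Upload the script, then execute it with quoted parameters.
    upload(SETUP_SCRIPT, SCRIPT_PATH)
    args = " ".join(shlex.quote(a) for a in (username, device, home))
    run(f"sudo bash {SCRIPT_PATH} {args}")


def ensure_user_setup(run: RunRemote, upload: Upload,
                      username: str, device: str, home: str) -> bool:
    """Check before every start (not only on first launch) and repair
    the setup if the check fails; returns the final check result."""
    if is_user_setup(run, username, home):
        return True
    setup_user(run, upload, username, device, home)
    return is_user_setup(run, username, home)
```

In this sketch the spawner would call `ensure_user_setup(...)` before each attempt to start the notebook server. Because the script is idempotent, a worker left half-configured by an earlier failure is simply brought to the desired state on the next start instead of producing repeated 500 errors.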

A possible next iteration of this solution is to have this shell script run as part of the userdata of the new worker instance. This would be a very clean approach, since the setup should only need to run once rather than every time a worker is started. For this to work, the userdata would have to deliberately disable the SSH server before running the setup process and re-enable it when finished. This would prevent the spawner from trying to start the notebook server before the setup process has completed.
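The userdata idea described above could look roughly like this. It is a hypothetical sketch, not part of the PR: `build_userdata` and its parameters are invented for illustration, and it assumes a systemd-based worker whose SSH unit is named `sshd` (on some distributions, such as Ubuntu, the unit is `ssh`). The setup script text is passed in, for example the same idempotent script that setup_user() uploads.

```python
import shlex


def build_userdata(setup_script: str, username: str, device: str, home: str,
                   ssh_unit: str = "sshd",
                   script_path: str = "/root/setup_user.sh") -> str:
    """Render a hypothetical first-boot script that keeps SSH stopped while
    the idempotent user setup runs, then starts SSH again."""
    args = " ".join(shlex.quote(a) for a in (username, device, home))
    unit, path = shlex.quote(ssh_unit), shlex.quote(script_path)
    if not setup_script.endswith("\n"):
        setup_script += "\n"
    return (
        "#!/bin/bash\n"
        "set -euo pipefail\n"
        # From here on the spawner cannot connect, so it cannot start the
        # notebook server before setup has finished.
        f"systemctl stop {unit}\n"
        # Write the setup script with a quoted heredoc (no shell expansion).
        f"cat > {path} <<'SETUP_EOF'\n"
        f"{setup_script}"
        "SETUP_EOF\n"
        # Userdata runs as root, so no sudo is needed.
        f"bash {path} {args}\n"
        # Only reached if setup succeeded, because of `set -e`.
        f"systemctl start {unit}\n"
    )
```

The resulting string would be supplied as the instance's user data when the worker is launched (for example, the UserData parameter of EC2's RunInstances). One design choice in this sketch is that a failed setup leaves SSH stopped, so the spawner never connects to a half-configured worker. Note also that userdata scripts typically run late in boot, so SSH may briefly be reachable before the stop takes effect.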

The current solution has one advantage over the userdata approach: it also fixes existing worker instances that are in varying states with respect to user setup. We are therefore proposing this solution first, with the goal of transitioning to userdata later.

@farassadek @joshuagetega

arthurian commented 6 years ago

Closing this PR after discussing it in person with @farassadek and @joshuagetega. The intention is to submit a new PR later using the userdata approach described above, since that is ultimately going to be the most reliable way to set up user accounts and mount the volume.