jupyterhub / the-littlest-jupyterhub

Simple JupyterHub distribution for 1-100 users on a single server
https://tljh.jupyter.org
BSD 3-Clause "New" or "Revised" License

Proposal to minimize what TLJH installs in the user environment #872

Closed consideRatio closed 1 year ago

consideRatio commented 1 year ago

We are installing the following in the user environment, but I think we should minimize this for various reasons.

https://github.com/jupyterhub/the-littlest-jupyterhub/blob/7fd97ad5e02367cfe73381c19426336308099c93/tljh/requirements-base.txt#L1-L19

I think we should install what jupyterhub needs to authenticate users, namely the jupyterhub package, but I consider everything else open for discussion.

  1. Should we install any Jupyter UI?
  2. Should we install any additional things like nbgitpuller, jupyter-resource-usage, and ipywidgets?

I currently lean towards 1) no and 2) no, while still helping users get set up with a basic environment, either via documentation only or perhaps also via a yes/no prompt for installing, for example, jupyterlab.

manics commented 1 year ago

I think we should always install a frontend, i.e. the latest JupyterLab release. I don't have an opinion on the other packages.

@yuvipanda what are your thoughts?

consideRatio commented 1 year ago

> I think we should always install a frontend, i.e. the latest JupyterLab release

Install, and upgrade, or just install once?

I think we should make it easy to get set up with one UI, like jupyterlab, but that we should be hands off for upgrades of tljh going onwards. Think of it as initial assistance in setting up a user environment. If someone wants to skip jupyterlab and go to rstudio directly, I think that should be fine without being forced to install jupyterlab, for example by allowing them to say "no" to a prompt during initial setup.

Not disagreeing with you about facilitating setup of jupyterlab @manics, but there is nuance. Do you think it should be forced and/or something we also upgrade over time in the user environment, or rather something users opt-in to by default on initial setups?

yuvipanda commented 1 year ago

Thanks for opening this and thinking hard about the maintenance concerns, @consideRatio.

For useful context, https://words.yuvi.in/post/the-littlest-jupyterhub/ is the originating post for TLJH, and I believe fundamentally it sits in the place of 'I am now forced to be a sysadmin, but do not want to'. So TLJH needs to be usable at a basic level, as much as possible, immediately after it is installed. The current setup was achieved via in-person user testing, where I got someone who constantly finds themselves in this situation, and adjusted until it felt right.

Given that need, I think the current set of packages (minus nteract) is what I would consider the minimum for someone to recognize it as JupyterHub. So I'd like to leave them as is. RStudio is currently not possible to set up on TLJH and I'm OK with that.

If the problem is that of bumping versions, then we can explore doing Dependabot-type stuff. But I'd really like it to not require further setup to get a basic notebook working after installation.

I can provide individual justifications for the other packages required if you would like as well.

CagtayFabry commented 1 year ago

Personally I would appreciate any option to keep the base environment as clean as possible :+1:
(I tend to completely hide the base kernel from users as well and put everything into a dedicated working kernel)

I can see the appeal of a "ready-to-go" configuration directly after installing, but having a clean environment makes the sysadmin side so much easier.

Maybe providing a few more options to the installer (if wanted) could be a way?

yuvipanda commented 1 year ago

I think the question is really one of defaults - folks should be able to uninstall whatever they wish. We don't put any scientific packages in there - just UI stuff that is all (IMO) minimum necessary for people who don't understand the ins and outs of how this works.

manics commented 1 year ago

> Install, and upgrade, or just install once?

Originally I was thinking of installing the user environment once, and only upgrading packages if they're no longer compatible. That's still tricky to get right, though; in theory the conda metadata should handle everything (install jupyterhub-singleuser==JUPYTERHUB_VERSION, and dependencies are upgraded if necessary).

However, based on this suggestion:

> maybe just a few more options (if wanted) to provide to the installer could be a way?

Here's an alternative: add a --no-user-environment flag or similar:
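As a rough illustration of how such an opt-out flag could behave, here is a minimal sketch assuming an argparse-based command line. The parser name, the `plan_steps` helper, and the step labels are hypothetical and are not taken from TLJH's installer; only the flag name comes from the suggestion above.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a hypothetical opt-out flag for the user environment."""
    parser = argparse.ArgumentParser(prog="tljh-installer-sketch")
    # Hypothetical flag: the name follows the suggestion in this thread,
    # it is not an existing TLJH installer option.
    parser.add_argument(
        "--no-user-environment",
        dest="user_environment",
        action="store_false",  # defaults to True, so the user env is set up unless opted out
        help="skip setting up the user environment",
    )
    return parser


def plan_steps(argv: list[str]) -> list[str]:
    """Return the (illustrative) install steps implied by the given arguments."""
    args = build_parser().parse_args(argv)
    steps = ["hub environment"]
    if args.user_environment:
        steps.append("user environment")
    return steps
```

With no arguments the sketch plans both environments; passing `--no-user-environment` leaves only the hub environment, so the default stays "ready to go" while admins who want a clean base can opt out.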

consideRatio commented 1 year ago

> [...] and only upgrade packages if they're no longer compatible. That's still tricky to get right though, [...]

The more I've thought about this, the more opinionated I've grown towards not upgrading things like ipywidgets or trying to figure out if things are compatible or not, because we just can't say much. We could say that the packages we have managed are internally compatible, but ipywidgets, for example, is a dependency for all kinds of things that the user may have installed, so there is simply no way we can upgrade it safely.

If we touched the user env more than needed, and that broke something for a user, I would feel responsible for it as a maintainer if an issue report came in. I really want to avoid that! I believe it's essential for the maintenance of this repo, and personally for my motivation to maintain it, that we draw a clear line so that we don't become responsible for issues in the user env.

With this said, let's separate two questions for now:

  1. what do we do in the user env during initial setup?
  2. what do we do in the user env during upgrades of tljh?

I think the second question must be answered to unblock a release.

My opinion about the second is to do as little as possible in the user env during an upgrade, as we just can't know what's breaking and what's not. In practice, I figure it would mean ensuring a jupyterhub version compatible with the hub environment (note there are three: system env, hub env, and user env).
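The "touch only what is needed" idea could be sketched roughly as below. This is an assumption-laden illustration, not TLJH's upgrade logic: the function names are hypothetical, and treating "same major version" as "compatible" is a simplification chosen for the example.

```python
from __future__ import annotations


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '4.0.2'."""
    return int(version.split(".")[0])


def user_env_upgrade_spec(hub_version: str, user_version: str | None) -> str | None:
    """Return a requirement to install in the user env, or None to leave it alone.

    Assumption: a user env whose jupyterhub shares the hub env's major
    version counts as compatible and is not touched at all.
    """
    if user_version is not None and major_version(user_version) == major_version(hub_version):
        return None  # compatible: do nothing in the user env
    # Missing or incompatible: pin only jupyterhub, nothing else.
    return f"jupyterhub=={hub_version}"
```

The point of returning `None` in the compatible case is that an upgrade then leaves packages like ipywidgets entirely under the admin's control.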

@manics and @yuvipanda what do you think about my suggestion of not touching anything more than needed in the user env during upgrades of a TLJH installation?

consideRatio commented 1 year ago

Current situation

We run pip install --upgrade -r requirements-base.txt during initial TLJH installs, but also during TLJH upgrades.

https://github.com/jupyterhub/the-littlest-jupyterhub/blob/cca199ea6878f8e47104b82b2c854b3b46cb1990/tljh/installer.py#L268-L272

https://github.com/jupyterhub/the-littlest-jupyterhub/blob/cca199ea6878f8e47104b82b2c854b3b46cb1990/tljh/requirements-base.txt#L1-L18

minrk commented 1 year ago

I like @consideRatio's proposal about reducing what we do on upgrade, but not changing what we install to start or by default.

I think this relates to https://github.com/jupyterhub/the-littlest-jupyterhub/pull/858

I think tljh is mostly designed as a starting point, and the tljh installer is not generally expected to be run again after that. I know it works now, but I think we should probably think more carefully about what the installer does as an 'upgrade' step vs. initial install.

We have at least these bare minimum requirements:

hub env:

user env (bare minimum):

But also a minimum set to be 'basically functional' to start, but for which there are alternatives and we have to pick one (or more):

and we have these packages which have been selected for tljh specifically because they reduce maintenance burden of a tljh instance for the target users:

Note that tljh is not THE way to install jupyterhub on a single server. If you're a sysadmin, setting up jupyterhub isn't a huge task, and you can make all the choices you want. You can even use tljh purely as a convenient starting point, so any choices it makes to start are safely ignored/reverted after the fact. It's specifically for the use cases @yuvipanda described in the blog linked above.

With all that in mind, I think it makes sense to follow @consideRatio's proposal for tljh "upgrade" to only affect actual compatibility problems, and never "reset" the user env. I see two (orthogonal) possible proposals to address this:

I think the latter proposal doesn't solve much if upgrade only checks for compatibility instead of resetting the env, so I'd suggest we wait on that until after we see a need once upgrade is less disruptive.