jupyter / jupyter_core

Core Jupyter functionality
https://jupyter-core.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
195 stars 181 forks source link

Feature request: do not prefer env path if it is not writeable #318

Closed athornton closed 1 year ago

athornton commented 1 year ago

Jupyter_core 5.0.0 made the env path preferred (unless you set JUPYTER_PREFER_ENV_PATH=no). This is probably the right general case, but it didn't work for us, and here's why.

We are indeed running that lab from inside an environment (specifically rubin-env-rsp from conda-forge)...but the way we have the Rubin Science Platform constructed, that path is not writeable. It's read-only within the container.

I suspect you didn't consider that case; it is after all pretty unusual. I feel like a better default would be to see whether the env path is writeable, and if it is NOT, do not prefer it, but if it is, then prefer it. This would give the advantages of the 5.0 strategy, but also accommodate our use-case and that of sites doing the same thing we are of providing a user-immutable reference platform.

I don't know if I will have the cycles myself to produce a PR that will implement this, but if I do, I will.

jasongrout commented 1 year ago

Thanks for bringing up this usecase, and thanks for pointing out that you can override the default guess at the right behavior with an environment variable.

I think it is really important that the defaults be simple to understand, and complications in the defaults should be beneficial to most users to outweigh the complexity they introduce. I think, as you point out, the current default is probably right in the general case, and we provide an escape hatch for exceptions.

At this point, we don't change default behavior based on writeability of directories in the hierarchy. If we start changing behavior based on ability to write, that introduces some complications and corner cases:

(a) perhaps we should check if the user-level path is also writeable - if it isn't, what should we do? (b) behavior changes depending on what permissions you are running with, e.g., an administrator would have a different path hierarchy than a user, which could be confusing when debugging a user issue

I would love to hear from others that may have the same issue to see how general of a problem this is.

minrk commented 1 year ago

I definitely appreciate the complexity concern. FWIW, I think having JUPYTER_PREVER_ENV_PATH=0 be unconditionally the default is clearly the simplest behavior (simpler than what we have now), and the correct one about as often as not. It's very hard to pick a default for virtualenvs because people use them so differently! Any choice is guaranteed to be wrong for lots of folks. I think virtualenvs are necessarily advanced use cases, so treating them specially being opt-in makes sense to me, too.

I think what needs to be established is expectations. For instance, jupyterlab assumes that jupyter_config_path()[0] is writable, and doesn't work if it isn't. The current dynamic default makes it more likely to be not writable by default. I don't believe that should require user intervention in relatively simple cases like a system-wide conda installation, or jupyterhub. So if jupyter-core explicitly prefers likely-not-writable paths when using envs and won't check, JupyterLab (and any other tools that use shared config locations to persist runtime-specified things like workspaces) needs to implement the writability check themselves. Implementing it further downstream makes it more likely to do just what the package wants, but also makes it more likely that different jupyter components will implement it slightly differently, leading to inconsistency. One possible solution is that jupyter-core provides a writable_config_dir().

FWIW, I think regardless of the env config preference in general, jupyterlab shouldn't be persisting workspaces in envs by default: https://github.com/jupyterlab/jupyterlab/pull/13589 .

Tangentially, I think the detection of conda envs is also currently incorrect because CONDA_PREFIX is defined for even the root env, and the root env should probably not be preferred to standard dirs as a prefix because it's not 'an env' in the isolated-user-created-environment sense. That's fixed by #324.

jasongrout commented 1 year ago

The more I think about it, the more I like your ownership check you proposed in #323. Nominally, the user-level environments are owned by the user and are writable. If the sys.prefix is also owned by the user, that's a strong indication that the sys.prefix should take precedence over the user-level config, especially if we know we are in a virtual environment of some kind.

Perhaps the way to think about it is that JUPYTER_PREFER_ENV_PATH=0 should be the default, unless we have very strong and specific indication we are in a user-created virtual environment, i.e., these conditions would all need to be satisfied:

  1. we know we are in a virtual environment (virtualenv or conda), and
  2. sys.prefix is owned by the user, and
  3. the config directory is writable
minrk commented 1 year ago

I'm 50-50, now. Defaults are hard! I think the ownership check is an interesting one, and probably better than the writability check. Writability falls short in a lot of cases, e.g. if you create a shared env and you want users to be able to install additional packages without sudo - that would result in the wrong conclusion - using a shared env for config when user env is more likely correct.

Ownership check should work (assuming ownership checks work at all! I'm a bit concerned about LDAP and containers and such might get the wrong answer in #323) in any case I can think of except if the user who created and owns the env is one of its users, instead of making the shared env owned by root or another non-user. That will result in one user (successfully) storing their info in the shared env while other users I think that's a bit of an edge case, but it's not impossible (I think I've done it), but I think it's at least a lot narrower than the cases of incorrect conclusions we have right now.

I think there are also 2 things to consider:

For read config priority, I think what we clearly want is to prefer "more specific" configuration, where there's a clear order:

  1. user env
  2. user config
  3. shared env
  4. system

(identifying which is which is the hard part).

For a read-write dir, I think it's much harder to choose, and there are more equally valid choices. Some folks want everything isolated in envs (I believe these are the advanced users) while others want their settings to persist regardless of envs (I think this is simpler and happens to include me). I think only technical point here is that we don't want pick a write location that's behind the read location, which could result in persisting config that we don't load. For cases like jupyterlab's workspaces, a single dir is chosen and the path isn't used for loading, so picking any single writable location is fine on that point. For cases where you are writing config that will later be loaded by path, you want to make sure to write to the front of the path. In that case, picking the front of the path and erroring without write permissions is probably preferable to falling back on a different writable path.

In short, for picking a read-write dir I think jupyter_config_dir() should generally be used for single-dir read/write, and jupyter_config_path()[0] should be used if (and only if) all locations on jupyter_config_path() will be read.

jasongrout commented 1 year ago

What I'm getting out of this conversation is:

  1. Before, we defaulted to JUPYTER_PREFER_ENV_PATH=false
  2. In the latest jupyter_core release, we sometimes default to JUPYTER_PREFER_ENV_PATH=true, but likely we default to true in too many cases.

Since our default was false, and that's what people were used to (i.e., changing the default to true is disruptive), I'd rather revert our overly-broad changes to narrow down the true default to the very specific usecases we had in mind. The list of conditions I proposed above (and your new one about root conda envs) I think assimilate our collective ideas about what should definitely be true in the specific situation we are accommodating). We can always expand back out if we are missing situations we want to accommodate.

minrk commented 1 year ago

That sounds good to me!

I think the one thing in your list I'm not sure about is check number 3: you own the env prefix and config dir is writable. I'm trying to think of cases where you own the prefix and config dir is not writable, and most come from accidents (e.g. an accidental sudo), and perhaps an error attempting to write there is more likely to be the expected behavior (permission error -> oops, I sudo'd and messed up permissions) vs not acting like you're in an env (I made this env, why isn't it using it? Oh, deep in the env a permission is wrong. How should I have figured that out?).

Maybe a read-only filesystem owned by the user, e.g. in a container with a mounted volume containing the env? I'm not sure how realistic that is, or whether it's even the wrong thing to do to still prefer the env in that case for jupyter_config_path(), given the above discussion about the possible distinction between read-write config vs read path priority.

WDYT about stopping at the ownership check for now, and waiting until we see problems with the write permissions?