jupyter / jupyter_core

Core Jupyter functionality
https://jupyter-core.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Make JUPYTER_PREFER_ENV_PATH=1 default #283

Closed gutow closed 2 years ago

gutow commented 2 years ago

I propose that JUPYTER_PREFER_ENV_PATH=1 be made the default behavior.

I cannot imagine a case where someone would set up a virtual environment and not expect Python-based software such as JupyterLab to use what is installed in the virtual environment first. Thus, I believe JupyterLab should never prefer versions of extensions, etc. from outside the virtual environment over those inside it. I can imagine people wanting to set up some utilities that work in all environments, so I can see there might be issues surrounding that. I suggest the following logic:

  1. If a package is installed in the virtual environment use that in preference to any other available versions.
  2. If a package does not exist in the virtual environment but is available at the user level use it in preference to versions available at the system level.
  3. Only use versions available at the system level if they have not been overridden at the previous two levels.
  4. There may need to be a switch to not use packages from the user or system level.
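
The four rules above can be sketched as a first-match lookup over an ordered list of levels. This is only an illustration of the proposed precedence, not how jupyter_core or JupyterLab resolve extensions; the level names, the `resolve` function, and the version strings are all hypothetical.

```python
from typing import Dict, Optional, Sequence

# Hypothetical level names, most specific first (rules 1-3).
PRECEDENCE = ("env", "user", "system")


def resolve(
    package: str,
    installed: Dict[str, Dict[str, str]],
    levels: Sequence[str] = PRECEDENCE,
) -> Optional[str]:
    """Return the version of `package` from the first level that provides it."""
    for level in levels:
        version = installed.get(level, {}).get(package)
        if version is not None:
            return version
    return None


# Made-up inventory, for illustration only.
installed = {
    "env": {"ipywidgets": "env-copy"},
    "user": {"ipywidgets": "user-copy", "nbdime": "user-copy"},
    "system": {"nbdime": "system-copy", "jupytext": "system-copy"},
}

print(resolve("ipywidgets", installed))  # the virtual environment wins
print(resolve("nbdime", installed))      # falls back to the user level
# Rule 4: a "switch" is modelled here by restricting the levels searched.
print(resolve("jupytext", installed, levels=("env",)))
```

Passing a shorter `levels` tuple is one way to model the switch in rule 4; a real implementation would more likely expose that as a setting or environment variable.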

I encountered an unexpected issue with ipywidgets because of this (see https://github.com/jupyter-widgets/ipywidgets/issues/3559).

The original implementation was discussed in https://github.com/jupyter/jupyter_core/pull/199

Thanks for a great tool.

Jonathan

jasongrout commented 2 years ago

The main issues I saw with making JUPYTER_PREFER_ENV_PATH set by default are:

  1. it is a major breaking change for backwards compatibility
  2. I didn't see a way to reliably tell if sys.prefix was from a virtual environment or a system installation - that's why we opted for an explicit user setting.

Does anyone know of a way to reliably tell if sys.prefix comes from a virtual environment or not?

On the other points:

Only use versions available at the system level if they have not been overridden at the previous two levels.

That already happens in JupyterLab - extensions in earlier directories in the path override extensions in later directories.

There may need to be a switch to not use packages from the user or system level.

I think you can set JUPYTER_CONFIG_DIR and JUPYTER_DATA_DIR to override the user-level directories - possibly setting them to empty effectively turns off the user level. I don't know of a way to turn off the system level (and not sure there is a real-world use case in practice).

gutow commented 2 years ago

2. I didn't see a way to reliably tell if sys.prefix was from a virtual environment or a system installation - that's why we opted for an explicit user setting.

I'm confused by this. Does this mean you are overriding sys.prefix? If you are operating in a virtual environment, sys.prefix should point to the virtual environment. Is there a way that it might not (other than the user specifically setting it otherwise)? Thus, I believe sys.prefix should take precedence.

From the Python documentation:

sys.prefix

A string giving the site-specific directory prefix where the platform independent Python files are installed; on Unix, the default is '/usr/local'. This can be set at build time with the --prefix argument to the configure script. See Installation paths for derived paths.

Note

If a virtual environment is in effect, this value will be changed in site.py to point to the virtual environment. The value for the Python installation will still be available, via base_prefix.

jasongrout commented 2 years ago

Does this mean you are overriding sys.prefix?

No, what I mean is that sometimes sys.prefix is intended to be more specific than the user-level directory (for example, if a single person is using multiple virtual environments), and other times sys.prefix is intended to be less specific than the user-level directory (for example, if you are not using a virtual environment, sys.prefix points to a system location like /usr/local shared by many users). I couldn't find a reliable way to tell if the user wants sys.prefix to be more or less specific than user-level directories.

One heuristic is to check if sys.prefix and the user-level directory share a common prefix, i.e., see if the sys.prefix points to a directory inside the user's home directory. That heuristic assumes the path location indicates the precedence. I don't know if that heuristic would be reliable enough to make default. For example, a user might change the user-level path to something outside their home directory, or might be using virtual environments based out of a directory outside the home directory.
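
A minimal sketch of that home-directory heuristic, assuming it is enough to compare whole path components. The function name `prefix_is_under_home` is hypothetical, and the sketch inherits the weaknesses noted above: it says nothing useful when environments or user-level paths live outside the home directory.

```python
import os
import sys
from pathlib import Path


def prefix_is_under_home(prefix: str, home: str) -> bool:
    """True if `prefix` is `home` itself or a directory inside it."""
    prefix_path = Path(prefix).resolve()
    home_path = Path(home).resolve()
    # Compare path components rather than raw strings, so that
    # /home/alice2 is not mistaken for something inside /home/alice.
    return prefix_path == home_path or home_path in prefix_path.parents


# Check the running interpreter against the current user's home directory.
print(prefix_is_under_home(sys.prefix, os.path.expanduser("~")))
```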

jasongrout commented 2 years ago

From the docs you quoted, another heuristic might be to examine sys.base_prefix and sys.prefix. If they are different, assume we are running in a virtual environment and make sys.prefix more specific than user-level directories. This has the following issues:

  1. This assumes that running in a virtual environment means the user intends the virtual environment to be more specific than the user-level config. This is probably a fair assumption and is mostly true, but it is probably not always true. So I think there needs to be an opt-out (which could be setting JUPYTER_PREFER_ENV_PATH to 0).
  2. I think it doesn't handle the case of other virtual environment solutions like conda/mamba, where I think the sys.prefix and the sys.base_prefix will be the same since the whole python install is inside the virtual environment. I think this doesn't prevent us handling python venv better by default, but it would be nice if we could find a solution that handles both cases.
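
A rough sketch of this heuristic with the opt-out from point 1. The function name `prefer_env_path` is hypothetical, and the set of strings treated as "off" is an assumption rather than a statement of how jupyter_core parses the variable. As point 2 says, it does not detect conda/mamba environments.

```python
import os
import sys


def prefer_env_path(prefix=None, base_prefix=None, environ=None) -> bool:
    """Decide whether sys.prefix should outrank the user-level directories."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    environ = os.environ if environ is None else environ

    explicit = environ.get("JUPYTER_PREFER_ENV_PATH")
    if explicit is not None:
        # An explicit setting always wins, which gives users an opt-out.
        # Assumption: these spellings mean "off"; anything else means "on".
        return explicit.strip().lower() not in ("", "0", "no", "n", "false", "off")

    # Default: a venv changes sys.prefix but leaves sys.base_prefix alone.
    return prefix != base_prefix


print(prefer_env_path())
```

The arguments exist only so the decision can be exercised without changing the running interpreter or the real environment.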

gutow commented 2 years ago

I guess I am not understanding what the problem is. If someone is not using a virtual environment, sys.prefix should point to the proper directory to look for things in. If they are using a virtual environment it should point to the proper directory. Is the issue how to climb the tree above that looking for things?

  2. I think it doesn't handle the case of other virtual environment solutions like conda/mamba, where I think the sys.prefix and the sys.base_prefix will be the same since the whole python install is inside the virtual environment. I think this doesn't prevent us handling python venv better by default, but it would be nice if we could find a solution that handles both cases.

Even in this case, I do not understand the problem. This probably means I do not understand what the code is doing with the directory tree.

gutow commented 2 years ago

I also note this issue about platform-dependent directories: https://github.com/jupyter/jupyter_core/issues/234. Is part of the problem trying to account for platform-dependent differences in jupyter_core?

blink1073 commented 2 years ago

@gutow, as I understand it, the issue is that on a shared system, the order of specificity is: system, user, virtual/conda env. sys.prefix can be either system or virtual env so we don't know where to prioritize the user setting.

One thing we could do to detect if we are in a virtual/conda env is look for sys.prefix != sys.base_prefix or "CONDA_PREFIX" in os.environ. That would cover the vast majority of cases, and seems reasonable for a default setting that can be overridden. Either way I think we'd have to bump a major version of jupyter_core to make the change of default.

jasongrout commented 2 years ago

sys.prefix != sys.base_prefix or "CONDA_PREFIX" in os.environ

That sounds like a reasonable default for the virtual env solutions we know about. I assume mamba sets the CONDA_PREFIX env variable?

blink1073 commented 2 years ago

Yes, I only use mamba now, and I verified. :smile:

gutow commented 2 years ago

One thing we could do to detect if we are in a virtual/conda env is look for sys.prefix != sys.base_prefix or "CONDA_PREFIX" in os.environ. That would cover the vast majority of cases, and seems reasonable for a default setting that can be overridden. Either way I think we'd have to bump a major version of jupyter_core to make the change of default.

If this works, I think that would provide the behavior most would expect of their virtual environments.

blink1073 commented 2 years ago

And inside a virtual env:

>>> import sys
>>> sys.prefix
'/private/tmp/foo'
>>> sys.base_prefix
'/Users/steve.silvester/miniconda'

jasongrout commented 2 years ago

And inside a virtual env:

That looks like a venv inside a conda virtual env :)

jasongrout commented 2 years ago

"CONDA_PREFIX" in os.environ.

We probably also want to check that sys.prefix starts with CONDA_PREFIX, since we might be in a conda env without a python interpreter (like an R conda env, etc.).
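
A sketch of the combined check with that extra condition. The function name `in_python_env` is hypothetical, and this is not necessarily what the linked pull request does; "starts with" is interpreted here as "sys.prefix is CONDA_PREFIX or lies inside it", compared by path component.

```python
import os
import sys


def in_python_env(prefix=None, base_prefix=None, environ=None) -> bool:
    """Guess whether the interpreter runs inside a venv or a conda env."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    environ = os.environ if environ is None else environ

    # venv / virtualenv: the two prefixes differ.
    if prefix != base_prefix:
        return True

    # conda / mamba: CONDA_PREFIX is set, but it only counts if this
    # interpreter actually lives inside that environment (an R-only conda
    # env can be active while a system Python is running).
    conda_prefix = environ.get("CONDA_PREFIX")
    if not conda_prefix:
        return False
    prefix_n = os.path.normcase(os.path.abspath(prefix))
    conda_n = os.path.normcase(os.path.abspath(conda_prefix))
    try:
        return os.path.commonpath([prefix_n, conda_n]) == conda_n
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False


print(in_python_env())
```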

blink1073 commented 2 years ago

Oh, interesting, yeah, that makes sense.

jasongrout commented 2 years ago

I took a preliminary stab at this in https://github.com/jupyter/jupyter_core/pull/286 - anyone feel free to take over it.