python / cpython

The Python programming language
https://www.python.org

venv base path does not resolve symlinks using realpath() #106045

Open nascheme opened 1 year ago

nascheme commented 1 year ago

It seems there is a bug in venv (and a similar one in virtualenv) where the "home" path in pyvenv.cfg is set incorrectly. If Python is installed in a non-standard folder, e.g. /usr/local/python-X.Y.Z, and then symlinked into /usr/local/bin/python3, the venv package does not work correctly. I believe the source of the trouble is the setting of the "home" variable. Specifically, this line:

dirname, exename = os.path.split(os.path.abspath(executable))

The dirname result is used to set "home". If the executable path is /usr/local/bin/python3 (actually a symlink to /usr/local/python-X.Y.Z/bin/python3), "home" should not be set to /usr/local/bin. Changing the above line (in the ensure_directories() function) to the following fixes the problem:

dirname, exename = os.path.split(os.path.realpath(executable))

I believe this is consistent with what the getpath.py module in Python does.
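A minimal sketch of the difference between the two calls, assuming a POSIX system where symlinks can be created. The temporary directory layout and the names `install_bin` and `link_bin` are made up for illustration; they stand in for /usr/local/python-X.Y.Z/bin and /usr/local/bin:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Resolve the temp dir itself first (on some systems it is a symlink),
    # so that the only symlink left in play is the one created below.
    root = os.path.realpath(root)

    install_bin = os.path.join(root, "python-X.Y.Z", "bin")  # real install
    link_bin = os.path.join(root, "bin")                      # symlink lives here
    os.makedirs(install_bin)
    os.makedirs(link_bin)

    real_exe = os.path.join(install_bin, "python3")
    open(real_exe, "w").close()  # empty stand-in for the interpreter binary
    link_exe = os.path.join(link_bin, "python3")
    os.symlink(real_exe, link_exe)

    # abspath() only normalizes the string; realpath() follows the symlink.
    home_abspath = os.path.split(os.path.abspath(link_exe))[0]
    home_realpath = os.path.split(os.path.realpath(link_exe))[0]

    print(os.path.relpath(home_abspath, root))   # bin
    print(os.path.relpath(home_realpath, root))  # python-X.Y.Z/bin
```

With abspath() the computed directory is the one holding the symlink; with realpath() it is the directory of the actual installation.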

I noticed this problem when running the most recent Debian release, version 12. It includes Python 3.11, and therefore /usr/lib/python3.11 exists. With the above bug, the venv tries to use /usr/lib/python3.11 in sys.path. Importing the struct module fails with this mysterious error:

>>> import struct
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.11/struct.py", line 13, in <module>
    from _struct import *
ModuleNotFoundError: No module named '_struct'


mayeut commented 9 months ago

I've run into this as well: Python built from source & installed in /opt/python-3.11, symlink in /usr/local/bin/python3.11.

In my case, on Ubuntu 22.04, the venv could not be created: importing subprocess failed because it was indeed trying to use /usr/lib/python3.11 for lib-dynload in sys.path. Changing home manually or patching the venv module worked.
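To check which "home" an existing venv recorded, something like the following can be used. This is only a sketch: `read_pyvenv_home` is a hypothetical helper, not part of the venv module, and it assumes pyvenv.cfg consists of simple `key = value` lines:

```python
from pathlib import Path


def read_pyvenv_home(venv_dir):
    """Return the value of the 'home' key in <venv_dir>/pyvenv.cfg, or None."""
    cfg = Path(venv_dir) / "pyvenv.cfg"
    for line in cfg.read_text(encoding="utf-8").splitlines():
        # Split on the first '=' only; the value may itself contain '='.
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "home":
            return value.strip()
    return None
```

If the returned directory is the one that holds the symlink rather than the real installation, the venv is affected by the behaviour described in this issue.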

Reproducer in GitHub Actions: https://github.com/mayeut/sandbox/actions/runs/7518304095/job/20465487706#step:3:81

cjolowicz commented 7 months ago

Note that resolving symlinks will break Homebrew, see https://github.com/astral-sh/uv/issues/1640

konstin commented 7 months ago

> I noticed this problem when running the most recent Debian release, version 12. It includes Python 3.11, and therefore /usr/lib/python3.11 exists. With the above bug, the venv tries to use /usr/lib/python3.11 in sys.path. Importing the struct module fails with this mysterious error:

Do you have instructions for reproducing this? We'd like to avoid this problem in uv.

From docker run --rm debian:12

$ apt update
$ apt install -y python3-venv
$ which python3
/usr/bin/python3
$ python3 --version
Python 3.11.2
$ ln -s /usr/bin/python3 /usr/local/bin/python3

Then venv/bin/python -c "import struct; import subprocess" works fine for me.

> Python built from source & installed in /opt/python-3.11, symlink in /usr/local/bin/python3.11.

Same question: do you have instructions for reproducing this (other than the GitHub Actions one)? I tried this with a Python checkout (4d3ee77aef7c3f739b3f8d4dc46dd946c2a80627):

./configure --prefix=/opt/cpython1 --enable-optimizations
make -s -j32
sudo make install

Then I tried:

mkdir a
cd a
ln -s /opt/cpython1/bin/python3 python3
cd ..
mkdir b
cd b
../a/python3 -m venv venv
venv/bin/python -c "import struct; import subprocess"
cd ..
mkdir c
cd c
virtualenv -p ../a/python3 venv
venv/bin/python -c "import struct; import subprocess"

But this all passed.

mayeut commented 7 months ago

@konstin,

On Ubuntu 22.04, run sudo apt-get install -y python3-distutils, then create a symlink to your python3.11 build at /usr/local/bin/python3.11.

I think the fact that the symlink is in the /usr/local/bin folder matters. Just creating a symlink elsewhere does not necessarily reveal the issue.

konstin commented 7 months ago

Do you have more information about where the python3.11 came from? I still can't reproduce it with this Dockerfile:

FROM ubuntu:22.04
RUN apt update
RUN apt-get install -y python3-distutils
ENV DEBIAN_FRONTEND=noninteractive
RUN apt install -yy git build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev curl libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev
WORKDIR /root
RUN git clone https://github.com/python/cpython/
WORKDIR /root/cpython
RUN git checkout v3.11.8
RUN ./configure --prefix=/opt/cpython1
RUN make -s -j32
RUN make install
WORKDIR /root
RUN /opt/cpython1/bin/python3 --version
RUN ln -s /opt/cpython1/bin/python3 /usr/local/bin/python3
RUN /usr/local/bin/python3 --version
RUN /usr/local/bin/python3 -c "import struct; import subprocess"
RUN /usr/local/bin/python3 -m venv venv1
RUN venv1/bin/python -c "import struct; import subprocess"
RUN venv1/bin/python -m venv venv2
RUN venv2/bin/python -c "import struct; import subprocess"

mayeut commented 7 months ago

I could reproduce it using your Dockerfile with a slight modification. The folder /usr/lib/python3.11/lib-dynload must exist for the venv creation to fail. You can either mkdir the folder or add python3-gdbm to the list of packages to install.

konstin commented 7 months ago

To summarize (editing this to 3.13 to be actionable on main): the error only happens when you have a custom-built Python that is symlinked at exactly /usr/local/bin/python3, and there's also a system Python of the same version installed under /usr that created /usr/lib/python3.13/lib-dynload?

I've tried to trace the cause and it seems that installed_platbase gets switched to /usr on venv creation:

$ /opt/cpython1/bin/python3 -m sysconfig | grep -i installed_platbase
        installed_platbase = "/opt/cpython1"
$ venv1/bin/python -m sysconfig | grep -i installed_platbase
        installed_platbase = "/usr"

This in turn changes platinclude to the wrong python:

$ /opt/cpython1/bin/python3 -m sysconfig | head -n 14
Platform: "linux-x86_64"
Python version: "3.13"
Current installation scheme: "posix_prefix"

Paths: 
        data = "/opt/cpython1"
        include = "/opt/cpython1/include/python3.13"
        platinclude = "/opt/cpython1/include/python3.13"
        platlib = "/opt/cpython1/lib/python3.13/site-packages"
        platstdlib = "/opt/cpython1/lib/python3.13"
        purelib = "/opt/cpython1/lib/python3.13/site-packages"
        scripts = "/opt/cpython1/bin"
        stdlib = "/opt/cpython1/lib/python3.13"

$ venv1/bin/python3 -m sysconfig | head -n 14
Platform: "linux-x86_64"
Python version: "3.13"
Current installation scheme: "venv"

Paths: 
        data = "/root/venv1"
        include = "/opt/cpython1/include/python3.13"
        platinclude = "/usr/include/python3.13"
        platlib = "/root/venv1/lib/python3.13/site-packages"
        platstdlib = "/root/venv1/lib/python3.13"
        purelib = "/root/venv1/lib/python3.13/site-packages"
        scripts = "/root/venv1/bin"
        stdlib = "/opt/cpython1/lib/python3.13"

I unfortunately don't understand enough about the python sysconfig machinery to understand how/why installed_platbase is set.
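For anyone who wants to check an environment for the same symptom, here is a rough sketch that flags sysconfig paths lying outside the running interpreter's own prefixes. The helper names `is_under` and `paths_outside_install` are made up for this example and are not part of sysconfig; which path names to check is also just my choice. Run inside venv1 from the output above, I would expect it to report platinclude.

```python
import os
import sys
import sysconfig


def is_under(path, root):
    """True if `path` is `root` itself or lies somewhere below it."""
    path = os.path.normcase(os.path.normpath(path))
    root = os.path.normcase(os.path.normpath(root))
    try:
        return os.path.commonpath([path, root]) == root
    except ValueError:  # e.g. different drives on Windows
        return False


def paths_outside_install(names=("stdlib", "platstdlib", "include", "platinclude")):
    """Return {name: path} for sysconfig paths outside every known prefix."""
    roots = {sys.prefix, sys.exec_prefix, sys.base_prefix, sys.base_exec_prefix}
    suspicious = {}
    for name in names:
        path = sysconfig.get_path(name)
        if path and not any(is_under(path, root) for root in roots):
            suspicious[name] = path
    return suspicious


if __name__ == "__main__":
    print("installed_platbase =", sysconfig.get_config_var("installed_platbase"))
    print("outside prefixes   =", paths_outside_install())
```

An empty result does not prove the environment is healthy; it only means none of the checked paths point outside the prefixes the interpreter reports.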

Note that for the system (ubuntu apt install python3.11 python3.11-venv) python, all paths are (correctly) /usr based and stay this way:

$ /usr/bin/python3.11 -m sysconfig | head -n 14
Platform: "linux-x86_64"
Python version: "3.11"
Current installation scheme: "posix_local"

Paths: 
        data = "/usr/local"
        include = "/usr/include/python3.11"
        platinclude = "/usr/include/python3.11"
        platlib = "/usr/local/lib/python3.11/dist-packages"
        platstdlib = "/usr/lib/python3.11"
        purelib = "/usr/local/lib/python3.11/dist-packages"
        scripts = "/usr/local/bin"
        stdlib = "/usr/lib/python3.11"

$ venv-sys/bin/python -m sysconfig | head -n 14
Platform: "linux-x86_64"
Python version: "3.11"
Current installation scheme: "venv"

Paths: 
        data = "/root/venv-sys"
        include = "/usr/include/python3.11"
        platinclude = "/usr/include/python3.11"
        platlib = "/root/venv-sys/lib/python3.11/site-packages"
        platstdlib = "/root/venv-sys/lib/python3.11"
        purelib = "/root/venv-sys/lib/python3.11/site-packages"
        scripts = "/root/venv-sys/bin"
        stdlib = "/usr/lib/python3.11"

cjolowicz commented 7 months ago

I haven't looked into this in detail, but might the real issue here be that you're effectively running the interpreter in a broken system-wide environment under /usr/local? In other words, don't just symlink the interpreter. (Disclaimer: I'm not an expert in the sysconfig machinery either.)

~The actual command used to launch the interpreter determines what's considered the environment, even if it's a symbolic link into a valid installation. To my knowledge, sysconfig uses the nominal (unresolved) interpreter path to expand the location templates in installation schemes.~ (not sure about that, should double check)

nascheme commented 7 months ago

> I haven't looked into this in detail, but might the real issue here be that you're effectively running the interpreter in a broken system-wide environment under /usr/local?

I don't know what a "broken system-wide environment" means. As I said in the description, I think this is a bug in venv since it doesn't match what getpath.c does.

Rather than having venv have complicated logic to figure out what "home" should be, perhaps it should just run something like: python3 -c "import sysconfig; print(sysconfig.get_paths())".
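As a sketch of that idea (not how venv currently works), the paths can be queried from the target interpreter in a subprocess. `query_sysconfig_paths` is a hypothetical helper name, and JSON is used here only to make the output easy to parse:

```python
import json
import subprocess
import sys


def query_sysconfig_paths(python_exe):
    """Ask `python_exe` itself for its sysconfig paths and return them as a dict."""
    code = "import json, sysconfig; print(json.dumps(sysconfig.get_paths()))"
    result = subprocess.run(
        [python_exe, "-c", code],
        check=True,           # raise if the interpreter exits with an error
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Using the current interpreter here; any interpreter path could be passed.
    for name, path in sorted(query_sysconfig_paths(sys.executable).items()):
        print(f"{name} = {path}")
```

The point of this approach is that the interpreter reports what it actually computed at startup, so venv would not need to replicate that logic.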

I've worked around the issue by creating wrapper shell scripts for executables, rather than using symlinks, e.g. /usr/local/bin/python3 contains:

#!/bin/sh
exec /usr/local/python-3.12.2/bin/python3.12 "$@"

mayeut commented 3 weeks ago

To avoid breaking setups that rely on symlinks to a Python install tree, an alternative to resolving the full realpath is to resolve the path only when the executable itself is a symlink. xref https://github.com/pypa/virtualenv/issues/2770#issuecomment-2376338499
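One possible reading of that suggestion, as a sketch only: it is not the code from the linked virtualenv discussion, and `resolve_if_executable_is_symlink` is a made-up name. The path is resolved when its last component is a symlink and left untouched otherwise, even if a parent directory is a symlink:

```python
import os


def resolve_if_executable_is_symlink(executable):
    """Resolve `executable` only when the file itself is a symlink.

    A path that merely passes through a symlinked directory is returned
    unchanged (apart from being made absolute).
    """
    path = os.path.abspath(executable)
    if os.path.islink(path):  # islink() looks at the final component only
        return os.path.realpath(path)
    return path
```

With this, /usr/local/bin/python3 pointing at a real install would resolve to the install's bin directory, while an interpreter reached through a symlinked install tree would keep its nominal path.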