pex-tool / pex

A tool for generating .pex (Python EXecutable) files, lock files and venvs.
https://docs.pex-tool.org/
Apache License 2.0
2.53k stars 259 forks source link

Regression of artifact URL due to change since pip 23.3 #2414

Closed tachikoma-li closed 3 months ago

tachikoma-li commented 4 months ago

Looks like there is a "breaking" change from pip in this commit, which applies the build_tracker conditionally. The result is that there will be no log info from the build tracker module, including the item like this that is used in pex. It seems as a result of this change, the URL for artifacts from private python package index hit the same issue as raised in https://github.com/pex-tool/pex/issues/2156.

It affects pip >= 23.3. Lower versions seem fine with pex generating proper URLs for those artifacts.

As an example, this is the log from pip 23.2.1 for installing requests:

image

and this is from pip 23.3:

image
jsirois commented 4 months ago

@tachikoma-li most useful would be log text (instead of screen shots) and an example of the lock file generated with both versions of Pip. Even better, just the full pex command line (and input requirements files, if any). I'm not exactly sure what problem you are encountering. The #2156 issue was moot, since the querystring that user needed was dynamic and changed over time (an auth token with an expiry). Pex currently absolutely does not support dynamic URLs. Whatever artifact URL is written down in the lockfile when it is created must be good forever after or the lockfile must be regenerated. If your lockfile artifact URLs are missing static querystring data they need, that's different and we can use this issue to track a fix. I look forward to hearing more details about your case.

tachikoma-li commented 4 months ago

Hi @jsirois, we have some internal Python packages (in both wheel and sdist format) host in the private PyPI. When we try to generate the lockfile with pex, we have some issues with the URLs for those internal packages. In particular, Pex uses the impermanent URLs for those packages which causes the installation to fail later, and the issue only happens with pip >= 23.3.1. With pip 23.2 (or any lower version), the lockfile can pick up the URL pointing to the private PyPI (instead of the impermanent URLs that have the token for authentication), which allows us to use .netrc to access the PyPI for downloading. For example, this would generate the expected lock

pex3 lock create --use-pip-config <internal_pkg> --pip-version 23.2

but the following fails

pex3 lock create --use-pip-config <internal_pkg> --pip-version 23.3.1

From what I understand, pants needs static URLs for downloading packages in parallel when resolving dependencies. But the behavior seems to be inconsistent with different configurations:

It seems Pex relies on log from pip to get the information of the artifacts. One thing we notice is this change from pip can affect the log. For example, all logs related to build tracker are only available for sdist (which is part of the release of pip 23.3), but not for wheel, which seems to explain the inconsistence mentioned above.

Just wondering, is this the intended behavior, is there any plan to fix the issue for wheel (or the make the behavior for sdist similar to wheel?

jsirois commented 4 months ago

@tachikoma-li the Pip change you identify may or may not have to do with the issue you see. That said, it should only affect sdists (and VCS URLs and local project paths) and not wheels - exactly the opposite of the problem you're having if I understand correctly.

Just wondering, is this the intended behavior, is there any plan to fix the issue for wheel (or the make the behavior for sdist similar to wheel?

The Pex integration tests are extensive and include locks of wheels, sdists, VCS URLs and even local projects across the full range of supported Pip versions. As such, you're hitting a corner here that almost certainly involves the specifics of the private PyPI setup you use. Note that Pex CI itself uses a "private" devpi mirror instead of PyPI directly - although there is no auth used.

More to the point, I really need to be able to reproduce your issue to solve it. Can you please provide all of the details from either the "Indirect" or "Direct" lists below?

Indirect:

Direct:

tachikoma-li commented 4 months ago

@jsirois Unfortunately, we can't provide the full pip log for now. Our internal server will redirect to AWS to fetch the packages when recieving download requests, which seems to have the same behavior as devpi. We tried devpi today and did find something similar to our case, which hopefully can help clarify the issue. Steps to reproduce:

This is very close to our issue where the redirected URL is impermanent with auth tokens and causing trouble, while the original URL would work. And this would only affect pip version >= 23.3.1 (excluding the sdist case).

jsirois commented 4 months ago

@tachikoma-liI thanks, I think your last post starts to help reveal the issue here - its use of ~/.pip/pip.conf / --use-pip-config. The --use-pip-config flag prevents Pex from running Pip with --isolated, so Pip can read its ~/.pip/pip.conf config, but, importantly, Pex does not read ~/.pip/pip.conf. So when you tell Pip (and not Pex), that the index URL is https://m.devpi.net/root/pypi/+simple/, Pex still uses the default index URL of PyPI. As things stand, you need to duplicate that piece of config information so Pex knows. Like so:

pex3 lock create --use-pip-config --no-pypi --index https://m.devpi.net/root/pypi/+simple/ ...

This is a bug, but a long standing one (you can also specify extra indexes in a requirements.txt file and Pip will use those but Pex will not known about them - same bug). The idea of introducing --use-pip-config (which was introduce in September 2023 in v2.1.147) was to enable things like Pip keyring support.

@tachikoma-li now that you know you must tell Pex directly about the indexes to use, does doing so fix your problem?

jsirois commented 4 months ago

To explain what I think the issue really is here:

You're using Pants to run pex and Pants does this:

As such Pants ends up launching Pex like so:

pex3 lock create --no-pypi --index https://pypi.org/simple/ --use-pip-config ...

Which mean Pex ends up launching Pip like so:

pip download --index-url https://pypi.org/simple/ ...

So now Pip is configured with 2 --index-urls - the one from the command line and the other from ~/.pip/pip.conf. When faced with multiple indexes, Pip is notorious for not defining a rule for the precedence they will be used in and that precedence has, in fact, shifted over time (see https://github.com/pypa/pip/issues/9484#issuecomment-764470437 for example). This shifting of precedence over Pip releases (and not the logging patch you suspected) likely explains why things worked for you under one Pip version and not another.

tachikoma-li commented 4 months ago

Hi @jsirois, thanks for the quick reply! I just did some tests locally with a single index, but still seem to hit the same issue:

jsirois commented 4 months ago

@tachikoma-li --index is not single index, it adds an index to the default (PyPI). Use --no-pypi --index https://m.devpi.net/root/pypi/+simple/.

Perhaps this is not clear from the --help:

pex --help
...
 --pypi, --no-pypi, --no-index
                        Whether to use PyPI to resolve dependencies. (default:
                        True)
  -f PATH/URL, --find-links PATH/URL, --repo PATH/URL
                        Additional repository path (directory or URL) to look
                        for requirements. (default: None)
  -i URL, --index URL, --index-url URL
                        Additional cheeseshop indices to use to satisfy
                        requirements. (default: None)
...

To see what is passed to Pip you can run Pex with -vvv:

:; pex --index https://m.devpi.net/root/pypi/+simple/ -vvv cowsay -c cowsay -- -t Moo!
pex: Searching for pyenv root...: 0.0ms
pex: Building pex :: Resolving distributions for requirements: cowsay :: Resolving requirements. :: Resolving for:
pex: Hashing pex: 16.2ms
pex: Isolating pex: 0.0ms
pex: Executing: SHELL=/bin/bash PYENV_SHELL=bash NVM_INC=/home/jsirois/.nvm/versions/node/v20.11.1/include/node WSL2_GUI_APPS_ENABLED=1 WSL_DISTRO_NAME=Ubuntu-22.04 WT_SESSION=c5fa7451-c22b-458b-ae43-44059ac829e1 SSH_AUTH_SOCK=/tmp/ssh-XXXXXX8BU6xC/agent.575 SSH_AGENT_PID=579 GPG_TTY=/dev/pts/5 NAME=Gill-Windows PWD=/home/jsirois LOGNAME=jsirois HOME=/home/jsirois LANG=C.UTF-8 LS_COLORS='rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:' STARSHIP_SHELL=bash WAYLAND_DISPLAY=wayland-0 STARSHIP_SESSION_KEY=1098223452111492 NVM_DIR=/home/jsirois/.nvm LESSCLOSE='/usr/bin/lesspipe %s %s' TERM=xterm-256color LESSOPEN='| /usr/bin/lesspipe %s' USER=jsirois DISPLAY=:0 SHLVL=1 NVM_CD_FLAGS='' XDG_RUNTIME_DIR=/run/user/1000/ PYENV_ROOT=/home/jsirois/.pyenv WSLENV=WT_SESSION:WT_PROFILE_ID: XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop PATH=/home/jsirois/.local/bin:/home/jsirois/bin:/home/jsirois/.nvm/versions/node/v20.11.1/bin:/home/jsirois/bin:/home/jsirois/.cargo/bin:/home/jsirois/.pyenv/shims:/home/jsirois/.pyenv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/wsl/lib:/snap/bin DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus NVM_BIN=/home/jsirois/.nvm/versions/node/v20.11.1/bin HOSTTYPE=x86_64 PULSE_SERVER=unix:/mnt/wslg/PulseServer WT_PROFILE_ID='{e5a83caa-4c73-52b3-ae6b-bc438d721ef9}' _=/home/jsirois/bin/pex TMPDIR=/home/jsirois/.pex/pip/23.2/pip_cache/.tmp PEX_ROOT=/home/jsirois/.pex PEX_VERBOSE=3 __PEX_UNVENDORED__=1 /home/jsirois/.pex/venvs/88f819b2277ee3d08ec0884bf56b5d4b0ca451b9/0de1795ad4486f45ee94cecd983e9905b6b11dc9/bin/python -sE /home/jsirois/.pex/venvs/88f819b2277ee3d08ec0884bf56b5d4b0ca451b9/0de1795ad4486f45ee94cecd983e9905b6b11dc9/pex --disable-pip-version-check --no-python-version-warning --exists-action a --no-input --isolated -v --cache-dir /home/jsirois/.pex/pip/23.2/pip_cache --log /tmp/pex-pip-log.nomn3piw/pip.log download --dest /home/jsirois/.pex/downloads/resolver_download.25q2kwsg/home.jsirois.bin.pex.venv.bin.python3.12 cowsay --index-url https://pypi.org/simple --extra-index-url https://m.devpi.net/root/pypi/+simple/ --retries 5 --timeout 15
...

N.B. that pip subprocess is run with --index-url https://pypi.org/simple --extra-index-url https://m.devpi.net/root/pypi/+simple/ - 2 indexes.

tachikoma-li commented 4 months ago

@jsirois, thanks. Just tried with --no-pypi --index, but still got the same result as reported above. That is, pex3 lock create --no-pypi --index https://m.devpi.net/root/pypi/+simple/ wheel --pip-version 23.3.1 --only-build wheel outputs the right URL, but pex3 lock create --no-pypi --index https://m.devpi.net/root/pypi/+simple/ wheel --pip-version 23.3.1 outputs the redirected one.

jsirois commented 4 months ago

@tachikoma-li Ok, great - thanks. I repro that at the 23.2 / 23.3.1 boundary. This is enough to work with now.

jsirois commented 3 months ago

Ok. It turns out the issue was in fact related to the build tracker logging change, but that just masked the real issue. Logging for re-directs has always looked like so (this is a Pip 23.2 example):

...
2024-06-07T08:00:38,033   Found link https://m.devpi.net/root/pypi/%2Bf/55c/570405f142630/wheel-0.43.0-py3-none-any.whl (from https://m.devpi.net/root/pypi/+simple/wheel/) (requires-python:>=3.8), version: 0.43.0
2024-06-07T08:00:38,033   Found link https://m.devpi.net/root/pypi/%2Bf/465/ef92c69fa5c5d/wheel-0.43.0.tar.gz (from https://m.devpi.net/root/pypi/+simple/wheel/) (requires-python:>=3.8), version: 0.43.0
2024-06-07T08:00:38,033 Skipping link: not a file: https://m.devpi.net/root/pypi/+simple/wheel/
2024-06-07T08:00:38,034 Given no hashes to check 132 links for project 'wheel': discarding no candidates
2024-06-07T08:00:38,036 Collecting wheel
2024-06-07T08:00:38,037   Created temporary directory: /home/jsirois/.pex/pip/23.2/pip_cache/.tmp/pip-unpack-jio7spwe
2024-06-07T08:00:38,037   Found index url https://m.devpi.net/root/pypi/+simple/
2024-06-07T08:00:38,038   Looking up "https://m.devpi.net/root/pypi/%2Bf/55c/570405f142630/wheel-0.43.0-py3-none-any.whl" in the cache
2024-06-07T08:00:38,038   No cache entry available
2024-06-07T08:00:38,228   https://m.devpi.net:443 "GET /root/pypi/%2Bf/55c/570405f142630/wheel-0.43.0-py3-none-any.whl HTTP/1.1" 302 304
2024-06-07T08:00:38,230   Status code 302 not in (200, 203, 300, 301, 308)
2024-06-07T08:00:38,233   Looking up "https://files.pythonhosted.org/packages/7d/cd/d7460c9a869b16c3dd4e1e403cce337df165368c71d6af229a74699622ce/wheel-0.43.0-py3-none-any.whl" in the cache
2024-06-07T08:00:38,233   No cache entry available
2024-06-07T08:00:38,233   Starting new HTTPS connection (1): files.pythonhosted.org:443
2024-06-07T08:00:38,454   https://files.pythonhosted.org:443 "GET /packages/7d/cd/d7460c9a869b16c3dd4e1e403cce337df165368c71d6af229a74699622ce/wheel-0.43.0-py3-none-any.whl HTTP/1.1" 200 65775
2024-06-07T08:00:38,455   Downloading https://m.devpi.net/root/pypi/%2Bf/55c/570405f142630/wheel-0.43.0-py3-none-any.whl (65 kB)
2024-06-07T08:00:38,543   Ignoring unknown cache-control directive: immutable
2024-06-07T08:00:38,543   Updating cache with response from "https://files.pythonhosted.org/packages/7d/cd/d7460c9a869b16c3dd4e1e403cce337df165368c71d6af229a74699622ce/wheel-0.43.0-py3-none-any.whl"
2024-06-07T08:00:38,543   etag object cached for 1209600 seconds
2024-06-07T08:00:38,543   Caching due to etag
2024-06-07T08:00:38,555   Added wheel from https://m.devpi.net/root/pypi/%2Bf/55c/570405f142630/wheel-0.43.0-py3-none-any.whl to build tracker '/home/jsirois/.pex/pip/23.2/pip_cache/.tmp/pip-build-tracker-b86f1klr'
2024-06-07T08:00:38,556   Removed wheel from https://m.devpi.net/root/pypi/%2Bf/55c/570405f142630/wheel-0.43.0-py3-none-any.whl from build tracker '/home/jsirois/.pex/pip/23.2/pip_cache/.tmp/pip-build-tracker-b86f1klr'
...

The lines tracked by the Pex Locker are - in order from the log snip above:

  1. 2024-06-07T08:00:38,233 Looking up "https://files.pythonhosted.org/packages/7d/cd/d7460c9a869b16c3dd4e1e403cce337df165368c71d6af229a74699622ce/wheel-0.43.0-py3-none-any.whl" in the cache
  2. 2024-06-07T08:00:38,233 Looking up "https://files.pythonhosted.org/packages/7d/cd/d7460c9a869b16c3dd4e1e403cce337df165368c71d6af229a74699622ce/wheel-0.43.0-py3-none-any.whl" in the cache
  3. 2024-06-07T08:00:38,555 Added wheel from https://m.devpi.net/root/pypi/%2Bf/55c/570405f142630/wheel-0.43.0-py3-none-any.whl to build tracker '/home/jsirois/.pex/pip/23.2/pip_cache/.tmp/pip-build-tracker-b86f1klr'
  4. 2024-06-07T08:00:38,556 Removed wheel from https://m.devpi.net/root/pypi/%2Bf/55c/570405f142630/wheel-0.43.0-py3-none-any.whl from build tracker '/home/jsirois/.pex/pip/23.2/pip_cache/.tmp/pip-build-tracker-b86f1klr'

Lines 3 and 4 exist for Pip 23.2 and older but do not for Pip 23.3 and newer. Line 3 in particular is the log line that masked a bug introduced in #2276 in the handling of lines 1 and 2. Lines 1, 2 and 3 all funnel to here: https://github.com/pex-tool/pex/blob/27c2db2bf26039bef41323c964bc4e0317a7b4f5/pex/resolve/locker.py#L287-L301

On line 295 the ResolvedRequirement for the given Pin (project name + version; i.e.: wheel 0.43.0) blindly over-writes any prior ResolvedRequirement for that Pin. In short, the last write wins. For Pip 23.2 and older, this last write comes from line 3, which contains the original URL. For Pip 23.3 and newer, the last write comes from line 2, the re-directed URL.

So it seems the fix is to only record the 1st ResolvedRequirement which will always represent the original index URL.

jsirois commented 3 months ago

The fix is released in 2.3.3: https://github.com/pex-tool/pex/releases/tag/v2.3.3