pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

Lazy import allows wheel to execute code on install. #13079

Open calebbrown opened 14 hours ago

calebbrown commented 14 hours ago

Description

Versions of pip since 24.1b1 allow someone to run arbitrary code when a specially crafted wheel (.whl) file is installed.

When installing wheel files, pip does not constrain the directories the wheel contents are written into, except for checks that ensure traversal stays within the destination directories (e.g., purelib, platlib, data, etc.) (see #4625).

This means a wheel is able to place files into existing modules that belong to other packages, such as pip, setuptools, etc.
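To make this concrete, here is a minimal sketch of how one could list the top-level names a wheel archive would write into. The archive is a hypothetical one built in memory (it is not the actual PoC wheel), and `top_level_targets` and `build_example_wheel` are illustrative helpers, not part of pip.

```python
import io
import zipfile


def top_level_targets(wheel_bytes: bytes) -> set[str]:
    """Return the top-level names a wheel archive would unpack into site-packages."""
    targets = set()
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as archive:
        for name in archive.namelist():
            top = name.split("/", 1)[0]
            # Skip the wheel's own metadata and .data directories.
            if top.endswith((".dist-info", ".data")):
                continue
            targets.add(top)
    return targets


def build_example_wheel() -> bytes:
    """Build a hypothetical wheel-like zip whose payload lands inside pip's package."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("pip/_internal/self_outdated_check.py", "print('PoC')\n")
        archive.writestr(
            "wheelofdespair-0.0.1.dist-info/METADATA", "Name: wheelofdespair\n"
        )
    return buffer.getvalue()


print(top_level_targets(build_example_wheel()))  # {'pip'}
```

Nothing in the archive format ties the distribution name (`wheelofdespair`) to the import package it writes into (`pip`), which is why the overwrite is possible.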

If the installer lazily imports a module after the wheel is installed, the wheel can overwrite that module with its own code, which the installer then imports unintentionally.

For pip, this has been true since 24.1b1 when a change was introduced that dynamically loads the pip._internal.self_outdated_check module after running a command to check if pip needs upgrading.

Because this module is loaded after a package has been installed, a wheel can overwrite {purelib}/pip/_internal/self_outdated_check.py and have the code within it automatically executed when pip install {wheel} is run.
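The mechanism can be reproduced without pip at all. The sketch below is an assumption-laden stand-in: a temporary directory plays the role of `{purelib}`, and `hypothetical_self_check` is a made-up module name. It shows that a module imported only after its file has been overwritten picks up the replacement code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "hypothetical_self_check"  # made-up name, stands in for the self-check module


def demo_deferred_import() -> str:
    """Overwrite a module on disk before its first import and report what gets loaded."""
    with tempfile.TemporaryDirectory() as site_dir:
        module_path = Path(site_dir) / f"{MODULE_NAME}.py"
        # The module as shipped by the original package.
        module_path.write_text("ORIGIN = 'original'\n")
        sys.path.insert(0, site_dir)
        try:
            # "Install" step: a wheel overwrites the file while the program is running.
            module_path.write_text("ORIGIN = 'replaced by wheel'\n")
            importlib.invalidate_caches()
            # Deferred import: happens only after the overwrite.
            module = importlib.import_module(MODULE_NAME)
            return module.ORIGIN
        finally:
            sys.path.remove(site_dir)
            sys.modules.pop(MODULE_NAME, None)


print(demo_deferred_import())  # replaced by wheel
```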

Expected behavior

This behavior is surprising. My understanding is that most Python users expect wheels can't run code during installation.

For example, the recent blog post on command jacking demonstrates this expectation:

Python wheels (.whl files) have become increasingly prevalent due to their performance benefits in package installation. However, they present a unique challenge for attackers

While both .tar.gz and .whl files may contain a setup.py file, .whl files don’t execute setup.py during installation. This characteristic has traditionally made it more difficult for attackers to achieve arbitrary code execution during the installation process when using .whl files.

That said, the wheel spec says nothing about security, or avoiding on-install code execution.

pip version

24.1b1

Python version

v3.11 and later

OS

any

How to Reproduce

  1. python3 -m venv env
  2. . env/bin/activate
  3. pip install --upgrade pip
  4. pip install wheelofdespair

Output

Collecting wheelofdespair
  Downloading wheelofdespair-0.0.1-py3-none-any.whl.metadata (201 bytes)
Downloading wheelofdespair-0.0.1-py3-none-any.whl (1.5 kB)
Installing collected packages: wheelofdespair
Successfully installed wheelofdespair-0.0.1
PoC: Wheel-of-Despair code execution.


calebbrown commented 14 hours ago

Just a couple more additions:

ichard26 commented 14 hours ago

I'll note that there are other ways to compromise pip. A malicious wheel could replace a key file used by pip, which is then picked up on the next invocation. Or they could replace the pip script on PATH. Etc.

But yeah, this does make it easier to achieve arbitrary code execution as it only requires one invocation. We already eagerly import the self-check module when upgrading pip (to avoid crashes). It would be reasonable to always import the module eagerly in the install command module. https://github.com/pypa/pip/blob/fe0925b3c00bf8956a0d33408df692ac364217d4/src/pip/_internal/commands/install.py#L411-L416

Feel free to send a PR. Thanks for investigating and letting us know!

P.S. I haven't looked at this in detail, but I suspect there are other lazy imports in the codebase. Not sure if they're susceptible to ACE or not.

calebbrown commented 13 hours ago

Thanks @ichard26 for the quick triage.

Looking at strace during pip install, the only other import I can see is pip._internal.utils.entrypoints but that appears to be imported through pip._internal.self_outdated_check.

I'll create a PR for this, but would you still like to keep the lazy loading except for install (i.e. remove the if modifying_pip condition but keep the import where it is), or would you prefer to make it non-lazy globally and import at the top of pip._internal.cli.index_command?

ichard26 commented 12 hours ago

The import was made lazy in order to avoid importing the entire network and index (HTML) parsing stack. This improves start-up time for the commands that don't need these components. For example, pip list is an index command, but usually does not access the network at all and thus should not perform a self-check or import the machinery needed for the self-check. The tricky part is that a command like pip list --outdated does require the network and can perform a self-check. This makes an eager import at the top of cli.index_command unacceptable.

(i.e. remove the if modifying_pip condition but keep the import where it is)

It'd probably be more robust to simply import the self-check at the top of commands.install.
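As a rough illustration of why an eager import helps, the sketch below imports a module before its file is overwritten. The temporary directory and the `hypothetical_eager_check` module name are assumptions for the demo, not pip's real layout. Because Python caches the loaded module in `sys.modules`, the later import statement returns the already-loaded original rather than re-reading the file.

```python
import importlib
import sys
import tempfile
from pathlib import Path

EAGER_NAME = "hypothetical_eager_check"  # made-up name for illustration


def demo_eager_import() -> str:
    """Import a module first, overwrite its file, then import again."""
    with tempfile.TemporaryDirectory() as site_dir:
        module_path = Path(site_dir) / f"{EAGER_NAME}.py"
        module_path.write_text("ORIGIN = 'original'\n")
        sys.path.insert(0, site_dir)
        try:
            importlib.invalidate_caches()
            # Eager import: done before any files are installed.
            importlib.import_module(EAGER_NAME)
            # "Install" step: the file on disk is overwritten.
            module_path.write_text("ORIGIN = 'replaced by wheel'\n")
            # The second import is served from sys.modules, not from disk.
            module = importlib.import_module(EAGER_NAME)
            return module.ORIGIN
        finally:
            sys.path.remove(site_dir)
            sys.modules.pop(EAGER_NAME, None)


print(demo_eager_import())  # original
```

This only protects the current process. As noted earlier in the thread, the overwritten file would still be picked up on the next invocation.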

di commented 3 hours ago

Would definitely be great to fix this if possible, but I'm curious about setting a precedent here: is this behavior pip would be willing to guarantee even if the wheel spec does not specifically address it? Or is this only a best-effort fix?

If the goal is to guarantee the behavior, maybe @calebbrown you would be willing to help write a test here that would prevent a future regression, and this could be documented as well?

pfmoore commented 1 hour ago

I don't think we'd want to guarantee this.

The fact that a wheel can install files for an arbitrary import package is a feature, not a bug[^1] - pillow installs PIL, setuptools installs pkg_resources, etc. The fact that pip allows a wheel to install files that overwrite those of an existing package is a known issue, and https://github.com/pypa/pip/issues/4625 is tracking this. As you'll notice if you read that issue, it's not a trivial problem to fix. The fact that "lazy" imports[^2] are affected if you alter the contents of sys.path while the program is running is a feature of Python's import system.

So while I'd be fine with a change that removes this specific issue, and as a result reduces the risk of problems, I don't think it's something we should try to guarantee. Users need to understand that when they install a wheel, it can affect the behaviour of both programs they subsequently run, and currently running programs. That isn't just pip - to give another example, if you have a service running from a Python environment and you install something new in that environment, the service can be affected. Ultimately, it is the user's responsibility to ensure that they only install trusted packages.

If someone wanted to write a section for the packaging user guide covering the trust and threat models for Python packaging, I'm sure that would be extremely useful.

[^1]: Although it's a feature that's open to abuse, and we could consider changing it, if anyone had the stomach for addressing the backward compatibility issues.
[^2]: They aren't technically "lazy", they just aren't done at program startup.

di commented 45 minutes ago

At the risk of getting quoted if/when this gets used by a bad actor: I would argue that we shouldn't fix things we don't plan to keep fixed. If this is just a subclass of #4625 and would be resolved there, seems like this would be considered a duplicate of that issue, even if it's a novel path to reproduce it.