Lazy import allows wheel to execute code on install.

calebbrown commented 1 week ago

Description

Versions of pip since 24.1b1 allow someone to run arbitrary code after a specially crafted bdist whl file is installed.

When installing wheel files pip does not constrain the directories the wheel contents are written into, except for checks that ensure traversal is only within the destination directories (e.g, purelib, platlib, data, etc) (see #4625)

This means a wheel is able to place files into existing modules that belong to other packages, such as pip, setuptools, etc.

If the installer lazily imports a module after the wheel is installed it is possible for the wheel to overwrite the module with its own code, which is then imported unintentionally by the installer.

For pip, this has been true since 24.1b1 when a change was introduced that dynamically loads the pip._internal.self_outdated_check module after running a command to check if pip needs upgrading.

Because this module is loaded after a package has been installed, a wheel can overwrite {purelib}/pip/_internal/self_outdated_check.py and have the code within it automatically executed when pip install {wheel} is run.

Expected behavior

This behavior is surprising. My understanding is that most Python users expect wheels can't run code during installation.

For example, the recent blog post on command jacking demonstrates this expectation:

Python wheels (.whl files) have become increasingly prevalent due to their performance benefits in package installation. However, they present a unique challenge for attackers

While both .tar.gz and .whl files may contain a setup.py file, .whl files don’t execute setup.py during installation. This characteristic has traditionally made it more difficult for attackers to achieve arbitrary code execution during the installation process when using .whl files.

That said, the wheel spec says nothing about security, or avoiding on-install code execution.

pip version

24.1b1

Python version

v3.11 later

OS

any

How to Reproduce

Download wheelofdespair-0.0.1-py3-none-any.zip
mv wheelofdespair-0.0.1-py3-none-any.zip wheelofdespair-0.0.1-py3-none-any.whl
python3 -m venv env
. env/bin/activate
pip install --upgrade pip
pip install wheelofdespair-0.0.1-py3-none-any.whl

Output

Collecting wheelofdespair
  Downloading wheelofdespair-0.0.1-py3-none-any.whl.metadata (201 bytes)
Downloading wheelofdespair-0.0.1-py3-none-any.whl (1.5 kB)
Installing collected packages: wheelofdespair
Successfully installed wheelofdespair-0.0.1
PoC: Wheel-of-Despair code execution.

Code of Conduct

[X] I agree to follow the PSF Code of Conduct.

calebbrown commented 1 week ago

Just a couple more additions:

this behavior is related to pip installing into the same location that pip is running from
this may have security implications based on the usage of pip by users. For example, pip install --only-binary :all: could be used in a trusted context, before using the installed packages in an untrusted context (e.g. different stages in a build pipeline).

ichard26 commented 1 week ago

I'll note that there are other ways to compromise pip. A malicious wheel could replace a key file used by pip, which is then picked up on the next invocation. Or they could replace the pip script on PATH. Etc.

But yeah, this does make it easier to achieve arbitrary code execution as it only requires one invocation. We already eagerly import the self-check module when upgrading pip (to avoid crashes). It would be reasonable to always import the module eagerly in the install command module. https://github.com/pypa/pip/blob/fe0925b3c00bf8956a0d33408df692ac364217d4/src/pip/_internal/commands/install.py#L411-L416

Feel free to send a PR. Thanks for investigating and letting us know!

P.S. I haven't looked at this in detail, but I suspect there are other lazy imports in the codebase. Not sure if they're suspectible to ACE or not.

calebbrown commented 1 week ago

Thanks @ichard26 for the quick triage.

Looking at strace during pip install, the only other import I can see is pip._internal.utils.entrypoints but that appears to be imported through pip._internal.self_outdated_check.

I'll create a PR for this, but would you still like to keep the lazy loading except for install (i.e. remove the if modifying_pip condition but keep the import where it is), or would you prefer to make it non-lazy globally and import at the top of pip._internal.cli.index_command?

ichard26 commented 1 week ago

The import was made lazy in order to avoid importing the entire network and index (HTML) parsing stack. This improves start-up time for the commands that don't need these components. For example, pip list is an index command, but usually does not access the network at all and thus should not perform a self-check or import the machinery needed for the self-check. The tricky part is that a command like pip list --outdated does require the network and can perform a self-check. This makes an eager import at the top of cli.index_command unacceptable.

(i.e. remove the if modifying_pip condition but keep the import where it is)

It'd probably be more robust to simply import the self-check at the top of commands.install.

di commented 1 week ago

Would definitely be great to fix this if possible, but I'm curious about setting a precedent here: is this behavior pip would be willing to guarantee even if the wheel spec does not specifically address it? Or is this only a best-effort fix?

If the goal is to guarantee the behavior, maybe @calebbrown you would be willing to help write a test here that would prevent a future regression, and this could be documented as well?

pfmoore commented 1 week ago

I don't think we'd want to guarantee this.

The fact that a wheel can install files for an arbitrary import package is a feature, not a bug[^1] - pillow installs PIL, setuptools installs pkg_resources, etc. The fact that pip allows a wheel to install files that overwrite those of an existing package is a known issue, and https://github.com/pypa/pip/issues/4625 is tracking this. As you'll notice if you read that issue, it's not a trivial problem to fix. The fact that "lazy" imports[^2] are affected if you alter the contents of sys.path while the program is running is a feature of Python's import system.

So while I'd be fine with a change that removes this specific issue, and as a result reduces the risk of problems, I don't think it's something we should try to guarantee. Users need to understand that when they install a wheel, it can affect the behaviour of both programs they subsequently run, and currently running programs. That isn't just pip - to give another example, if you have a service running from a Python environment and you install something new in that environment, the service can be affected. Ultimately, it is the user's responsibility to ensure that they only install trusted packages.

If someone wanted to write a section for the packaging user guide covering the trust and threat models for Python packaging, I'm sure that would be extremely useful.

[^1]: Although it's a feature that's open to abuse, and we could consider changing it, if anyone had the stomach for addressing the backward compatibility issues. [^2]: They aren't technically "lazy", they just aren't done at program startup.

di commented 1 week ago

At the risk of getting quoted if/when this gets used by a bad actor: I would argue that we shouldn't fix things we don't plan to keep fixed. If this is just a subclass of #4625 and would be resolved there, seems like this would be considered a duplicate of that issue, even if it's a novel path to reproduce it.

pfmoore commented 1 week ago

I think the vector of attacking the running instance of pip is unexpected enough that we should cover ourselves against it. There's no point making things easier for attackers. I just don't think we should guarantee it, as that suggests that users don't need to worry about this. And honestly, I think that anyone who is genuinely concerned about Python security should be made more aware that it's their responsibility to do due dilligence, rather than assuming that volunteer projects are willing to cover that for them.

To that end, I'd strongly favour adding a security section to the packaging user guide, as I mentioned above. But I have neither the expertise nor the time to write such a thing myself.

calebbrown commented 1 week ago

As an outsider, a potential solution that would solve this for pip would be to prevent any package other than pip from updating pip.

This would leave #4625 unaddressed, but protect pip.

pfmoore commented 1 week ago

I can come up with attack vectors that would get past this (drop a .pth file into site-packages that modifies sys.path to put an override to pip ahead of pip itself, for example). So my position is unchanged - I think it's OK to protect what we can, but we shouldn't give anyone the impression that we guarantee that no harm is possible.

ichard26 commented 1 week ago

I'm going to tack this onto the 25.0 milestone so we don't forget to address this trivially fixable vulnerability. Larger discussions on pip security can occur later.

pypa / pip