Open calebbrown opened 1 week ago
Just a couple more additions:
this behavior is related to pip
installing into the same location that pip
is running from
this may have security implications based on the usage of pip
by users. For example, pip install --only-binary :all:
could be used in a trusted context, before using the installed packages in an untrusted context (e.g. different stages in a build pipeline).
I'll note that there are other ways to compromise pip. A malicious wheel could replace a key file used by pip, which is then picked up on the next invocation. Or they could replace the pip script on PATH. Etc.
But yeah, this does make it easier to achieve arbitrary code execution as it only requires one invocation. We already eagerly import the self-check module when upgrading pip (to avoid crashes). It would be reasonable to always import the module eagerly in the install command module. https://github.com/pypa/pip/blob/fe0925b3c00bf8956a0d33408df692ac364217d4/src/pip/_internal/commands/install.py#L411-L416
Feel free to send a PR. Thanks for investigating and letting us know!
P.S. I haven't looked at this in detail, but I suspect there are other lazy imports in the codebase. Not sure if they're suspectible to ACE or not.
Thanks @ichard26 for the quick triage.
Looking at strace during pip install
, the only other import I can see is pip._internal.utils.entrypoints
but that appears to be imported through pip._internal.self_outdated_check
.
I'll create a PR for this, but would you still like to keep the lazy loading except for install
(i.e. remove the if modifying_pip
condition but keep the import where it is), or would you prefer to make it non-lazy globally and import at the top of pip._internal.cli.index_command
?
The import was made lazy in order to avoid importing the entire network and index (HTML) parsing stack. This improves start-up time for the commands that don't need these components. For example, pip list
is an index command, but usually does not access the network at all and thus should not perform a self-check or import the machinery needed for the self-check. The tricky part is that a command like pip list --outdated
does require the network and can perform a self-check. This makes an eager import at the top of cli.index_command
unacceptable.
(i.e. remove the if modifying_pip condition but keep the import where it is)
It'd probably be more robust to simply import the self-check at the top of commands.install
.
Would definitely be great to fix this if possible, but I'm curious about setting a precedent here: is this behavior pip
would be willing to guarantee even if the wheel spec does not specifically address it? Or is this only a best-effort fix?
If the goal is to guarantee the behavior, maybe @calebbrown you would be willing to help write a test here that would prevent a future regression, and this could be documented as well?
I don't think we'd want to guarantee this.
The fact that a wheel can install files for an arbitrary import package is a feature, not a bug[^1] - pillow installs PIL, setuptools installs pkg_resources, etc. The fact that pip allows a wheel to install files that overwrite those of an existing package is a known issue, and https://github.com/pypa/pip/issues/4625 is tracking this. As you'll notice if you read that issue, it's not a trivial problem to fix. The fact that "lazy" imports[^2] are affected if you alter the contents of sys.path
while the program is running is a feature of Python's import system.
So while I'd be fine with a change that removes this specific issue, and as a result reduces the risk of problems, I don't think it's something we should try to guarantee. Users need to understand that when they install a wheel, it can affect the behaviour of both programs they subsequently run, and currently running programs. That isn't just pip - to give another example, if you have a service running from a Python environment and you install something new in that environment, the service can be affected. Ultimately, it is the user's responsibility to ensure that they only install trusted packages.
If someone wanted to write a section for the packaging user guide covering the trust and threat models for Python packaging, I'm sure that would be extremely useful.
[^1]: Although it's a feature that's open to abuse, and we could consider changing it, if anyone had the stomach for addressing the backward compatibility issues. [^2]: They aren't technically "lazy", they just aren't done at program startup.
At the risk of getting quoted if/when this gets used by a bad actor: I would argue that we shouldn't fix things we don't plan to keep fixed. If this is just a subclass of #4625 and would be resolved there, seems like this would be considered a duplicate of that issue, even if it's a novel path to reproduce it.
I think the vector of attacking the running instance of pip is unexpected enough that we should cover ourselves against it. There's no point making things easier for attackers. I just don't think we should guarantee it, as that suggests that users don't need to worry about this. And honestly, I think that anyone who is genuinely concerned about Python security should be made more aware that it's their responsibility to do due dilligence, rather than assuming that volunteer projects are willing to cover that for them.
To that end, I'd strongly favour adding a security section to the packaging user guide, as I mentioned above. But I have neither the expertise nor the time to write such a thing myself.
As an outsider, a potential solution that would solve this for pip
would be to prevent any package other than pip
from updating pip
.
This would leave #4625 unaddressed, but protect pip.
I can come up with attack vectors that would get past this (drop a .pth
file into site-packages that modifies sys.path
to put an override to pip ahead of pip itself, for example). So my position is unchanged - I think it's OK to protect what we can, but we shouldn't give anyone the impression that we guarantee that no harm is possible.
I'm going to tack this onto the 25.0 milestone so we don't forget to address this trivially fixable vulnerability. Larger discussions on pip security can occur later.
Description
Versions of pip since 24.1b1 allow someone to run arbitrary code after a specially crafted bdist whl file is installed.
When installing wheel files pip does not constrain the directories the wheel contents are written into, except for checks that ensure traversal is only within the destination directories (e.g, purelib, platlib, data, etc) (see #4625)
This means a wheel is able to place files into existing modules that belong to other packages, such as pip, setuptools, etc.
If the installer lazily imports a module after the wheel is installed it is possible for the wheel to overwrite the module with its own code, which is then imported unintentionally by the installer.
For pip, this has been true since 24.1b1 when a change was introduced that dynamically loads the
pip._internal.self_outdated_check
module after running a command to check if pip needs upgrading.Because this module is loaded after a package has been installed, a wheel can overwrite
{purelib}/pip/_internal/self_outdated_check.py
and have the code within it automatically executed whenpip install {wheel}
is run.Expected behavior
This behavior is surprising. My understanding is that most Python users expect wheels can't run code during installation.
For example, the recent blog post on command jacking demonstrates this expectation:
That said, the wheel spec says nothing about security, or avoiding on-install code execution.
pip version
24.1b1
Python version
v3.11 later
OS
any
How to Reproduce
mv wheelofdespair-0.0.1-py3-none-any.zip wheelofdespair-0.0.1-py3-none-any.whl
python3 -m venv env
. env/bin/activate
pip install --upgrade pip
pip install wheelofdespair-0.0.1-py3-none-any.whl
Output
Code of Conduct