pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

pip should read configuration from `pyproject.toml` for local projects #13003

Open notatallshaw opened 1 week ago

notatallshaw commented 1 week ago

What's the problem this feature will solve?

Increasingly, users expect tools to read configuration from the `pyproject.toml` of their local project. At this point pip stands as an outlier in not allowing its configuration to be defined in `pyproject.toml`.

Describe the solution you'd like

When installing a local package that has a `pyproject.toml`, check for a `[tool.pip]` section and treat any configuration there as taking precedence over environment variables and configuration files in the user's environment (e.g. system directory, user directory, venv directory), but not over configuration passed on the command line.
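
For concreteness, a minimal sketch of what such a section might look like. The `[tool.pip]` table and every key below are hypothetical, loosely mirroring existing pip options; pip does not read any of this today:

```toml
# Hypothetical [tool.pip] table in pyproject.toml -- not an existing pip feature.
# Key names are assumptions that loosely mirror pip's current CLI/ini options.
[tool.pip]
timeout = 60                 # would correspond to --timeout / PIP_TIMEOUT
require-virtualenv = true    # would correspond to --require-virtualenv
only-binary = [":all:"]      # would correspond to --only-binary :all:
```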

Alternative Solutions

Do nothing

Additional context

This ticket was created to track this feature and to give issues that require it something to point to. I think it would involve a lot of design choices, and I am not volunteering to lead it. But perhaps someone else might be sufficiently motivated!

Some immediate design choices come to mind.


RonnyPfannschmidt commented 1 week ago

Without opt-in, this can be abused.

notatallshaw commented 1 week ago

Without opt-in, this can be abused.

Can you expand? This approach is already used by other front-end packaging tools such as uv, rye, poetry, hatch, etc.

potiuk commented 1 week ago

Without opt-in, this can be abused.

Yeah. I think it's a good idea to add "pip frontend" configuration options for `pyproject.toml` users, and I cannot easily see a scenario where it could be abused. It would likely require a new `[tool.pip]` section in `pyproject.toml` - which is unlikely to be used by anyone else.

When it comes to security/abuse, the only place I see where it could be abused is if some external tooling (IDEs mainly) automatically started using it and relied on it in some kind of security-related scenario. I think that could be mitigated by only allowing configuration that is not "security-scenario-sensitive" to be handled via `pyproject.toml`. But there might also be other scenarios I have not thought about. Maybe you can elaborate and give some examples @RonnyPfannschmidt?

potiuk commented 1 week ago

Also, just to add a concrete "positive" scenario: I can imagine it would be super useful to have private package registries configured in `pyproject.toml`, so that "corporate users" do not need any extra configuration (env vars etc.).
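
Purely as an illustration (the `[tool.pip]` table and its keys are the same hypothetical sketch as above, and the URLs are placeholders):

```toml
# Hypothetical: the project itself points pip at a corporate registry, so
# contributors don't need per-machine env vars or pip.ini entries.
[tool.pip]
index-url = "https://pypi.internal.example.com/simple"
extra-index-url = ["https://pypi.org/simple"]
```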

pfmoore commented 1 week ago

For abuse, how about a sdist that included an index URL in its pyproject.toml alongside a build requirement that looked OK, but was substituted with malicious code on the replacement index? Yes, if you're building a sdist you're already exposed, but this hides the risk and so makes it easier to fool an audit.

potiuk commented 1 week ago

For abuse, how about a sdist that included an index URL in its pyproject.toml alongside a build requirement that looked OK, but was substituted with malicious code on the replacement index? Yes, if you're building a sdist you're already exposed, but this hides the risk and so makes it easier to fool an audit.

Yes, I see that there are some potentially "sensitive" properties, but I'd argue that if those declarative parameters are made part of a published standard (maybe even with some way of marking such options as sensitive), it's far easier to make audit tools support it than, for example, to analyse "hatch_build.py" code that can be triggered automatically just by adding "hatchling" as your build tool.

So I'd argue this increases "auditability" rather than decreasing it, because a declarative specification is far easier to audit than dynamic Python code (which is already supported today).

notatallshaw commented 1 week ago

For abuse, how about a sdist that included an index URL in its pyproject.toml alongside a build requirement that looked OK, but was substituted with malicious code on the replacement index? Yes, if you're building a sdist you're already exposed, but this hides the risk and so makes it easier to fool an audit.

This proposal is for local projects, not packaged ones, so this scenario wouldn't apply.

And yes, sdists can already run arbitrary code, including calling pip with any arguments.

pfmoore commented 1 week ago

This proposal is for local projects, not packaged ones

Sorry, I'd missed that. How would it work then? Would pip have to change its config settings mid-run, when it detected that it was installing from a local project? How would something like pip install ./foo ./bar work? Would it use the settings from foo/pyproject.toml or from bar/pyproject.toml? If both foo and bar depend on a named package baz, which project's settings would be used when searching for baz?
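
To make the conflict concrete, suppose (purely hypothetically, since no `[tool.pip]` table exists today) each project carried its own index setting:

```toml
# foo/pyproject.toml (hypothetical)
[tool.pip]
index-url = "https://mirror-a.example.com/simple"
```

```toml
# bar/pyproject.toml (hypothetical)
[tool.pip]
index-url = "https://mirror-b.example.com/simple"
```

In a single run that resolves both projects together, it is not obvious which of the two indexes should be consulted for the shared dependency baz.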

The point here is that pip currently reads its config at startup, which is before it's determined what (if any) local project(s) are being installed. So this would be a significant change to pip's config handling, to change it dynamically, during a single run.

I'm -1 on this as it stands. But I understand that it's a placeholder for someone who's interested in this feature to make a fully-specified proposal, so maybe it would be better to say that I doubt that it will be possible to come up with a satisfactory proposal for this.

notatallshaw commented 1 week ago

Sorry, I'd missed that. How would it work then? Would pip have to change its config settings mid-run, when it detected that it was installing from a local project? How would something like pip install ./foo ./bar work? Would it use the settings from foo/pyproject.toml or from bar/pyproject.toml? If both foo and bar depend on a named package baz, which project's settings would be used when searching for baz?

I would suggest that we take a look at existing tools that already face this problem, e.g. uv, rye, poetry, hatch, etc.
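
For reference, uv already reads front-end settings from a project's `pyproject.toml`. A rough sketch based on https://docs.astral.sh/uv/configuration/files/ (exact keys vary between uv versions):

```toml
# uv looks for a [tool.uv] table in pyproject.toml (or a standalone uv.toml)
# and merges it with user- and system-level configuration files.
[tool.uv]
offline = true
resolution = "lowest-direct"
```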

The point here is that pip currently reads its config at startup, which is before it's determined what (if any) local project(s) are being installed. So this would be a significant change to pip's config handling, to change it dynamically, during a single run.

Perhaps this restriction means that pip would only read the settings once, and by default from the immediate current working directory. I'm not sure; I agree it is an important design choice for anyone who wants to take this on, thanks for bringing it up.

potiuk commented 1 week ago

I'm -1 on this as it stands. But I understand that it's a placeholder for someone who's interested in this feature to make a fully-specified proposal, so maybe it would be better to say that I doubt that it will be possible to come up with a satisfactory proposal for this.

Very good points @pfmoore - indeed, thanks for bringing them up.

I agree, this sounds like a pretty complex one to pull off in a reasonable fashion. And looking at the complexities involved, it is quite an investment (and I have also started to doubt whether it is worth it at all).

Looking at https://docs.astral.sh/uv/configuration/files/ for example, they merge configuration from uv.toml and pyproject.toml, but it's already pretty unclear how it works; I can immediately think of a number of edge cases that are not straightforward to handle.

Side comment - now that I think of it, from the "philosophy" of how I understand packaging to work, in this case the pyproject.toml "[tool.*]" sections should only really apply to build backends, not to frontends using it. Pip is only a "frontend" and does not have "backend" capabilities.

I think one of the big benefits of the "frontend/backend" split is that (ideally) project maintainers choose the backend and the appropriate backend configuration in the "tool" section, so that their projects can be built reproducibly in the local case regardless of the "frontend" anyone uses. "Frontend" configuration should be driven entirely by other means, and should really be "optional", so having it in "pyproject.toml" somewhat mixes the two. In this case it is probably better to keep frontend configuration in a separate file to make a cleaner split, even if you want to provide some defaults for a certain frontend.

pfmoore commented 1 week ago

in this case the pyproject.toml "[tool.*]" sections should only really apply to build backends, not to frontends using it

That was the original intention. Unfortunately (or fortunately, depending on your point of view!) tools other than build backends found the idea of a single file for all configuration very tempting, and so we ended up in the current situation.

But you're right - I don't think pyproject.toml is a good place for build frontends (in particular, pip) to put their configuration.

it's already pretty unclear how it works; I can immediately think of a number of edge cases that are not straightforward to handle.

If you can produce good, reproducible bug reports, I'd strongly encourage you to submit them to uv. Either they will be able to fix them and in doing so come up with a robust model of how frontends should read configuration from pyproject.toml, or it will make it much easier to argue that it's not a good model.

notatallshaw commented 1 week ago

Looking at https://docs.astral.sh/uv/configuration/files/ for example, they merge configuration from uv.toml and pyproject.toml, but it's already pretty unclear how it works; I can immediately think of a number of edge cases that are not straightforward to handle.

I agree the rules of precedence and merging need to be well defined.

Though I would point out that pip already faces similar problems, as it merges its own configuration files from system directories, user directories, site directories, environment variables, and the CLI. For example, on Windows, does %APPDATA%\pip\pip.ini or %USERPROFILE%\pip\pip.ini take precedence? Both work, and the latter is undocumented.

potiuk commented 1 week ago

If you can produce good, reproducible bug reports, I'd strongly encourage you to submit them to uv. Either they will be able to fix them and in doing so come up with a robust model of how frontends should read configuration from pyproject.toml, or it will make it much easier to argue that it's not a good model.

Oh absolutely - I already work closely with the uv team. They fixed 6 of my issues that we needed to make uv work for Airflow, and they did it with lightning speed: https://github.com/astral-sh/uv/issues?q=is%3Aissue+author%3Apotiuk+is%3Aclosed

They also responded to our description of how Airflow is structured and implemented the workspace feature (basically using Airflow as the example of where it could be useful), and we are just in the middle of switching to uv as a development front-end, precisely because of the workspace feature, which was absolutely necessary for us.

This is the comment they modelled their solution on: https://github.com/astral-sh/uv/issues/3404#issuecomment-2119548716

So yeah. I am already doing it.