pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

Add `--only-deps` (and `--only-build-deps`) option(s) #11440

Open flying-sheep opened 1 year ago

flying-sheep commented 1 year ago

https://github.com/pypa/pip/issues/11440#issuecomment-1445119899 is the currently agreed upon user-facing design for this feature.


What's the problem this feature will solve?

In #8049, we identified a use case for installing just the dependencies from pyproject.toml.

As described in the solution section below, --only-deps=<spec> would determine all dependencies of <spec>, excluding that package itself, and install those without installing the package. It could be used to

  1. allow specifying environment variables that are active only while building a package of interest (without having them be active while potentially building its dependencies).
  2. separate dependency installation from building and installing a package, allowing a package to be rebuilt in a Docker build while the dependency-installation step is loaded from cache.

This example shows both use cases:

# copy project metadata and install (only) dependencies
COPY pyproject.toml /myproj/
WORKDIR /myproj/
RUN pip install --extra-index-url="$PIP_INDEX" --only-deps=.[floob]

# copy project source files, build in a controlled environment and install our package
COPY src/mypkg/ /myproj/src/mypkg/
RUN env SETUPTOOLS_SCM_PRETEND_VERSION=2.0.2 python3 -m build --no-isolation --wheel
RUN pip install --no-cache-dir --no-deps dist/*.whl

Instead of the solution from #8049, @pradyunsg prefers a solution similar to the one below: https://github.com/pypa/pip/issues/8049#issuecomment-1079882786

Describe the solution you'd like

One of those two, or similar:

  1. (used in the example above)

    --only-deps would work like -r in that it’s not a flag globally modifying pip’s behavior but a CLI option with one argument that can be specified multiple times. Unlike -r it accepts a dependency spec and not a path to a file containing dependency specs.

    Where pip install <spec> first installs all dependencies and then (builds and) installs the package referred to by the spec itself, pip install --only-deps=<spec> would only install the dependencies.

  2. --only-deps would work like --[no|only]-binary, in that it requires an argument specifying what package not to install. A placeholder like :requested: could be used, e.g.:

    pip install --only-deps=:requested: .[floob]

Alternative Solutions

Additional context

NA


rgommers commented 1 year ago

You're both missing something important. Donald's blog post is describing abstract vs. concrete dependencies (an important distinction), and application deployment vs. library development (also an important distinction). It then goes on to discuss the misuse of requirements.txt for application development with abstract dependencies that are piped into setup.py (indeed often an anti-pattern). So far so good. Where it then goes wrong is in assuming that requirements.txt is only for concrete dependencies and that no one has a need for installing abstract dependencies. This is simply incorrect: there are valid and common use cases for both, and indeed requirements.txt is widely used for both in practice. This is one of the problems with requirements files - they mix this up, and there's no good alternative.

I assure you that I understand the difference between abstract and concrete dependencies and when to use which (and so do others here). Examples:

So, we need the ability to install both concrete and abstract dependencies. Right now there isn't much to choose from besides requirements files. It is helpful to separate concrete vs. abstract, teach users about this more, and use clean standards-based interfaces. So in the future, what it'll hopefully look like is:

It seems to me like the need for installing abstract dependencies is completely clear to all non-pip devs here, to the extent that it wasn't even explicitly articulated well. And pip devs seem to all assume the opposite, because they are not even understanding that these use cases are valid and common.

Personally, I feel we already have enough problems from extending pip's scope from "installing stuff" to ...

Fair enough. This is about installing though, and in a fairly straightforward way.

And to be honest, the more people try to argue that pyproject.toml is an appropriate way of recording this information, the more they are persuading me of the exact opposite

This seems a bit dismissive to be honest.

Theory aside, what's the way forward here?

Let's clarify any remaining open questions and then agree that it's okay to implement this feature with an agreed-upon UX? Followed by finding a volunteer to do so.

using a new option that explicitly takes the name of a PEP 621 format file to read the requirement data from

That seems fine, although I am wondering if it's necessary to give the file name explicitly given that it's guaranteed to be always the same. A more constrained alternative, closing the door on folks using non-standardized myreqs.toml files, would be to leave out the name and always use a file named pyproject.toml in the current working directory. Or a new command like pip install-deps. No clear preference on UX from me, as long as it gives the ability to install all dependency groups inside pyproject.toml, just something to consider.

why is it OK that this option doesn't have the additional features that requirements files do?

As described higher up, for more advanced/niche usage, it seems fine to not use a standardized file.

are we sure that the limitations of dependency specifiers (e.g., no way of using relative directory names) aren't going to be an issue

For my use cases, yes I am sure. I don't even understand what this would do exactly. Hard to read other people's minds, but let's say it's fine unless someone shows up with clearly explained needs?

Aeron commented 1 year ago

project.dependencies is fundamentally a place for abstract dependencies, requirements.txt is fundamentally a place for concrete dependencies

In my humble understanding, this feature is desired for that use case—to have abstract dependencies with much fuzzier version matching. People are making stand-alone apps constantly. So, if we can do pip install ., having just pyproject.toml, why not do the same but only for dependencies? It usually happens in the scope of local development.

Does pyproject.toml somehow indicate that a project is a library, to insist on only one way of installation? Ah, yes, the name field. Which (kind of) points out that it’s a package, not necessarily a library. So, nope. Why treat it like this?

It is not legal to have a pyproject.toml file that has a project.dependencies specified but does not have a project.name and a project.version (or a build backend that will fill it in).

Well, arrest me then. Many stand-alone apps have no name or version specified because they don’t need to. Yet it’s a cozy place for the tools section, where tool configurations live.

So, following the logic, developers of stand-alone apps are strictly prohibited from using pyproject.toml for the convenience of having, for example, mypy configurations inside if there’s no project.name or project section at all, am I following? Well, that’s sad.

Also, I don’t think PEP-621’s wording means that the name field is required. It looks like it’s up to the tools to demand it. And what are those? Those are build systems, mostly. (A static analyzer and a linter don’t give a dime what my project name is.) And what if we don’t use such systems because our project is a stand-alone application? Yeah.

But to heck with it, PEP is not a subpoena. We’ll start putting project.name in pyproject.toml. Now it’s a grown-up package. A package of—yes—a stand-alone application. Can we have a simple way to install the dependencies now?

If the UX of --only-deps is confusing due to the assumption of a following list of packages, and locking it to the dot-path is not an option, then why not move away from the install command completely? Let’s consider introducing an install-deps command that will only install dependencies from pyproject.toml. In this case, we can utilize the --with-optional parameter.

What about this one?

-e https://github.com/pypa/packaging.git#egg=packaging

It’s a question about the design of the project.dependencies section. Somehow nobody bothered about it for the case of a simple pip install .. Or is it considered so meta that all values there are just informational? Why the heck does pip still respect and use it? Because it’s intentional, but somehow it’s only halfway done. And we—being people who upvoted this issue—see significant value in the other half.

The bottom line here is that if people use your microscope as a shovel, maybe it’s a better shovel, so it’s strange to ignore the practical side and voice of the community.

EDIT: Damn, while I was writing and formatting my thoughts, beautiful man @rgommers wrote almost the same, to the point of proposing the same command. Maybe it’s the way to go. And yes, I agree, a few things sound a bit dismissive.

RonnyPfannschmidt commented 1 year ago

I think it's very important to take a step back and recognise the 2 different use cases

One is all about getting something ready where all that's missing is the installation of the package

Which makes sense for stuff like docker layering, developer environment preparations & co

The other is pretty much aiming for having pip install . work like pip install -r requirements.in

And the ask to have an incomplete, broken metadata specification which also lacks a number of features from the requirements file seems woefully misplaced

Let's please keep in mind what pip stands for

There is a whole battery of tools that can handle and sync non-package projects

dstufft commented 1 year ago

Well, arrest me then. Many stand-alone apps have no name or version specified because they don’t need to. Yet it’s a cozy place for the tools section, where tool configurations live.

So, following the logic, developers of stand-alone apps are strictly prohibited from using pyproject.toml for the convenience of having, for example, mypy configurations inside if there’s no project.name or project section at all, am I following? Well, that’s sad.

I worded that very carefully: it's not valid to have project.dependencies if you don't already have project.name and project.version.

If you omit the project table completely, then that sentence still holds true, so a pyproject.toml without a project table and just a tool table is valid-- although from the point of view from pip, that source tree is still a package, it's just a package being built by setuptools, and relying entirely on pre PEP 621 metadata.

pfmoore commented 1 year ago

This seems a bit dismissive to be honest.

Sorry. It wasn't meant to be. It was intended as a genuine attempt to flag that I was apparently missing your point badly enough that we were going to risk everyone getting entrenched in their positions. (Obviously a failed attempt. My bad, it was late at night...)

Let's clarify any remaining open questions and then agree that it's okay to implement this feature with an agreed-upon UX?

The big open question is "what is a project" and I don't really want to get into that here. Apart from anything else, it's a matter of rather fundamental definitions, and it needs to be raised on Discourse if we want to get community consensus, not just on the pip tracker. I'm doing my best to find ways to avoid having that discussion, but I'm not sure it's going to be possible. Unfortunately, in some ways it's long overdue - this thread clearly demonstrates that there's not a consistent view of what a "project" is.

If we can avoid getting into that one, the biggest other questions are:

  1. Can we realistically limit the scope to "only projects that declare static dependencies in their pyproject.toml"? You've asserted this, but I've seen no evidence of why a project with dynamic dependencies wouldn't want this functionality as well.
  2. What about editable installs? I've mentioned this a few times, but no-one has responded. If you install the static dependencies, then for many build backends, pip install -e . will still need to pull in an editable support library. That doesn't fit with the "prepare the development environment ahead of time" idea. Again, how do we ensure we won't be asked to "fix" this limitation later?
  3. What exactly do we allow the user to specify (if we accept that the term "project" is too fraught with controversy to use)? I tried to avoid that question by requiring that the pyproject.toml file itself be specified, but even to me that feels like a clumsy hack. We need something that's clearly different from a normal requirement or project name, as otherwise people will expect it to work for more general cases than it does.
  4. How do we explain this feature in a way that matches the intuition of the average developer, without misleading them into believing it works in more cases than it actually does? Because the way this discussion is going, we're deliberately making the feature not work in cases that a reasonable developer might expect it to, if we don't take care to avoid that.

I'm sorry if a lot of the above sounds paranoid. But we've had a number of features over the years that were initially limited in scope, but didn't stay that way, with follow-up demands for extended scope, typically long after the contributors involved in the original design have moved on.

although I am wondering if it's necessary to give the file name explicitly given that it's guaranteed to be always the same

That was deliberate, because I wanted to avoid getting into the murky waters of "what is a project?" and whether we "have" to support general specifiers if we support ".". If you don't like naming the toml file, then we may need to have that debate (or avoid it by some other means, such as defining the argument as "a directory that must contain a pyproject.toml file", but we'll inevitably get people calling that a "project", and we'll be back to the question you were hoping to avoid...)

At this point, I don't have the energy to debate abstract questions like "what is a project?" If that continues to be the sticking point, I think we're going to continue going round in circles, and I'm not comfortable doing that.

So in the future, what it'll hopefully look like is:

  • most common usages of concrete dependencies -> standardized lock file format
  • most common usages of abstract dependencies -> pyproject.toml
  • more advanced/niche use cases -> tool-specific files like requirements.txt or Poetry/PDM-specific lock file features

This still ducks the question of dynamic dependencies. You can't just describe those as "more advanced/niche" use cases - there's no evidence that this is the case. So while I somewhat agree with your classification, the way I read it is that the abstract dependency case will need to invoke the build backend, because otherwise there's a whole class of use cases that we don't cover. And that's when we end up with your arguments persuading me of the exact opposite of what you're trying to argue...

rgommers commented 1 year ago

Thanks Paul.

The big open question is "what is a project" and I don't really want to get into that here

I don't have any appetite for getting into that either. If explicitly listing pyproject.toml as the input avoids this, great. Any reasonable UX choice is fine with me.

Can we realistically limit the scope to "only projects that declare static dependencies in their pyproject.toml"? You've asserted this, but I've seen no evidence of why a project with dynamic dependencies wouldn't want this functionality as well.

Do you know of any examples of packages with dynamic dependencies? I'm reasonably convinced that it is indeed very niche and can't think of any real-world examples (piping a static requirements.txt file in obviously does not count; that is what we are trying to get away from).

This seems to be the main concern regarding the actual design, so we need some actual use cases. Again, it's also not something that requirements files support at all, and it's hard to come up with examples, so I'm not sure how to say more.

What about editable installs? I've mentioned this a few times, but no-one has responded. If you install the static dependencies, then for many build backends, pip install -e . will still need to pull in an editable support library.

I don't quite understand the problem here. I'd expect that if there is an extra requirement needed for editable installs (odd to me, because I'd expect a build backend to provide this or depend on whatever it needs already, but okay, fine if that exists), it is listed under a dev group of optional-dependencies, just like any other dev dependency. That there is a separate get-requires-for-build-editable hook doesn't seem all that interesting to me. Any type of dependency should be in dependencies, optional-dependencies, or requires; editable-install deps don't need some kind of exception to that rule.
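Concretely, the suggestion here amounts to something like the following pyproject.toml sketch (the dev group name and the listed packages are illustrative assumptions, not a prescribed layout):

```toml
[project]
name = "mypkg"
version = "0.1.0"
dependencies = ["requests"]

[project.optional-dependencies]
# Editable-install support library listed alongside other dev deps,
# so a hypothetical --only-deps with the dev extra would pick it up.
dev = ["editables", "pytest"]
```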

pfmoore commented 1 year ago

Do you know of any examples of packages with dynamic dependencies?

Well, there's a discussion here.

Beyond that, I don't have any specific examples - but if you think there's no real-world need for them, maybe we should be deprecating them and requiring all dependencies to be static? Yes, that's a bit of a straw man, but how far are we willing to go in ignoring supported features of the existing standards because they make it harder to write tooling? I get the impression that I'm willing to go a lot less far than you are...

I don't quite understand the problem here, I'd expect that if there is an extra requirement needed for editable installs (odd to me, because I'd expect a build backend to provide this or depend on whatever it needs already, but okay fine if that exists)

For an example, see here. A build backend that uses the editables library to build an editable wheel will (in some cases) create a wheel that depends at runtime on the editables package. The details of the mechanism are here. The resulting "editable wheel" has metadata that includes a dependency that's not specified in pyproject.toml, even if the dependency metadata is declared as static. This is explicitly allowed by PEP 660:

Build-backends must produce wheels that have the same dependencies (Requires-Dist metadata) as wheels produced by the build_wheel hook, with the exception that they can add dependencies necessary for their editable mechanism to function at runtime (such as editables).

So in this case, doing a pip install --only-deps will omit those extra dependencies. That's fine in one sense, as pip install -e . will simply install them when installing the editable wheel. But once again that might be considered a bug, if the user expected to be able to work offline once they had set up the runtime environment using this feature. We can declare that usage out of scope, but at that point we seem to be just declaring everything out of scope that doesn't fit our simple "read pyproject.toml and install that" model, regardless of what users actually want.

rgommers commented 1 year ago

Well, there's a discussion here.

Yes, I'm aware of that one - it's my use case about numpy API usage related constraints to begin with. That's more a case of the requirements being static but, due to the Python packaging design not yet allowing wheel deps to be narrower than sdist deps (which Henry is writing a PEP to fix), having to cheat by putting everything under dynamic. I'm perfectly fine with not being able to use the feature in such a case - I'd like a good design long-term, and getting that PEP implemented is that, rather than over-accounting for dynamic deps.

but if you think there's no real-world need for them, maybe we should be deprecating them and requiring all dependencies to be static? Yes, that's a bit of a straw man, but how far are we willing to go in ignoring supported features of the existing standards because they make it harder to write tooling?

I think that is not correct reasoning. "Install all static dependencies listed in pyproject.toml" is a perfectly reasonable, easy to explain and easy for the user to understand new feature. If something is dynamic, it comes from outside of pyproject.toml and is hence not listed in pyproject.toml. A feature like that does not mean that we should deprecate and remove dynamic dependencies completely. They should see little use and PEP 621 already explicitly recommends using static metadata, but dynamic ones are an escape hatch that may be needed in corner cases.

Quoting PEP 621 (bold face mine): "As such, making it easy to specify metadata statically is important. This also means that raising the cost of specifying data as dynamic is acceptable as users should skew towards wanting to provide static metadata."

PEP 621 is completely right here, and I think this is a sufficient answer to your concern.

The resulting "editable wheel" has metadata that includes a dependency that's not specified in pyproject.toml, even if the dependency metadata is declared as static.

Two comments:

  1. As I already said, the dependency should be listed as an optional dependency in the dev section of pyproject.toml. This fixes the potential hiccup here, is very easy to do, and is good practice anyway (optional runtime dependency becomes detectable by static analysis like used for the GitHub Dependency Graph, makes it visible to tools like Grayskull, etc.).
  2. Editable installs have much bigger design issues unless you're using --no-build-isolation. The pip design of pip install -e . plain does not work at all for meson-python (and scikit-build-core will have the same issue). We recommend always using --no-build-isolation, other usage is unsupported and won't work. PEP 660 and pip's default is misdesigned, and doesn't make sense when you're using compiled code. See https://github.com/mesonbuild/meson-python/issues/278 if you're interested. And if you do use --no-build-isolation, you will already have the build backend and its dependencies (including editables) installed.

but at that point we seem to be just declaring everything out of scope that doesn't fit our simple "read pyproject.toml and install that" model

That is the feature request though. We clarified in the long discussion in this issue that it's not --only-deps <any-kind-of-specifier>, it really is "read from pyproject.toml". It sounds like you're still arguing from the more broadly scoped version.

I mean, maybe let's explicitly poll every user who is asking for this if you're not sure? Between this issue, gh-8049 and gh-11927 there are many users asking for this. The other two issues even have "in pyproject.toml" in the title. It seems that this is the need, and it's been asked for repeatedly for 3 years now.

pfmoore commented 1 year ago

I don't think there's much point carrying on this discussion. Someone can create a PR and I'll raise my objections there, where we are discussing something specific.

Or I can release my install-deps script and people can use that and avoid having to debate with the pip maintainers over whether this is an appropriate feature for inclusion in pip. I'll probably do that anyway, and if it gets no users, we can take that as good evidence that the feature isn't really as critical as people are suggesting[^1] 🙂

[^1]: Claiming it needs to be in pip because "everyone has pip installed" is approximately the same in my mind as the people who argue that everything should be in the stdlib "because it comes with Python".

rgommers commented 1 year ago

Claiming it needs to be in pip because "everyone has pip installed"

That's not something I claimed, and I didn't see anyone else do that either. I'll just quote myself from higher up:

It's not the effort (which is indeed not large), it's that to actually use this one would have to publish it, maintain it long-term, and use that in the docs of the tool of interest (say VS Code for Brett, NumPy/SciPy for me). This feature is not for us personally, it's for our large-ish audiences. It feels a bit like something that is or could turn into a fork of the pip UX, which is why I didn't consider it.

I don't think there's much point carrying on this discussion. Someone can create a PR and I'll raise my objections there, where we are discussing something specific.

This kind of topic/discussion can be complex and hard to do in writing. I have already invited you to a higher-bandwidth call once this week, and I feel like I have to do so once more here. I think it'd help, not just with this topic but also to build a bit more mutual understanding.

Re-familiarizing myself with the pip code base, opening a PR and then having you likely shoot it down with basically the exact same arguments you've already made seems less attractive though.

pfmoore commented 1 year ago

That's not something I claimed, and I didn't see anyone else do that either.

Sorry, I didn't mean to imply that, I was just trying to pre-empt that possible objection (which is often used when external utilities get suggested). The claim you quoted doesn't seem that problematic, as I've offered to publish the tool myself. Of course, you may have concerns that a tool I publish might not work the way you want it to, but that's a whole other matter...

This kind of topic/discussion can be complex and hard to do in writing. I have already invited you to a higher-bandwidth call once this week, and I feel like I have to do so once more here. I think it'd help, not just with this topic but also to build a bit more mutual understanding.

Sorry, I should have replied to that suggestion. While I agree that having this discussion in writing is hard, my biggest concern is that the problem shouldn't be for you and I to come to some agreement, but for a community consensus to be reached. And for that, having some participants have a better level of understanding via an offline discussion can, in my opinion, be as much of a problem as the awkwardness of written discussion (I say that as someone who's frequently unable to participate in face to face discussions, and so is often the one that gets "left out"). So while I'm not dismissing the idea, I'd rather hold off on it unless it becomes clear that I am the only person with concerns (at which point, TBH, I'd probably just defer to the other pip maintainers anyway).

Also, if we can't explain the proposal well in writing, how are we ever going to document it and write error messages, option names, etc., well enough to address the UI issues?

Re-familiarizing myself with the pip code base, opening a PR and then having you likely shoot it down with basically the exact same arguments you've already made seems less attractive though.

Fair enough. Although it's not just me, @dstufft is flagging the same objections - if anything, as the person who suggested the "option taking a TOML filename" approach, I'm more open to the idea than he is.

Looking back at the discussion here to get a feel for the other pip maintainers' views (anyone I mention, please correct me if I've misinterpreted what you said, or you want to add further comments!):

With that summary, I think characterising this as something that I am going to "shoot down in flames" is rather unfair.

I'm inclined to drop the discussion for a while, and wait to see what the other maintainers have to say (if anything!)

rgommers commented 1 year ago

Also, if we can't explain the proposal well in writing, how are we ever going to document it and write error messages, option names, etc., well enough to address the UI issues?

I understand that, and the value of posting a summary of a synchronous call, etc. It just seems like a 30 minute sync would pay itself back in no time compared to the hours it takes to write multiple of these page-long comments.

I'm inclined to drop the discussion for a while, and wait to see what the other maintainers have to say (if anything!)

Okay, let's see what others think then.

I would add only one comment on "ask the backend": to me, that seems worse than having nothing at all. It would be a feature that not only isn't what we want/need for dev env setup and CI needs; if it'd trigger a build of the kind of package I work on (the one with complex build needs), it's something that will likely result in both failed builds and subtler issues due to build isolation - so one more source of support questions.

pfmoore commented 1 year ago

to me, that seems worse than having nothing at all

Even if, as @sbidoul suggested, the backend is only queried if the pyproject.toml doesn't specify static dependencies? The cases where working, but with a query to the build backend, is strictly worse than not working at all, seem very limited to me.

if it'd trigger a build of the kind of package I work on (the one with complex build needs)

You said you've never needed dynamic build requirements, so surely it wouldn't?

rgommers commented 1 year ago

You said you've never needed dynamic build requirements, so surely it wouldn't?

xref the first sentences of https://github.com/pypa/pip/issues/11440#issuecomment-1546857008. It changes the escape hatch from "get a clear error message up front" (which is okay) to "source of new bug reports" (not okay).

Also from a design perspective, something that can go from "cleanly and quickly install some packages" to "possibly may trigger a long build, you don't know until you try" for other packages is not great. I've had so much bad experience with the "may or may not trigger a build" semantics that I'd really really like to not see more of that. There's a reason it's a "key issue" all by itself in the pypackaging-native content (link).

dstufft commented 1 year ago

if it'd trigger a build of the kind of package I work on (the one with complex build needs), it's something that will likely result in both failed builds and subtler issues due to build isolation - so one more source of support questions.

This sounds like a feature request for the build backend? Does it not implement prepare_metadata_for_build_wheel?

rgommers commented 1 year ago

This sounds like a feature request for the build backend? Does it not implement prepare_metadata_for_build_wheel?

I don't understand, how can the build backend know whether the frontend is calling that hook as part of a regular wheel build or this special operation which is "the user did not ask for a build but used --only-deps"? The prepare_metadata_for_build_wheel must be called if and only if you're actually preparing a build (that's in the name ...), which this is not.

Or if you're saying: "implement prepare_metadata_for_build_wheel in the backend but always make it error out if and only if metadata is dynamic", then I think that backend decides for all of its users that they cannot use dynamic dependencies at all - which does not sound right. Or would you go even further and let the build backend make this configurable for package authors? That, I guess, is a feature. It's a pretty complex/cumbersome one though, that may have backwards compatibility impact, and is perhaps not going to be implemented by every backend. So that turns a simple thing into a bit of a mess.

pfmoore commented 1 year ago

There's a reason it's a "key issue" all by itself in the pypackaging-native content (link).

Yeah, that puts us back in the territory of debating the underlying standards. I don’t want to go there in this case either - if there is a problem with the semantics of the standard, we can fix the standard, but pip has always taken the position that we implement standard-defined behaviour; we don’t innovate on new standards or changes to existing standards. distlib took the approach of being a proving ground for new behaviours, but pip never went down that route.

dstufft commented 1 year ago

I don't understand, how can the build backend know whether the frontend is calling that hook as part of a regular wheel build or this special operation which is "the user did not ask for a build but used --only-deps"? The prepare_metadata_for_build_wheel must be called if and only if you're actually preparing a build (that's in the name ...), which this is not.

PEP 517 does not require that it's only called if you're preparing a build, and in fact that hook was added explicitly to be able to avoid doing a build in cases where you just needed the metadata. It is analogous to the old setup.py egg_info command.

Pip itself uses this in cases where it's not going to build the wheel, for instance with pip download to discover the dependencies of arbitrary sdists without building them (unless the build backend doesn't implement it, in which case pip falls back to building the wheel, since that's the only option the backend gave us).
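For illustration, here is a minimal, hypothetical backend-side sketch of this hook (the package name and metadata are invented), showing that it can produce a `.dist-info` directory without ever building a wheel:

```python
import pathlib

def prepare_metadata_for_build_wheel(metadata_directory, config_settings=None):
    """PEP 517 optional hook: write a .dist-info directory containing a
    METADATA file and return the directory's name; no wheel is built."""
    distinfo = pathlib.Path(metadata_directory) / "mypkg-1.0.dist-info"
    distinfo.mkdir(parents=True, exist_ok=True)
    (distinfo / "METADATA").write_text(
        "Metadata-Version: 2.1\n"
        "Name: mypkg\n"
        "Version: 1.0\n"
        "Requires-Dist: requests>=2\n"
    )
    return distinfo.name
```

A frontend that only wants dependency data can stop after reading that METADATA file, which is exactly the shortcut being discussed here.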

dstufft commented 1 year ago

I'd also point out the prior art in this space: @brettcannon mentioned pip-tools, which does support reading from pyproject.toml [^1] as a source of dependencies to compile into a concrete requirements.txt, and it does so by calling prepare_metadata_for_build_wheel through build.util.project_wheel_metadata.

[^1]: It actually supports setup.py and setup.cfg as well, and it supports them all the same, by invoking the prepare_metadata_for_build_wheel.

pfmoore commented 1 year ago

Note: This question is related to the semantics of what "only dependencies" actually means, and is independent of the sub-thread about reading pyproject.toml.

If we look back at the original request, to have a way to install "only the dependencies" of a package A, what is the required semantics in the case of a dependency loop? If A depends on B, and B depends on C and A, is the expectation that only B and C would be installed, even though that means not all of B's dependencies have been installed and the resulting environment will be broken? Since the introduction of the new resolver, pip has taken care not to create broken environments (unlike the legacy resolver, which did so almost as a matter of routine 🙁), so this could be seen as a rather serious regression.

Alternatively, if A does get installed, we're bound to get a lot of confused users who assume that "install only the dependencies of A" means that A won't be installed.

To be explicit, I'd be perfectly happy if circular dependencies were illegal. But they aren't, and we need to deal with them (and not by declaring them "not supported" when we support them fine in other contexts).
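To make the loop concrete, suppose (hypothetically) the user runs pip install --only-deps ./A, where A depends on B and B's own metadata declares:

```toml
# B/pyproject.toml (purely illustrative): B depends back on A, so
# installing "only the dependencies of A" (i.e. B and C) would either
# leave B's requirement on A unsatisfied or pull A in after all.
[project]
name = "B"
version = "1.0"
dependencies = ["C", "A"]
```

Either outcome contradicts one plausible reading of "only the dependencies", which is the semantic question being raised here.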

uranusjr commented 1 year ago

I have not read the previous discussions between my last comment and latest, but to this point specifically:

were positive about the idea, but haven't commented since the idea of not asking the backend for the dependencies came up. I'd suggest that means they were thinking in terms of getting dependencies "as normal" (i.e., from the backend) but that could just be me projecting my views, so I'd rather let them comment themselves.

This seems to have the same effect as what I mentioned in my previous comment https://github.com/pypa/pip/issues/11440#issuecomment-1541392668 regarding how dependency discovery is implemented. IMO having pip install . and pip install --only-deps . discover dependencies differently would be quite confusing, and pip install -r ./pyproject.toml would be a better interface since it provides an entirely different mental model around this new behaviour.

Aeron commented 1 year ago

Why change the existing resolving and discovery mechanics at all? By having pyproject.toml, pip install . works just fine, with no extras, just as fine as the pip install -r requirements.txt command.

So, as I see it, the main goal is to have the pip install -r requirements.txt behavior for pyproject.toml. (Hold it for a second 😆)

The original post puts it clearly: the A is ., and everything else is virtually a requirements.txt with abstract dependencies. The proposed --only-deps option means the same as a hypothetical --no-package.

The only problem is the spec placeholder. But it is for optional dependencies in square brackets, not a list of packages. @flying-sheep, please correct me if I see it wrong, and the spec can be a list of packages.

I think it's very important to take a step back and recognise the 2 different use cases

The solution probably could work well for both cases because, in both cases, we lack dependencies for our package. It is a win-win for Docker layers as well as for stand-alone local app development. There is no contradiction here.

And the ask to have an incomplete, broken metadata specification which also lacks a number of features from the requirements file seems woefully misplaced

If the pyproject.toml specification lacks something for narrow use cases, people can always fall back to the not-so-concrete requirements.txt approach. That option isn't going anywhere.

There is a whole battery of tools that can handle and sync non package project

Excessive entities for a container image, eh? But why? pip is OK with --no-deps but too cool for --only-deps? I don't see why we need a separate tool for installing the dependencies of non-library packages.

Let's please keep in mind what pip stands for

What does it stand for? Please do us the honor of elaborating a bit. Maybe I placed my hopes and expectations on the wrong horse. Perhaps that will end all further discussion at once.

brettcannon commented 1 year ago

Or if you're saying: "implement prepare_metadata_for_build_wheel in the backend but always make it error out if and only if metadata is dynamic", then I think that backend then decides for all of its users that they cannot use dynamic dependencies at all - which does not sound right. Or would you go even further and let the build backend make this configurable for package authors? That, I guess, is a feature. It's a pretty complex/cumbersome one though, that may have backwards compatibility impact, and is perhaps not going to be implemented by every backend. So that turns a simple thing into a bit of a mess.

I guess there could be another API for this sort of thing (whether it's a flag to prepare_metadata_for_build_wheel to say you don't want a wheel at all or a new endpoint entirely), but I do agree that seems like overkill.

If we look back at the original request, to have a way to install "only the dependencies" of a package A, what is the required semantics in the case of a dependency loop? If A depends on B, and B depends on C and A, is the expectation that only B and C would be installed

I would assume A would get pulled in from somewhere else, e.g., PyPI, if such a circular dependency came up (see below for why).

To be explicit, I'd be perfectly happy if circular dependencies were illegal. But they aren't, and we need to deal with them (and not by declaring them "not supported" when we support them fine in other contexts).

IMO having pip install . and pip install --only-deps . discover dependencies differently would be quite confusing, and pip install -r ./pyproject.toml would be a better interface since it provides an entirely different mental model around this new behaviour.

As I said, I think this is where the disconnect in this discussion is coming from. My mental model is what @uranusjr suggests which eliminates the "other contexts" that I think @pfmoore is referencing since requirements files don't have a context of circular dependencies with the current project since there is no "current project" in requirements files.

pfmoore commented 1 year ago

I think this is where the disconnect in this discussion is coming from.

I'm confused, because I've always been open to the idea of some sort of --reqs-from-pyproject flag - I suggested such a thing back here and in that comment I said I'd be OK with essentially the suggestion by @uranusjr of pip install -r ./pyproject.toml. There's no "disconnect" here as far as I can see, just two different proposals (--reqs-from-pyproject and --only-deps) with different sets of trade-offs. Unless, of course, the disconnect is that other participants are not seeing these as different proposals...

From my post here I did my best to summarise what I saw as people's positions over how to install "only the dependencies of a project". In that summary, I noted that I was basically the only maintainer who had said they were open to some sort of "read a list of requirements from a file" mechanism (@uranusjr since said that he'd also supported that idea). But I never suggested the two approaches were the same - they are clearly (to me, at least) different proposals.

My concern for --reqs-from-pyproject specifically is making sure we frame such a feature in a way that doesn't lead users to expect more than it provides, and that we don't further confuse users over terminology and concepts. That's why I want someone to write a PR - it's impossible to discuss the best way of wording documentation, error messages, etc, without a concrete proposal to start from.

I mentioned circular dependencies solely because the --reqs-from-pyproject suggestion didn't seem to be getting anywhere and if we went back to the --only-deps approach I wanted the problem on record. It's worth noting that the original proposal here was for --only-deps[^1], and as far as I can see @flying-sheep has never confirmed that --reqs-from-pyproject would be suitable for the original motivating use case. Hopefully it would, but I think we should do him the courtesy of confirming that before completely changing his proposal...

[^1]: --reqs-from-pyproject was only really introduced in response to your use case, posted here

rgommers commented 1 year ago

I'll add one more set of answers below, but I am kinda running out of bandwidth for this week, so I'll also summarize my point of view:

  1. I'm +1 on some form of --reqs-from-pyproject, which installs dependencies without ever triggering a build. My reading is that this is what quite a few users are asking for explicitly, and it's perfectly well-behaved (a la reading a requirements file)
  2. I'm -0 to -0.5 on --only-deps which does trigger a build. If there's demand from other users (which is not evident) then I'll have to live with it, but I am unlikely to use it and only see it as a potential source of support requests.
  3. The build backend interface does not provide a hook for reading sections of metadata from pyproject.toml as such, and misusing the prepare_metadata_for_build_wheel for that is not okay.

It's worth noting that the original proposal here was for --only-deps, and as far as I can see @flying-sheep has never confirmed that --reqs-from-pyproject would be suitable for the original motivating use case.

This is not the case. This issue was split off from gh-8049, which was explicitly about pyproject.toml only. Because there was resistance, this new issue was created. But there were never end user requests for --only-deps as far as I can tell. Here is @flying-sheep's first comment: https://github.com/pypa/pip/issues/8049#issuecomment-788784066, also very clearly about pyproject.toml.

I'm confused, because I've always been open to the idea of some sort of --reqs-from-pyproject flag

You closed the original feature request https://github.com/pypa/pip/issues/8049#issuecomment-1481232848 with a comment to that effect less than two months ago, and it's now locked - which is what sent folks here, I suspect.

My mental model is what @uranusjr suggests which eliminates the "other contexts" that I think @pfmoore is referencing since requirements files don't have a context of circular dependencies with the current project since there is no "current project" in requirements files.

+1 same here

PEP 517 does not require that it's only called if you're preparing a build, and in fact that hook was added explicitly to be able to avoid doing a build in cases where you just needed the metadata. It is analogous to the old setup.py egg_info command.

I disagree. I see nothing in PEP 517 that suggests it's intended or acceptable to query for metadata more generally and without the intent of building a wheel. And both the hook name and text like _"If the build frontend has previously called prepare_metadata_for_build_wheel and depends on the wheel resulting from this call to have metadata matching this earlier call,"_ are very explicit - they are about actually building a wheel and preparing for that, nothing else. I'm not really interested in what an old setup.py command did, I'm only interested in what the PEP actually says and in correct/idiomatic/intended behavior of hooks like this.

Unless I am missing it and there indeed is text in PEP 517 (or another more recent PEP) about this, I will maintain (from both my perspectives as a build backend author and as a Python package author with heavy build needs) that using this hook to query for metadata only is abusing the hook, and it's up to you or whoever wants to drive this to either get PEP 517 amended or propose a new hook for metadata querying (and that should then be per section).

dstufft commented 1 year ago

That text and name exists because we were worried about the case where people would call the hook prior to a build, then run the build and get different metadata. Arbitrarily using it to get metadata was one of the core ideas behind it when it was being proposed to my memory.

uranusjr commented 1 year ago

Yes, the motivation is similar to PEP 658: to allow an installer to inspect the built metadata to decide whether to build the entire package for installation or not.

brettcannon commented 1 year ago

Arbitrarily using it to get metadata was one of the core ideas behind it when it was being proposed to my memory.

So would the expectation be to always call prepare_metadata_for_build_wheel, or to use what's in pyproject.toml when dependencies and optional-dependencies are not listed in dynamic, and otherwise fall back on prepare_metadata_for_build_wheel? I'm thinking of the overhead of calling that function: e.g. for people not specifying a build system, prepare_metadata_for_build_wheel would inevitably cause the installation of wheel and setuptools as the default setup (I'm not even getting into whether setuptools would create a wheel as well, or has the "smarts" to just read pyproject.toml), which is a lot if you just want what's in project.dependencies. But obviously the logic is simpler if you just say, "call prepare_metadata_for_build_wheel".

P.S. Looks like PEP 517 hasn't made it over to packaging.python.org yet.

dstufft commented 1 year ago

I would think it would be fine to use the data directly from pyproject.toml if you have confidence it would not differ from what you would get if you called prepare_metadata_for_build_wheel, but that prepare_metadata_for_build_wheel is the primary interface for getting that metadata in a build system agnostic way.

pfmoore commented 1 year ago

P.S. Looks like PEP 517 hasn't made it over to packaging.python.org yet.

Nor has PEP 518 - https://packaging.python.org/en/latest/specifications/declaring-build-dependencies/ is just a stub linking to the PEP. I suspect they simply aren't very high on anyone's priority list.

drebbe-intrepid commented 1 year ago

Adding another use case for --only-deps requirement:

I have a Rust crate using PyO3 and Maturin where I need only the build dependencies set up, and specifically the module in question NOT installed.

The only solution I see moving forward is to maintain both a requirements.txt and a pyproject.toml and make sure the two stay in sync (bug-prone).

I'm new to pyproject.toml so I'm open to suggestions or something I'm doing wrong.

stefaneidelloth commented 1 year ago

Please support pip install without any extra argument to install the requirements defined in pyproject.toml (similar to npm install); also see #12100 and https://stackoverflow.com/questions/74508024/is-requirements-txt-still-needed-when-using-pyproject-toml

stefanv commented 1 year ago

I need this feature all the time (e.g., on CI systems, local non-isolated builds, etc.), and since I can't figure out a sanctioned way to do it, here's a script:

https://gist.github.com/stefanv/0b052fa4014fa07e18e81fe544afc9f9

Obviously I'm +1 on adding this feature to pip.

brettcannon commented 1 year ago

FYI my motivation around this is a bit more public now: https://discuss.python.org/t/a-look-into-workflow-tools-package-management-in-the-python-extension-for-vs-code/29632 . I would love to get to have VS Code set up beginners to use pyproject.toml when we help them manage their dependencies, but right now I don't feel like we can based on the fact we have to do an implicit editable install which their code might quite likely not be set up for.

bernhardkaindl commented 1 year ago

@brettcannon: I added this in the documentation of my project (use pip-tools) to install the extras from pyproject.toml:

PYTHON=python3.10
EXTRAS=.,test,mypy,pyre,pytype,tox
PIPFLAGS="--no-warn-conflicts"
$PYTHON -m pip install pip-tools
# --resolver=backtracking is a pip-tools option, so it belongs on piptools compile:
$PYTHON -m piptools compile --resolver=backtracking --extra=$EXTRAS -o - pyproject.toml |
    $PYTHON -m pip install -r /dev/stdin $PIPFLAGS

However, being able to use pip directly, without this workaround, would be great!

@pfmoore Would you please consider that pip should support installing extras from pyproject.toml in a user-friendly way?

pradyunsg commented 1 year ago

Would you please consider that pip should support installing extras from pyproject.toml in a user-friendly way?

Uhm... pip install ".[extra]"?

stefanv commented 1 year ago

Only install extras (or dependencies, another use case), not also install the package itself.

pradyunsg commented 1 year ago

Nvm me -- I thought there was something new being requested here. We don't really need more +1s on this issue at this point. :)

And, since it's hidden below the fold, the intended design for this feature is https://github.com/pypa/pip/issues/11440#issuecomment-1445119899.

PS: @bernhardkaindl The pip-compile trick is a neat one.

randolf-scholz commented 10 months ago

Maybe there are situations where you only want to install the optional dependencies without the project dependencies.

I'm not comfortable with that. Indeed extras are additive to the base dependencies by definition, so such a mechanism sounds a bit awkward to me.

How is one supposed to emulate pip install -r requirements-dev.txt via pyproject.toml to install only development dependencies?

sbidoul commented 10 months ago

@randolf-scholz as far as I know there is no interoperability standard that defines such a thing as development dependencies[^1]. At the same time, there is no intention for pip to deprecate requirement files. So at this stage I personally see no compelling reason for pip to invent and implement something new in that area.

I still stand behind the proposal in https://github.com/pypa/pip/issues/11440#issuecomment-1445119899, though.

[^1]: I'm personally happy to declare them as a dev optional dependency, although I understand this approach is somewhat controversial.

astrojuanlu commented 10 months ago

After receiving the notification of new activity in this old issue I re-read the conversation and I think this point by @dstufft is key: https://github.com/pypa/pip/issues/11440#issuecomment-1561436816

That text and name [prepare_metadata_for_build_wheel] exists because we were worried about the case where people would call the hook prior to a build, then run the build and get different metadata. Arbitrarily using it to get metadata was one of the core ideas behind it when it was being proposed to my memory.

pip doesn't want to implement non-standard behavior, which is fair. @sbidoul proposal received some criticisms and --deps-from ./pyproject.toml seemed to be closer to what -r requirements.txt does and thus closer to the pip DX, although it had some implications too.

Even the suggestions of "just make pip read the [project.dependencies] table if it's static" seem to go against the standards as they currently stand now. Even if it's technically possible in 99 % of the cases (heck, this is already trivially possible with https://pypi.org/project/pyproject-metadata/), it's philosophically an abuse of the frontend/backend separation.

Can something be done to amend or supersede PEP 517 then? For example, adding a mandatory hook that extracts the metadata without building, and then make the build process feasible only with such metadata, hence superseding the current prepare_metadata_for_build_wheel. Then pip or a rogue frontend could use that hook to return the dependencies (which, for the case of static dependencies or dynamic dependencies being read from requirements.txt, would be a simple file read) while keeping the frontend free of assumptions about how backends should treat the [project] table. This is what we all want, IIUC?

sbidoul commented 10 months ago

Even the suggestions of "just make pip read the [project.dependencies] table if it's static" seem to go against the standards as they currently stand now.

This is a part I don't understand. I've seen several folks writing something along those lines, also in other threads, but... PEP 621 explicitly says A build back-end MUST honour statically-specified metadata.

So since we have a guarantee that the backend will produce the same metadata, what's wrong for a frontend to rely on static metadata (when it is present) to improve performance of things like pip install --only-deps ?
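To illustrate the distinction, consider a hypothetical pyproject.toml fragment (names purely illustrative):

```toml
# Static declaration: per PEP 621 the backend MUST honour these values,
# so a frontend reading them directly gets the same answer as the backend.
[project]
name = "mypkg"
version = "1.0"
dependencies = ["requests>=2"]

# Dynamic variant (shown commented out, since a file can only have one
# [project] table): here a frontend has no choice but to ask the backend.
# [project]
# name = "mypkg"
# version = "1.0"
# dynamic = ["dependencies"]
```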

pradyunsg commented 10 months ago

to go against the standards as they currently stand now.

What is this based on?

My reading of the situation was that it was fully expected that frontends can rely on this, and that this would especially serve as an optimisation in the sdist case -- it'd avoid needing to do an entire round of subprocess calls and shuffling around of files, which is a meaningful optimisation. I won't be surprised if I've missed something around this but I don't see any real reason to have this be disallowed and certainly no reason for this to not be permitted for frontends to do as an optimisation.

pfmoore commented 10 months ago

Possibly on the fact that in the sdist case, the expected approach is to use the actual metadata in the PKG-INFO file (assuming Metadata 2.2) as that's more accurate (specifically, likely to have fewer "dynamic" fields, notably the version). But sure, if PKG-INFO isn't available, or if it uses a Metadata version before 2.2, then reading from pyproject.toml is fine. And starting from pyproject.toml is OK, although it's a pessimisation if you find the field you want is dynamic and then need to go to PKG-INFO anyway.

(Side note: If tools are going to just rely on pyproject.toml and not fall back to PKG-INFO, then I'm not clear on the point of PEP 643 - which was written because at the time, there was pushback[^1] on pyproject.toml being used as the canonical source. Maybe the people who pushed back have since changed their minds, or are not involved in this discussion...)

[^1]: As far as I remember, I specifically wrote PEP 643 in response to that pushback. I haven't tried to find a specific link to support my recollection, as I'm not sure it's particularly relevant any more.

pfmoore commented 10 months ago

I went and found a link - https://discuss.python.org/t/pep-621-round-3/5472/35. PEP 643 was basically a way of saying "where should setuptools put reliable metadata from setup.py if we don't want it to rewrite the user-created pyproject.toml?"

Given that setuptools seems to have adopted pyproject.toml fairly successfully since then (faster than PyPI has adopted PEP 643! 🙁) maybe the issue is no longer relevant...

astrojuanlu commented 9 months ago

The performance optimization argument wouldn't be as compelling if there was a way for backends to just generate the metadata (since the slow part is building the wheel). And if we go that path, then frontends wouldn't need to do "read dependencies from pyproject.toml but if they're dynamic then build the wheel" (the "pessimisation" @pfmoore alludes to), which would affect projects using setup.py, projects reading requirements.txt... and probably there are a lot of those. So being able to just generate the metadata would make --only-deps go faster for them.

But of course this hook doesn't exist today, it needs to be specified. I don't have a brilliant track record of sticking to my promises but I'm even willing to write a PEP if there's more or less consensus about this...

sbidoul commented 9 months ago

If tools are going to just rely on pyproject.toml and not fall back to PKG-INFO, then I'm not clear on the point of PEP 643

@pfmoore Both are useful, IMO. When looking at the original source tree (which I understand is the main use case for this issue), we don't have a PKG-INFO. And a project can have dynamic dependencies in pyproject.toml that become static in the sdist PKG-INFO.

But of course this hook doesn't exist today

@astrojuanlu the prepare_metadata_* hooks do exist in PEP 517 and are being implemented by backends. Is that what you are looking for? They are still orders of magnitude slower than direct reading from pyproject.toml or PKG-INFO due to the overhead of subprocess calls, and installing build dependencies in isolated build environments.

pfmoore commented 9 months ago

Both are useful, IMO. When looking at the original source tree (which I understand is the main use case for this issue), we don't have a PKG-INFO.

Let me try to clarify my thinking here.

In all the cases where we have no dependency data (source tree with dynamic dependencies, or no [project] data) then we have no choice other than to call prepare_metadata_for_build_wheel. It's potentially costly, but there is no other viable option to get the information. We could raise an error saying --only-deps is unsupported in that case, but I don't see how that's helpful.

In the case where we have a sdist with metadata 2.2 information, but where the dependencies are marked as dynamic, we might potentially have information in Requires-Dist. But in that case, PEP 643 states "Consumers, however, MUST NOT treat this value as canonical, but MAY use it as an hint about what the final value in a wheel could be" - and I don't think that using the information to choose what to install for --only-deps comes under the heading of "using it as a hint". So again, I don't think that we have a justifiable option other than calling prepare_metadata_for_build_wheel.

I understand that prepare_metadata_for_build_wheel can be costly, and will require the process of calling a build backend hook (with all of the build environment setup costs that involves). But I don't see that there's any reasonable alternative. We either do that or we have no dependency information that we can use (without ignoring what the relevant standards say is permissible).

Having spent the time researching and writing the above, it seems to me that it's all fairly uncontroversial. We get the data, from whichever source is the most likely to give us reliable data, and if we find we can't get reliable values, then we have to ask the backend (which is the final source of truth in this matter).
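The lookup order described here can be sketched as a small decision function; the argument names and the tuple encoding of the metadata version are illustrative, not pip's actual implementation:

```python
def dependency_source(has_pkg_info: bool,
                      metadata_version: tuple,
                      requires_dist_is_dynamic: bool,
                      pyproject_deps_are_dynamic: bool) -> str:
    """Pick the most reliable source of dependency data for --only-deps."""
    if has_pkg_info and metadata_version >= (2, 2) and not requires_dist_is_dynamic:
        # Metadata 2.2 PKG-INFO with static Requires-Dist is canonical (PEP 643).
        return "PKG-INFO"
    if not pyproject_deps_are_dynamic:
        # Statically declared [project] metadata must be honoured (PEP 621).
        return "pyproject.toml"
    # Otherwise the backend is the final source of truth.
    return "prepare_metadata_for_build_wheel"
```

Note how every branch degrades gracefully: the hook is only reached when neither static source can be trusted.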

One thing I know I am ignoring here is that PEP 660 allows editable installs to inject additional dependencies to support the "editability" of the installation. But I think it's reasonable to say that --only-deps ignores such additional dependencies (it's a UI question as to whether it's better to disallow the use of --editable with --only-deps, or to simply state that --editable will be ignored - and I have no opinion on that).

Am I missing something? What's the alternative that people are debating about? Or is it just that until now no-one bothered to go through and work out how this would be implemented in detail? (I was definitely guilty of that until now, so if that is the only problem, then my apologies for being part of the issue here...)

sbidoul commented 9 months ago

Thanks for the detailed analysis @pfmoore. This perfectly matches my understanding of the matter.

pfmoore commented 9 months ago

Cool. So where does that leave us - are we simply waiting at this point for someone to implement the feature based on this analysis? I think this approach addresses any reservations I had about the feature.

Note that I don't think this addresses --only-build-deps, and I definitely think we should consider that idea separately. It's fundamentally not possible to get the correct set of build dependencies without (1) querying the backend, and (2) knowing what we want to build (as get_requires_for_build_wheel, get_requires_for_build_sdist, and get_requires_for_build_editable are all separate backend calls). So I think we should be very cautious about offering an option that claims to install the "build dependencies" when we know we cannot actually do that... (actually, I'm personally a strong -1 on --only-build-deps, but I'm willing to hear arguments as to why I'm wrong...)