python-poetry / poetry

Python packaging and dependency management made easy
https://python-poetry.org
MIT License

Add build-system.requires to lock file #8752

Open adisbladis opened 12 months ago

adisbladis commented 12 months ago

Feature Request

Problem

A notable omission from poetry.lock is the set of build systems used to build from sdists.

This is useful in efforts like poetry2nix that use Poetry metadata. Nix build environments are constructed in a much stricter way: we cannot dynamically install or load build systems. Currently, users of this tool have to resort to manually adding overrides in Nix code to work around this missing metadata.

This issue has been reported many times over the years in various forms:

It has generally been a large source of maintenance work that could be completely automated away.

Adding this metadata also has other uses such as building tools that can statically inspect the build-time dependency graph.
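The static-inspection use case can be sketched in Python. This is a minimal sketch assuming a hypothetical lock layout that maps each package to the names in its build-system.requires; poetry.lock has no such field today, and the package names (`pkg-a`, `pkg-b`, `backend-x`, `backend-y`, `plugin-z`) and the `build_closure` helper are invented for illustration. Given that data, the transitive set of build tools is a plain graph walk with no network access:

```python
from collections import deque

# Hypothetical lock data: package name -> names from its build-system.requires.
# poetry.lock does NOT record this today; names are placeholders.
LOCK_BUILD_REQUIRES: dict[str, list[str]] = {
    "pkg-a": ["backend-x"],
    "pkg-b": ["backend-y"],
    "backend-x": [],
    "backend-y": ["backend-x", "plugin-z"],
    "plugin-z": ["backend-x"],
}


def build_closure(roots: list[str], graph: dict[str, list[str]]) -> set[str]:
    """Return every build tool needed, transitively, to build `roots` from sdists."""
    seen: set[str] = set()
    queue = deque(roots)
    while queue:
        pkg = queue.popleft()
        for dep in graph.get(pkg, []):
            if dep not in seen:  # visit each build tool once, even with shared deps
                seen.add(dep)
                queue.append(dep)
    return seen


print(sorted(build_closure(["pkg-b"], LOCK_BUILD_REQUIRES)))
# ['backend-x', 'backend-y', 'plugin-z']
```

The point of the sketch is that the whole computation runs on lock-file data alone; without the build requirements in the lock, each step of the walk would instead need an sdist download.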

Solution

One or more possible entries in poetry.lock:

We only need to know build-system.requires & up front when constructing the build environment. There is no need to add build-system.backend or build-system.backend-path.

The PyPI API does not expose this metadata. It would need to be extracted from the sdist.

Note

Do note that this is not a suggestion to lock build systems, merely to add the contents of build-system.requires.

Secrus commented 12 months ago

Basically, you want us to put a copy of [build-system] in poetry.lock aside from pyproject.toml? We don't lock build requires deps, so that doesn't belong in poetry.lock.

dimbleby commented 12 months ago

Additionally, poetry mostly does not care about build requirements when building the lock file, because all of the metadata it needs is available either directly from a server API or in an already-built wheel.

i.e. obtaining this information would hurt every other user by forcing poetry to download and examine sdists during locking, which it otherwise only rarely needs to do.

blaggacao commented 12 months ago

(Somewhat innocently) quoting @Secrus

Specifying build dependencies is a concern for project maintainers, not end users.

Wouldn't the lock file be a project maintainer's truest (even: only) source of the metadata she requires?

Sort of looking at it this way:


Of course, the argument rests on the premise of purely source-based distributions, because that's what we do for a better software supply chain, right? :smile:


The reason this issue isn't asking for a full lock (to make the proposal more palatable) rests on the assumptions and peculiarities of a global build-system registry that can be overlaid onto the build closure and simply referenced by name, as is the case with Nix.

dimbleby commented 12 months ago

There is some possible ambiguity here so let's be sure we're on the same page.

I don't think you are asking for build requirements from the "current" project's pyproject.toml to be duplicated into poetry.lock: that would obviously be pointless, because you can instead simply read them directly from pyproject.toml - right?

Rather you are asking for all of the build requirements of all of the dependencies of the current project to be copied into poetry.lock - yes?

As I say, this would at the very least require poetry to do a whole pile of work - and downloading - during locking that it otherwise has no interest in doing. Poetry just does not need this information in its lockfile: whenever poetry wants the build requirements from an sdist, it already has the sdist in hand and can read them directly.

I don't know why this isn't also true for poetry2nix but - in the nicest possible way - that sounds like a poetry2nix problem.

Perhaps you would benefit from build requirements being added to the package metadata in general: if so, you could start a conversation with the PyPA and try to get that agreed.

adisbladis commented 12 months ago

I don't think you are asking for build requirements from the "current" project's pyproject.toml to be duplicated into poetry.lock: that would obviously be pointless, because you can instead simply read them directly from pyproject.toml - right?

That is exactly what I'm asking for. Reading pyproject.toml from dependencies is not possible without first downloading the sdist; this makes it impossible to statically reason about the build graph.

Rather you are asking for all of the build requirements of all of the dependencies of the current project to be copied into poetry.lock - yes?

While this would of course also be great, I don't want to conflate these two closely related but distinct issues.

dimbleby commented 12 months ago

Really?

Well, this isn't going to happen then, because it is obviously pointless!

poetry.lock is already insufficient without pyproject.toml for all sorts of reasons. What are the circumstances in which you have poetry.lock for a project but not the corresponding pyproject.toml? How does this fit with the second sentence of the poetry2nix readme?

It does so by parsing pyproject.toml and poetry.lock ...

In short: just parse pyproject.toml in the first place!

blaggacao commented 12 months ago

I don't know why this isn't also true for poetry2nix but - in the nicest possible way - that sounds like a poetry2nix problem.

While this seems to be the case, poetry2nix — being a build toolchain — just consumes the maintainer's specced out metadata.

If the spec is incomplete, that unsurprisingly causes the issues described in the initial post above.

Perhaps you would benefit from build requirements being added to the package metadata in general: if so, you could start a conversation with the PyPA and try to get that agreed.

That actually sounds strategically reasonable. We'd be left to discuss whether poetry may be a stepping stone in that strategy in some capacity, alongside pdm. I don't know enough to see what's overall most practical / Pareto-efficient.

adisbladis commented 12 months ago

What are the circumstances in which you have poetry.lock for a project but not the corresponding pyproject.toml?

Here is the case stated as simply as possible: You want to build a project. This project contains pyproject.toml & poetry.lock. We can read those just fine.

We can't recurse into the dependencies' pyproject.toml files or any other source files to reason about their build-time dependencies statically, as that requires network side effects.

blaggacao commented 12 months ago

In short: just parse pyproject.toml in the first place!

It boils down to the question of "who should make the inventory?" - the locking tool or the build system?

Some (novel) build systems are so side-effect free that they can only consume an already materialized inventory while planning the build (as @adisbladis mentions: no effectful recursion onto the network is possible; this is mostly part of the security model). Nix is one such build system.


A good illustration would probably be as follows. You hop on a train and you're allowed to take with you the list of (locked) sources that you need to build the thing, all the way down to the (locked) sources for glibc, or even a manually crafted bootstrapping assembler blob. The train starts and you're in the sandbox: no phone calls, no nothing. Just the sources and the bootstrapper. You start building your C libraries, then Python, and so forth until you finally build the target package. It's a long train ride, one may say.

You gotta know which sources for which build tool to take with you.
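In the train analogy, knowing the build requirements of every package also tells you the order in which to build things. A minimal Python sketch using the standard library's `graphlib.TopologicalSorter`, assuming hypothetical package names and a hypothetical build-requirement map of the kind this issue asks poetry.lock to carry:

```python
from graphlib import TopologicalSorter

# Hypothetical: each package -> the build tools it needs (its predecessors).
# Names are placeholders, not real projects.
BUILD_REQUIRES: dict[str, set[str]] = {
    "pkg-b": {"backend-y"},
    "backend-y": {"backend-x", "plugin-z"},
    "plugin-z": {"backend-x"},
    "backend-x": set(),
}

# static_order() yields every node after all of its predecessors,
# i.e. build tools come before the packages that need them.
order = list(TopologicalSorter(BUILD_REQUIRES).static_order())
print(order)  # ['backend-x', 'plugin-z', 'backend-y', 'pkg-b']
```

A real build graph can contain cycles (for example, a backend that needs itself to build), in which case `static_order()` raises `graphlib.CycleError`; this sketch does not handle that bootstrapping problem.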

Reconstructing the metadata in a preprocessing step is possible, but it's a git operation, because you gotta commit your source list, locked on a TOFU (trust on first use) basis, into the commit log before you may board.

It's a major practical hurdle, as reported initially. This weighs against any principled objection to placing the onus of the inventory on the locking tool.


The reason this request excludes the actual locking is to make it easier (?) to implement on the poetry side. Nixpkgs, in particular, holds a registry of such tools in its inventory, so by just knowing a tool's name and being allowed to take (a commit of your choice of) Nixpkgs onto the train, you're good. You'll then use the tool version specified in that commit of Nixpkgs.
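Resolving build requirements "by name" against such a registry might look like the minimal Python sketch below. It assumes a hypothetical registry (a plain dict standing in for something like a Nixpkgs commit) keyed by PEP 503-normalized names; `resolve_by_name`, `REGISTRY`, and its placeholder values are invented for illustration. Version specifiers are deliberately ignored, because the registry's own pinned version is used, as described above; environment markers are not evaluated either.

```python
import re

# Distribution name at the start of a PEP 508 requirement string.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?)")


def normalize(name: str) -> str:
    """PEP 503 name normalization: runs of -, _, . become '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def resolve_by_name(requires: list[str], registry: dict[str, str]) -> dict[str, str]:
    """Map each build requirement to a registry entry, ignoring version specifiers."""
    resolved: dict[str, str] = {}
    for req in requires:
        m = _NAME_RE.match(req)
        if m is None:
            raise ValueError(f"cannot parse requirement: {req!r}")
        key = normalize(m.group(1))
        if key not in registry:
            raise KeyError(f"{key} is not in the registry")
        resolved[key] = registry[key]
    return resolved


# Hypothetical registry: normalized name -> whatever that registry pins.
REGISTRY = {
    "setuptools": "registry/setuptools",
    "setuptools-scm": "registry/setuptools-scm",
    "wheel": "registry/wheel",
}

print(resolve_by_name(["setuptools>=61", "setuptools_scm[toml]>=6.2", "wheel"], REGISTRY))
# {'setuptools': 'registry/setuptools', 'setuptools-scm': 'registry/setuptools-scm', 'wheel': 'registry/wheel'}
```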

dimbleby commented 12 months ago

When I tried to clarify whether you cared about the build requirements of the current project or of its dependencies, you said that the requirements of the current project were what you wanted, and definitely not the build requirements of the current project's own dependencies.

When I said that this was pointless, you said that the challenge was that "we can't recurse into dependencies pyproject.toml files or any other source files to reason about their build time dependencies..."

So I still don't understand the ask. Do you want the lockfile to contain build requirements only of the current project, or of all of its dependencies?

However, I suspect we are only deciding what reason to give for not doing this. It is either pointless or impractical, depending which you want.

blaggacao commented 12 months ago

Do you want the lockfile to contain build requirements only of the current project, or of all of its dependencies?

Clearly there was a misunderstanding. I almost saw this coming:

The ask is for the lock file to contain the build requirements of dependencies (by name) only.

However, I suspect we are only deciding what reason to give for not doing this. It is either pointless or impractical, depending which you want.

Impractical, in this case, in a different reference frame just turns out to be a local maximum.

I wonder if there can be a route forward along these lines (nobody seems to have had an opinion on that avenue yet):

Perhaps you would benefit from build requirements being added to the package metadata in general: if so, you could start a conversation with the PyPA and try to get that agreed.

That actually sounds strategically reasonable. We'd be left to discuss whether poetry may be a stepping stone in that strategy in some capacity, alongside pdm. I don't know enough to see what's overall most practical / Pareto-efficient.

dimbleby commented 12 months ago

I cannot make sense of that last update.

I note that pdm maintainer is taking a similar view - https://github.com/pdm-project/pdm/issues/2465#issuecomment-1845063798 - that this would "tremendously affect locking performance" and is better handled by PEPs / ecosystem support.

Which is pretty much where I am too: poetry is not going to cripple its performance just to add data to the lockfile that it does not need.

blaggacao commented 12 months ago

Yep, I agree that addressing the challenge from this small-iterative-steps angle isn't going to go anywhere.

Given the performance penalty, interests just don't seem to align well in this particular instance.

The resources for addressing it via PEPs or any other lengthy ecosystem process may just not be there at this point.

However, full source reproducibility may eventually become a trending interest also for the python ecosystem. :shrug:

dimbleby commented 4 months ago

Also a duplicate of #8216 and #8261.