pypa / build

A simple, correct Python build frontend
https://build.pypa.io
MIT License

Provide function to get dependencies of package without building #181

Closed mauvilsa closed 3 years ago

mauvilsa commented 4 years ago

There are cases where you need to get the requirements of a package without building it; see for example GitHub's dependabot #2281. There was a discussion about this in https://github.com/pypa/pep517/issues/100, and it seems that a good place to provide this functionality is here in build. I am opening this issue and closing the discussion in pep517.

FFY00 commented 4 years ago

https://github.com/pypa/build/blob/5d3f89b6f75086b6280d97c68c6d0149bdc6c7e7/src/build/__init__.py#L192

Do you need separation between build dependencies and project dependencies?

graingert commented 4 years ago

without building

If a build system doesn't provide prepare_metadata_for_build_wheel then you have to call build_wheel

mauvilsa commented 4 years ago

Do you need separation between build dependencies and project dependencies?

This is up for discussion. People have been using pep517.meta.load for this, so it could be just that exact behavior. But I guess something else that supports creating a dependency graph would do.

gaborbernat commented 4 years ago

Do you need separation between build dependencies and project dependencies?

We're not talking about build/backend dependencies here, but dependencies for the actual package. The low-cost option, if the backend supports it, is to call https://www.python.org/dev/peps/pep-0517/#prepare-metadata-for-build-wheel and pass the result to the dist-info parser. Otherwise, we need to build a wheel, extract the dist-info, and pass that on.

mauvilsa commented 4 years ago

If a build system doesn't provide prepare_metadata_for_build_wheel then you have to call build_wheel

Yes I forgot. There should be a fallback so that in this case the package is built in order to get its dependencies.

FFY00 commented 4 years ago

Ah, sorry. We can provide prepare_metadata_for_build_wheel but I think anything other than that, like parsing the metadata, is out of scope for this project.

gaborbernat commented 4 years ago

We can provide prepare_metadata_for_build_wheel but I think anything other than that, like parsing the metadata, is out of scope for this project.

I agree that perhaps parsing is out of scope, though we're really only talking about a stdlib import:

from importlib.metadata import Distribution
dist = Distribution.at("/path/to/metadata.dist-info")

However, I don't agree that we need prepare_metadata_for_build_wheel, because that's not a valid path for a backend without that hook. IMHO we should instead provide a method of:

build.get_dist_info(path)

That calls prepare_metadata_for_build_wheel if the backend has it; otherwise it builds a wheel and extracts the dist-info into the provided path. So then users can do:

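from importlib.metadata import Distribution
from tempfile import TemporaryDirectory

from build import ProjectBuilder

# get_dist_info() is the method proposed above; it does not exist yet.
with ProjectBuilder() as builder:
    with TemporaryDirectory() as tmp_dir:
        dist = Distribution.at(builder.get_dist_info(tmp_dir))
        # Read the requirements while the temporary directory still exists.
        for req in dist.requires:
            print(req)

graingert commented 4 years ago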

I think this is a duplicate of https://github.com/pypa/build/issues/130

graingert commented 4 years ago

It's important to understand the context of this PR. It's a hook for GitHub Dependency Graph.

prepare_metadata_for_build_wheel is currently optional, however, providing high-level access to the metadata without falling back to build_wheel will effectively make prepare_metadata_for_build_wheel mandatory for all projects on GitHub.

Of course, the obvious work-around will be that PEP517 build-systems will be forced to provide their own prepare_metadata_for_build_wheel that calls their build_wheel hook internally anyway.

gaborbernat commented 4 years ago

It's important to understand the context of this PR. It's a hook for GitHub Dependency Graph.

I don't think that's relevant. It's one use case, but any IDE/tool that is interested in finding metadata about a Python project is just as valid a use case. What's more, the GitHub team has already made it clear that they don't plan to switch anytime soon (even if we provide a way).

I think this is a duplicate of #130

It's not. As you explained this issue goes beyond the scope of just exposing access to the prepare_metadata_for_build_wheel hook.

Of course, the obvious work-around will be that PEP517 build-systems will be forced to provide their own prepare_metadata_for_build_wheel that calls their build_wheel hook internally anyway.

Doing this dance is spelled out exactly as the responsibility of any frontend per PEP-517 (e.g. this tool). Read the final sentence at https://www.python.org/dev/peps/pep-0517/#prepare-metadata-for-build-wheel.
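
The fallback half of that dance can be sketched with the standard library alone. This is a minimal sketch and not this project's implementation: extract_dist_info is a hypothetical helper name, and it assumes the wheel has already been produced by the backend's build_wheel hook.

```python
import zipfile
from pathlib import Path


def extract_dist_info(wheel_path, output_dir):
    """Extract the single *.dist-info directory of a wheel into output_dir.

    Hypothetical helper; returns the path of the extracted directory.
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        # Collect top-level entries; a valid wheel has exactly one .dist-info.
        top_level = {name.split("/", 1)[0] for name in wheel.namelist()}
        dist_infos = sorted(n for n in top_level if n.endswith(".dist-info"))
        if len(dist_infos) != 1:
            raise ValueError(f"expected one .dist-info directory, found {dist_infos}")
        prefix = dist_infos[0] + "/"
        # Extract only the metadata files, not the package contents.
        members = [n for n in wheel.namelist() if n.startswith(prefix)]
        wheel.extractall(output_dir, members=members)
    return Path(output_dir) / dist_infos[0]
```

The returned path could then be handed to importlib.metadata's Distribution.at, as in the earlier snippet.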

wwuck commented 3 years ago

It's important to understand the context of this PR. It's a hook for GitHub Dependency Graph.

I can also see a good use case of this for a project like https://github.com/Arkq/flake8-requirements so it can get dependencies of a project using a single common source, regardless of whether the project uses setuptools, poetry, flit, PEP621, etc.

layday commented 3 years ago

Can this be closed on account of #301?

graingert commented 3 years ago

Could a helper function be created that abstracts the tmpdir creation?

FFY00 commented 3 years ago

@jaraco suggested the same. We are considering adding a plumbing module, which could hold these kinds of helper functions.

layday commented 3 years ago

There's been talk about creating a helper module which would abstract common operations, but until then, creating a tmpdir manually shouldn't be too bad.

layday commented 3 years ago

Doh, we replied at the same time 😄

wwuck commented 3 years ago

Any updates on this helper module? Or is https://github.com/pypa/build/issues/301#issuecomment-863276790 still the recommended approach for now?

If we are using that snippet, is it possible to silence the extra printing to stdout/stderr?

koobs commented 3 years ago

tldr: The ability to instrument a package's dependencies, pre build (and post build), is critical and that signal is still being lost among the 'packaging noise'.

Downstreams, FreeBSD in my case, but many others, including all downstream OS packagers, have had a critical and long-standing need for a consistent and single point for dependency discovery. This has, over recent years, been complicated by the flurry of new, alternative and (orthogonally) different ways to declare dependencies, including, but not limited to:

1) setup.py _requires
2) setup.cfg:
3) pbr, poetry, et al
4) reading requirements*.txt files from arbitrary locations into (1), (2) and (3) (ie: pip)
5) ...

build presents one of the first opportunities in a long time, given its 'build' plugin/backend design, explicit specification requirements that backends can/should/must support, and being driven by PyPA, for the pain associated with easily, quickly and consistently discovering dependencies to be effectively banished (for now, at least).

Throughout the "packaging dark ages" and to now, almost everyone has grokked at least some understanding of the complexities of dependencies, in particular their dynamic nature (setup vs run time, etc), because these are always the kinds of first responses and conversations that take place.

While we all mostly have that understanding, it must also be understood and made clear that requests for these capabilities are not "we need exact, complete and perfect dependency discovery for every case, in all stages, all the time, including for cases that are hard or not even in principle possible". They are "the Python ecosystem and its entire community needs the best dependency discovery that we can muster", with clear descriptions of the limitations and gaps, and when and where they apply.

If this means, at first, only being able to discover what a package statically declares for itself (install_requires, setup_requires, setup, build_requires, tests_require, extras_require, whatever), that's a great start. We can then have the conversation about mechanics to refine and improve that discovery/instrumentation, in light of the dynamic (pre build / post build) changes to them.

The absolute key needs are:

gaborbernat commented 3 years ago

IMHO this has been done, mostly as indicated by @wwuck above. You can load the project dependencies with the following snippet (there's no guarantee it's not going to trigger a build though, that's just not possible).

import tempfile
from importlib.metadata import PathDistribution
from pathlib import Path

from packaging.requirements import Requirement
from build import ProjectBuilder

builder = ProjectBuilder('.')

with tempfile.TemporaryDirectory() as tmpdir:
    distribution = PathDistribution(Path(builder.metadata_path(tmpdir)))
    requires = [Requirement(i) for i in distribution.requires]
    build_requires = builder.build_system_requires  # also use get_requires_for_build for sdist/wheel build deps

We don't offer a way to get the requirements wrapped in Requirement directly, because we do not want to add a dependency on packaging. Otherwise it lists the dependencies, with version specifiers and markers if applicable. You can apply your own custom grouping, and the approach is not build-tool specific.

install_requires, setup_requires, setup, build_requires, tests_require, extras_require, whatever

Out of these, only install requires, build requires and extras require are standardized. The rest are not, and we cannot offer them in any form.
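
As an illustration of such custom grouping, here is a minimal sketch that sorts requirement strings by extra using only the standard library. group_by_extra is a hypothetical name, and the regular expression is a naive assumption that only handles simple markers such as extra == "dev"; packaging's marker parsing would be the robust route.

```python
import re
from collections import defaultdict

# Naive pattern for markers such as: extra == "dev" (assumption: simple markers only).
_EXTRA_RE = re.compile(r"""extra\s*==\s*["']([^"']+)["']""")


def group_by_extra(requires):
    """Group requirement strings by extra; the key None holds unconditional ones."""
    groups = defaultdict(list)
    for line in requires or []:  # distribution.requires may be None
        # Everything after the first ';' is the environment marker, if any.
        _, _, marker = line.partition(";")
        match = _EXTRA_RE.search(marker)
        # Keep the full requirement string so no marker information is lost.
        groups[match.group(1) if match else None].append(line)
    return dict(groups)
```

It would be called as group_by_extra(distribution.requires) on the distribution loaded in the snippet above.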

wwuck commented 3 years ago

@gaborbernat Thanks for pointing that out. I ran a quick test with your snippet against a fresh clone of https://github.com/pypa/sampleproject

import tempfile
from importlib.metadata import PathDistribution
from pathlib import Path

from packaging.requirements import Requirement
from build import ProjectBuilder

builder = ProjectBuilder('.')

with tempfile.TemporaryDirectory() as tmpdir:
    distribution = PathDistribution(Path(builder.metadata_path(tmpdir)))
    requires = [Requirement(i) for i in distribution.requires]
    for x in requires:
        print(x)

It will print out the package dependencies (hooray!), but it also prints out extra build debugging information which I can't see how to disable. How can we disable this extra output to stdout/stderr? It's not really feasible to use that code snippet while there is no control over the stdout/stderr output.

running dist_info
creating /tmp/tmp08sqx9di/sampleproject.egg-info
writing /tmp/tmp08sqx9di/sampleproject.egg-info/PKG-INFO
writing dependency_links to /tmp/tmp08sqx9di/sampleproject.egg-info/dependency_links.txt
writing entry points to /tmp/tmp08sqx9di/sampleproject.egg-info/entry_points.txt
writing requirements to /tmp/tmp08sqx9di/sampleproject.egg-info/requires.txt
writing top-level names to /tmp/tmp08sqx9di/sampleproject.egg-info/top_level.txt
writing manifest file '/tmp/tmp08sqx9di/sampleproject.egg-info/SOURCES.txt'
adding license file 'LICENSE.txt' (matched pattern 'LICENSE.txt')
reading manifest file '/tmp/tmp08sqx9di/sampleproject.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file '/tmp/tmp08sqx9di/sampleproject.egg-info/SOURCES.txt'
creating '/tmp/tmp08sqx9di/sampleproject.dist-info'
adding license file "LICENSE.txt" (matched pattern "LICENSE.txt")
peppercorn
check-manifest; extra == "dev"
coverage; extra == "test"

It appears to be builder.metadata_path(tmpdir) that is outputting the extra stuff, but the documentation for that method doesn't mention anything about stdout/stderr.

FFY00 commented 3 years ago

Set the runner to pep517.quiet_subprocess_runner when constructing ProjectBuilder.

See https://github.com/FFY00/python-resolver/blob/fb377c7bb7342ae1671390743c3afcf06765aea7/resolver/mindeps/__main__.py#L46-L50
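
For anyone who would rather not import the runner, a custom one can be sketched with the standard library. This is a minimal sketch under the assumption that the runner is called as runner(cmd, cwd, extra_environ), as pep517's own runners are; quiet_runner is a hypothetical name, and returning the completed process is only a convenience for inspecting the captured output.

```python
import os
import subprocess


def quiet_runner(cmd, cwd=None, extra_environ=None):
    """Run a hook subprocess, capturing its output instead of printing it.

    Assumption: the frontend calls this as runner(cmd, cwd, extra_environ).
    """
    env = os.environ.copy()
    if extra_environ:
        env.update(extra_environ)
    # stderr is merged into stdout and both are captured rather than printed.
    # check=True raises CalledProcessError if the backend process fails.
    return subprocess.run(
        cmd,
        cwd=cwd,
        env=env,
        check=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
```

It would then be passed when constructing the builder, for example ProjectBuilder('.', runner=quiet_runner).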

wwuck commented 3 years ago

@FFY00 Thanks for that. Does this mean we also have to import from the pep517 package? The README for pep517 mentions that parts of it are deprecated. Can we rely on pep517.wrappers.quiet_subprocess_runner staying around and not being deprecated? Will that be moved into pypa/build at some point?

FFY00 commented 3 years ago

Yes, though I would probably import it from pep517 directly (pep517.quiet_subprocess_runner), I think that would be the most stable in the long run. The parts that are being deprecated are the high-level helpers, which have been made redundant by the API we provide in this project.

wwuck commented 3 years ago

Thanks. Now that we have a solution to this, is it possible to get it added as a utility function to pypa/build? Maybe something like build.package_dependencies(srcdir: str)? Or if that is deemed not necessary, can we get this documented somewhere on https://packaging.python.org/? It seems to be a common enough question that documenting it somewhere will stop people from having to search through a bunch of random github issues to find it.

gaborbernat commented 3 years ago

I'm +0.1 on it, so guess other maintainers will have to chime in.

FFY00 commented 3 years ago

We have discussed having a helper (plumbing) module previously, and I am fine with it. Such high-level functions, like the one you propose, should be separated from the core API and must be in a separate module. I want to be able to easily split that into a separate package if the maintenance cost gets too high, or we need more dependencies.

I am thinking of making a release soon, we can work on this after that.

FFY00 commented 3 years ago

It is now available in 0.7.0.

https://pypa-build.readthedocs.io/en/latest/api.html#build.util.project_wheel_metadata
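
A short sketch of how the returned metadata might be consumed. It assumes that project_wheel_metadata returns an email-message-like metadata object, as importlib.metadata does; requirements_from_metadata is a hypothetical helper, and the example feeds it a hand-written METADATA text so that it runs without building anything.

```python
from email import message_from_string


def requirements_from_metadata(metadata):
    """Return the Requires-Dist entries of a core-metadata message as a list."""
    # get_all returns None when the field is absent, so normalize to a list.
    return list(metadata.get_all("Requires-Dist") or [])


# Hand-written stand-in for what a backend would write to METADATA.
sample = message_from_string(
    "Metadata-Version: 2.1\n"
    "Name: sampleproject\n"
    "Requires-Dist: peppercorn\n"
    'Requires-Dist: check-manifest; extra == "dev"\n'
)
print(requirements_from_metadata(sample))
```

With build 0.7.0 or later installed, requirements_from_metadata(build.util.project_wheel_metadata('.')) should work the same way, though it may still trigger a build when the backend lacks the metadata hook.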

FFY00 commented 3 years ago

I am closing this, as getting the metadata without building is not possible in several situations, and we now provide the closest helper to that that we can.