pypa / build

A simple, correct Python build frontend
https://build.pypa.io
MIT License

Listing filenames of produced distributions in the CLI #198

Open layday opened 3 years ago

layday commented 3 years ago

Source distribution and wheel filenames are variable; they encode a variety of information (e.g. platform, Python version, distribution version) which might vary between build invocations. We might want to think about offering a way to retrieve these from the CLI (e.g. through a new option which would create a manifest) for scripts and automation tools to refer to, which would provide a minor convenience over globbing.

FFY00 commented 3 years ago

IMO this is out of scope. Maybe if this information wasn't in the file name I would agree on having a manifest with extra information.

I think it is fairly simple to write a Python module that just parses the wheel name and outputs JSON or something like that.

import argparse
import json
import os.path
import re

from typing import Dict, Optional

# Wheel file name convention (PEP 427):
# {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
# Components never contain '-', so each field is matched with [^-]+.
_WHEEL_NAME_REGEX = re.compile(
    r'(?P<distribution>[^-]+)-(?P<version>[^-]+)'
    r'(-(?P<build_tag>[^-]+))?-(?P<python_tag>[^-]+)'
    r'-(?P<abi_tag>[^-]+)-(?P<platform_tag>[^-]+)\.whl'
)

def parse(name: str) -> Optional[Dict[str, Optional[str]]]:
    # fullmatch rejects names with anything after the '.whl' suffix
    if m := _WHEEL_NAME_REGEX.fullmatch(name):
        return m.groupdict()
    return None

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        'file',
        type=str,
        help='wheel file',
    )
    args = parser.parse_args()

    if info := parse(os.path.basename(args.file)):
        print(json.dumps(info, indent=4, sort_keys=True))
    else:
        print('Invalid wheel name, see https://www.python.org/dev/peps/pep-0427/#file-name-convention')

$ python -m wheel2json ~/Downloads/packaging-20.7-py2.py3-none-any.whl
{
    "abi_tag": "none",
    "build_tag": null,
    "distribution": "packaging",
    "platform_tag": "any",
    "python_tag": "py2.py3",
    "version": "20.7"
}

FFY00 commented 3 years ago

Perhaps something for https://github.com/pypa/wheel? python -m wheel info ~/Downloads/packaging-20.7-py2.py3-none-any.whl?

layday commented 3 years ago

I think you're misunderstanding - I don't want to parse the wheel filename into its constituent tags. I want build simply to return the filename of the distribution file the backend has produced in a machine-readable format. It could then be piped into a hypothetical installer or another tool which might expect an sdist or a wheel. For instance, if we were to produce a JSON file:

$ python -m build --write-manifest-to manifest.json
$ python -m install $(jq -r .wheel manifest.json)

FFY00 commented 3 years ago

Hum, I still am not convinced. I would like to keep the CLI fairly simple. As you said

would provide a minor convenience over globbing

Unless there is a reasonable use-case this would either block or make significantly harder, I am -1.

gaborbernat commented 3 years ago

I think this could be achieved if we print the generated files on stdout and forward other output to sys.stderr:

twine upload $(python -m build 2>/dev/null)

Though, in general, I'm tempted not to support this. I can see a valid use case for this in the sense that currently if you want to build a package and then upload it, you need something like:

rm dist -r && python -m build . && twine upload dist/*

Because otherwise, dist might contain previous builds you don't want to upload via a simple globbing. I'm solid -1 on the JSON part, but I'm -0.5 on the stderr/stdout part... 🤔
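
For illustration, the stale-files problem can also be worked around without emptying dist first, by comparing the directory listing before and after the build. This is only a sketch: the helper names snapshot and new_files are made up for this example and are not part of build.

```python
from pathlib import Path


def snapshot(dist_dir: Path) -> set[str]:
    """Return the names of the files currently in dist_dir (empty if it is missing)."""
    if not dist_dir.is_dir():
        return set()
    return {p.name for p in dist_dir.iterdir() if p.is_file()}


def new_files(dist_dir: Path, before: set[str]) -> list[Path]:
    """Return the files that appeared in dist_dir since the snapshot was taken."""
    return sorted(dist_dir / name for name in snapshot(dist_dir) - before)
```

A script would call snapshot(Path("dist")), run python -m build, and then pass the result to new_files. One limitation: a rebuild that overwrites a file of the same name is not reported as new, so this is weaker than having build report the filenames itself.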

FFY00 commented 3 years ago

Because otherwise, dist might contain previous builds you don't want to upload via a simple globbing.

But you can simply choose another dist folder. If this were not possible, I would agree with this feature, but you can... I think it is trivial to output to another folder if you need to separate things, and then move the files afterward if you want to have everything in the same folder at the end.

gaborbernat commented 3 years ago

How do you select a folder name that's guaranteed to be empty/not existing without invoking an rm first?
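
One possible answer, sketched here as an assumption rather than anything build provides: let tempfile.mkdtemp create the directory, since it always returns a new, empty directory with a unique name. The function name build_into_fresh_dir and its runner parameter (a seam for substituting subprocess.run) are hypothetical; --outdir is build's existing option for choosing the output directory.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_into_fresh_dir(source_dir: str = ".", runner=subprocess.run) -> list[Path]:
    """Build into a newly created directory and return everything found there."""
    # mkdtemp creates a new, empty directory, so no rm is needed beforehand.
    outdir = Path(tempfile.mkdtemp(prefix="dist-"))
    # Ask build to write its artifacts into that directory.
    runner(
        [sys.executable, "-m", "build", "--outdir", str(outdir), source_dir],
        check=True,
    )
    # Everything in the directory was produced by this invocation.
    return sorted(outdir.iterdir())
```

The caller is responsible for moving the files elsewhere and removing the temporary directory afterwards.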

kpfleming commented 3 years ago

As I've been working on a project with @gaborbernat I've run into this exact problem as well. The only reason that this works properly today for twine is because twine apparently does its own glob expansion.

In a CI job I'm running python -m build --sdist --wheel with no prior knowledge of what the generated file names will be, and then I want to do an install-test of those packages. python -m pip install foo/* does not do glob expansion, so it is necessary to either know the filenames of the sdist/wheel, or to use external glob expansion to get their names.

As best I can tell PEP 517 requires the build backend to return the basename of the thing it built, and ProjectBuilder in this tool returns the full path to the thing that got built. The requested information is already available, so would a change to emit it on request be a significant effort?
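
To make the point about ProjectBuilder concrete, here is a minimal sketch that collects the paths it returns. The helper build_and_list is hypothetical; the only API it relies on is ProjectBuilder.build(distribution, output_directory), which returns the path of the file it built.

```python
from typing import Iterable


def build_and_list(builder, distributions: Iterable[str], outdir: str) -> list[str]:
    """Build each requested distribution and collect the paths the builder reports."""
    # builder is expected to behave like build.ProjectBuilder:
    # build(distribution, output_directory) returns the built file's path.
    return [builder.build(dist, outdir) for dist in distributions]
```

With build installed, this could be called as build_and_list(build.ProjectBuilder("."), ["sdist", "wheel"], "dist"). Note that ProjectBuilder on its own does not create an isolated environment, so the backend and its build requirements would need to be importable already.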

uranusjr commented 3 years ago

FWIW if you know the package name in advance, you can do pip install --find-links foo <project> instead.

gaborbernat commented 3 years ago

Calling pip just to find the names is, IMHO, way too heavyweight a solution.

uranusjr commented 3 years ago

I was responding to kpfleming’s comment, which was talking about actually installing the built wheels. If you know the project names in advance, you can pass those to pip to install the packages (instead of using path, which needs either glob expansion or knowing the dynamically-generated file names).

I made no mention of using pip just to find out the names. pip cannot even do that (without you manually parsing its debug output).

FFY00 commented 3 years ago

Okay, I see the need, though I don't think adding a new option for this might be the best solution. python -m build is a command for users; it is designed to be fairly simple and as intuitive as we can make it. Building from an automated script, without user interaction, is a different use-case. I think adding a new option here would increase the complexity of the command line and still not be enough to solve the kinds of issues that may arise from this use-case, putting more pressure on us to add more options and make the CLI even more complex.

For this reason, I believe this use-case should be handled by a different command (maybe python -m build.machine?). The idea would be that this new command would be able to output the build information that automated tooling may need. Usually, I'd say to just use the Python module to write your customized payload, but that could be annoying, and given how common this use-case is, having a suitable ready-to-use interface would make sense. Having this as a separate command also opens the possibility of later splitting it into a different package if it becomes increasingly complex or starts needing external dependencies.

I'd propose that this command simply output JSON based on a defined JSON schema. It would then have the option to output only a specific field instead.

So, the usage in this case would be something like:

$ python -m build.machine --output-field build.artifacts
dist/my_package-1.0.0.tar.gz
dist/my_package-1.0.0-py3-none-any.whl

TLDR: Keep python -m build a simple user-focused command and introduce a new command for use in automated scripts and that could appropriately address the requirements of that more complex use case.

gaborbernat commented 3 years ago

TLDR: Keep python -m build a simple user-focused command and introduce a new command for use in automated scripts and that could appropriately address the requirements of that more complex use case.

I'm personally -1 on this proposal. Having to maintain and use two separate entry points depending on your use case would confuse more than help. But considering twine accepts glob expressions, I'm personally not too fussed about this at the moment, so I have no strong feelings about a solution, and I feel introducing two entry points is more confusing/pricey than it is beneficial...

PS. Your proposal also goes against UNIX design philosophy. I haven't seen this duality in other tools; for example, there is no ls and ls.machine. My 2c.

layday commented 3 years ago

Perhaps this is something that we could roll into #192 - if the output format were to be customisable - say, if build could grow a provisional --output-format=(human|json) option, and all non-build output redirected to stderr as suggested, that'd probably meet users' needs. Imagine:

$ python -m build -w 2>&-
Built foo.whl
$ python -m build -w --output-format=json 2>&-
{"type": "build_success", "path": "foo.whl"}

This would mean we've got to cook up some kind of JSON schema, and we'd need to think about whether this would be more generally useful - would people care about other types of messages being given in a machine-readable format?
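
As a rough sketch of what a consumer of such output might look like, assuming one JSON object per line with the "type" and "path" keys from the example above (both the option and this message shape are proposals, not an existing build feature):

```python
import json


def built_paths(output: str) -> list[str]:
    """Extract the paths of built distributions from line-delimited JSON output."""
    paths = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between messages
        event = json.loads(line)
        # Only "build_success" messages carry a built artifact in this sketch.
        if event.get("type") == "build_success":
            paths.append(event["path"])
    return paths
```

Filtering on the message type is what would let the format grow other kinds of messages later without breaking scripts that only care about the built files.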

uranusjr commented 3 years ago

say, if build could grow a provisional --output-format=(human|json) option, and all non-build output redirected to stderr as suggested, that'd probably meet users' needs.

I was going to suggest something similar as well; if build is going to do this, it should have a global flag similar to Git’s --porcelain=.

Another solution to this would be to offer a programmatic API that maps exactly one-to-one to the command line, and that either returns or passes to hooks structured data containing the relevant information. The stdlib venv.EnvBuilder is an example; its __init__ parameters map exactly to python -m venv arguments, and the context argument in various hooks contains information about the environment being created. This way, people looking for machine-readable output can write a Python script with that API and “bring their own serialisation” for data exchange.
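
For readers unfamiliar with the venv pattern being referenced, a small sketch of it follows. The class RecordingEnvBuilder and the function create_and_describe are made up for this example; EnvBuilder, its with_pip parameter, the post_setup hook, and the context attributes env_dir and env_exe are part of the standard library.

```python
import venv


class RecordingEnvBuilder(venv.EnvBuilder):
    """EnvBuilder that keeps the context object passed to its post_setup hook."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.context = None

    def post_setup(self, context):
        # create() calls this hook once the environment has been set up.
        self.context = context


def create_and_describe(env_dir: str) -> dict[str, str]:
    """Create a virtual environment and return structured data about it."""
    # with_pip=False corresponds to the default of not passing --with-pip... in
    # the constructor; on the command line the equivalent is --without-pip.
    builder = RecordingEnvBuilder(with_pip=False)
    builder.create(env_dir)
    return {"env_dir": builder.context.env_dir, "env_exe": builder.context.env_exe}
```

The caller can then serialise the returned dictionary however it likes, for example with json.dumps, which is the “bring your own serialisation” idea.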

FFY00 commented 3 years ago

I think I am okay with going with @uranusjr's proposal of a programmatic API, though it is not the cleanest solution for this. It would still be useful on its own.

What about a python -m build.json command that will behave just like python -m build but will output json? There we could have all the required options to curate the data output.

I would really like to keep this separate from the main command for two reasons: 1) it makes the command much simpler, and 2) it becomes easy to separate the command into another package if needed. I would like to keep things simple and fairly modularized given that this is a critical package for bootstrapping Python environments; I want to be able to easily drop functionality, especially as runtime requirements, if we run into any issues.

henryiii commented 3 years ago

How about having --output-format=default (or similar), and if you pass something else, like --output-format=json, then it looks up an entry point? Then build-json could provide a build.output-format:json entry point. Or you could even include it in the same package, but that would still make it easier to pull it out or have people write more.

I like the idea of a programmatic API, though it's a bit of work to document, it would be nice to have (and sometimes cleans up the internals a little).

What about python -m build.json

I don't like this, personally. First, what do you do with pyproject-build? pyproject-build.json? Second, this is much harder to use with pipx run, which is a fantastic way to run build, especially in CI - you'd be forced into a --spec build pyproject-build.json (or whatever it was called). Third, it's not discoverable. python -m build -h won't naturally show this as an option. Finally, it's not a different command, just a different output option; if it was a different command, then you'd have to duplicate all the options, like --wheel, etc. That's not good API design, generally; it's not composable and is turning something that is fundamentally an "option" into a command. Now if the json form had a completely different set of options, this would be better/correct. But not if it just changes the output.

If it's only json, then json's pretty easy to handle with stdlib utils, so I don't think it would hurt bootstrapping.

gwerbin commented 2 years ago

Sorry to bump an old thread, but here's another use case: generating the name of the Sdist and Wheel files before actually building them (when possible, e.g. when a dynamic setup.py is not present), for later use in a Makefile.

henryiii commented 2 years ago

This can't be done before, as it's up to the build backend to decide the outputs. Build can't know if the build backend is going to produce a pure Python wheel, a compiled wheel, or something in-between (like one that doesn't depend on the Python version but does depend on the OS).

layday commented 4 months ago

I've been writing built dist filenames to a JSON file in the dist folder, but that ends up being picked up by twine with dist/*, which attempts to upload it to PyPI and fails. Prepending a dot to the filename would work on *nix, but that would also make it less discoverable and I assume would have no effect on Windows. If we do decide to write filenames to a file next to the dists, we might have to coordinate excluding it from twine.

henryiii commented 4 months ago

What I think would work would be a --json-output= (bike shedding fine) option that would write out the filename (and a bit more info if anything makes sense) to a file that the user specifies. That way it doesn't have to be in dist unless the user wants it there, you could save multiple runs, etc. For tools like cibuildwheel, this could be a temporary dir.
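
As a sketch of what writing to a user-specified file could look like: the option does not exist, and the manifest layout used here (a mapping from distribution type to filename) is only an assumption for illustration, as are the names write_manifest and read_artifact.

```python
import json
from pathlib import Path


def write_manifest(manifest_path: str, artifacts: dict[str, str]) -> None:
    """Write the built filenames to the location the user asked for."""
    text = json.dumps(artifacts, indent=2, sort_keys=True) + "\n"
    Path(manifest_path).write_text(text, encoding="utf-8")


def read_artifact(manifest_path: str, kind: str) -> str:
    """Return the filename recorded for one distribution type, e.g. 'wheel'."""
    data = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    return data[kind]
```

Because the caller picks the path, the manifest can live outside dist and is not swept up by a dist/* glob.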

This can't be done with stdout / stderr since the build backend is allowed to write there, and the info is important / useful. uv has this same problem, and works around it by writing to a file.

layday commented 4 months ago

How would I reproduce this in uv? It doesn't expose a build command.

henryiii commented 4 months ago

When uv is building (like for uv pip install or the experimental build command, which requires a dev build), it needs to communicate between processes. Stdout/stderr can't be used, so uv moved to using a temporary output file to communicate between the Python and Rust processes. See https://github.com/astral-sh/uv/pull/2314

layday commented 4 months ago

Ah, but that’s for IPC with the build backend. pyproject-hooks works exactly the same way.