pypa / build

A simple, correct Python build frontend
https://build.pypa.io
MIT License
701 stars 115 forks

Building c-extension wheels for multiple python versions #268

Open gaborbernat opened 3 years ago

gaborbernat commented 3 years ago

Assume you have a C-extension project (something with Cython) that builds a wheel only for the current Python major/minor version. Currently, building wheels for all supported Python versions (e.g. CPython 3.6, 3.7, 3.8, 3.9) is very tedious with the build project. You have to create four virtual environments, install build four times, and invoke it four times (making sure your build executable names pick up the right host Python version). I propose we allow a mode that builds this set of wheels in a single command. Something like:

python -m build --wheel py36,py37,py38,py39 --sdist .

The good news is that implementing this is simple with what we already have, if we make this use case require the virtualenv extra. All we have to do is invoke the build in a loop and pass these Python specification strings on to virtualenv here. I think it's alright to support non-host Python targets with the virtualenv extra only (essentially making the default --wheel py).
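A rough sketch of what that loop might look like, under the assumption that each spec maps to a fresh virtualenv (the helper name and the `.venv-<spec>` layout are illustrative, not build's actual internals):

```python
# Hypothetical sketch of the proposed looping mode. Each virtualenv
# "python spec" string (e.g. "py38") would be resolved by virtualenv to a
# concrete interpreter; build would then run with that environment's python.
def plan_wheel_builds(specs, srcdir="."):
    """Return one (env_dir, build_argv) pair per requested Python spec."""
    plans = []
    for spec in specs:
        env_dir = f".venv-{spec}"
        argv = [f"{env_dir}/bin/python", "-m", "build", "--wheel", srcdir]
        plans.append((env_dir, argv))
    return plans

for env_dir, argv in plan_wheel_builds(["py36", "py37", "py38", "py39"]):
    print(env_dir, argv)
    # In a real implementation: virtualenv.cli_run([...]) to create env_dir,
    # then subprocess.run(argv, check=True) to build inside it.
```

Note the `bin/` path is POSIX-specific; on Windows the interpreter would live under `Scripts\`.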

@FFY00 @layday @henryiii thoughts? If we agree I can do the implementation.

henryiii commented 3 years ago

Building proper wheels is tricky; I'm afraid this would get misused. If someone wants multiple wheels they are likely going to try to distribute them, and then they should probably use cibuildwheel, which handles building in a manylinux container (on Linux), using a macOS 10.9-compatible Python (on macOS), delocating / auditwheel, universal wheels, testing, etc. pypa/build is currently being integrated into cibuildwheel as an option; it was recently added to manylinux for that purpose.

gaborbernat commented 3 years ago

Building proper wheels is tricky; I'm afraid this would get misused.

Can you please list proper examples of such worries? I don't want to talk about hypotheticals.

should probably use cibuildwheel

In lots of places (e.g. behind enterprise firewalls), this is just not possible. Let's not consider cibuildwheel the solution; it's a solution for some, but I don't think it generalizes correctly. If you go by that logic we should remove the --wheel flag, because building wheels is tricky to get right. My use case is something someone can do today; I was just hoping we could make it less painful.

henryiii commented 3 years ago

Can you please list proper examples of such worries?

I thought I did? Manylinux needs to be built inside the manylinux container, macOS needs the official Python downloads to support 10.9+ (or generally anything older than the current macOS version, which is what most other Pythons are built for), cross compiling for Apple Silicon/Universal2 requires various workarounds, delocating/auditing for distribution, etc.

If we remove the --wheel flag, then what about Pure Python wheels, which is currently probably the main use of pypa/build? What about using this in a well prepared environment (such as cibuildwheel or multibuild, or your company environment, or your current environment for your own use, or your package manager environment)? All of these are perfectly valid uses of --wheel.

If we support python -m build --wheel py36,py37,py38,py39, I can easily see users expecting this to produce binary wheels ready to be uploaded to PyPI. This will not be the case on macOS or Linux (though on Windows it might work pretty well). It's not build's job to handle all the trickery involved in building wheels to be distributed.

This also bothers me because it detaches the Python version from the version of Python running build, relying on a search procedure (from virtualenv, I assume). When building a binary wheel, which Python you use can be important; "any" copy of Python x.y might not do (especially on macOS).

Just to be clear, I'm not strongly opposed, I'm just voicing my concerns that this feature might cause more harm than good. "just passing through" these to virtualenv seems pretty simple. It is a little irritating that this would only work if you activate the virtualenv extra; this extra is also not available in manylinux (except for manylinux1 Python 2.7), which is arguably one of the most important places to have this feature. Also nothing is installed into the default Python in manylinux, making this a bit weird to use.

henryiii commented 3 years ago

If implemented, I think I would rather have something explicit and without a loop, like --virtualenv=py36 or something like that; putting the loop inside build is not really necessary, I think, and leaving it out might help with my concern about this being too easy to use for the wrong purpose. It's also more obvious that it requires virtualenv, and more explicit.

gaborbernat commented 3 years ago

I thought I did? Manylinux needs to be built inside the manylinux container, macOS needs the official Python downloads to support 10.9+ (or generally anything older than the current macOS version, which is what most other Pythons are built for), cross compiling for Apple Silicon/Universal2 requires various workarounds, delocating/auditing for distribution, etc.

You're assuming here that people want to build manylinux wheels. Often you don't need them, or they are straight up a bad idea (when you can ensure that your target deploy environment will contain the dynamically linked libraries, it's actually preferable not to bundle them). Often you build wheels to target your current OS and that's it. I don't view the build project as the project to use to build wheels to upload to PyPI. build is a project to build an sdist/wheel, a frontend to PEP 517. If you want manylinux/delocated wheels then you probably still need build to build the base wheel and then customize it. But again, build is not a PyPI wheel builder; it's a generic PEP 517 frontend and should not worry about PyPI.

If we support python -m build --wheel py36,py37,py38,py39, I can easily see users expecting this to produce binary wheels ready to be uploaded to PyPI.

I don't think this is related to this topic. Users can already assume today that --wheel will build a PyPI-ready wheel even when it's a C-extension wheel, which it clearly is not. At no point do we check whether the wheel is a pure Python wheel and raise otherwise to shield users from hurting themselves. So while I understand your worries, those worries exist today too, and this feature would not make that any better or worse.

This also bothers me because it detaches the Python version from the version of Python running build, relying on a search procedure (from virtualenv, I assume). When building a binary wheel, which Python you use can be important; "any" copy of Python x.y might not do (especially on macOS).

If you bother to read the documentation I've linked, you'll see you have very strict control over which Python you're building your wheel with (from specifying just the major Python version all the way down to passing an explicit path).

henryiii commented 3 years ago

Often you build wheels to target your current OS and that's it

Yes, and this is exactly what you get today with --wheel. But you are trying to add a way to target a bunch of Python versions easily, which makes it look like it builds general wheels and not "current environment" wheels.

If you bother to read the documentation I've linked

I did, though I didn't see the explicit path part.

What about my suggestion that we not integrate the loop, but have an explicit --virtualenv=<spec> flag?

Again, I'm not totally against this (I'm a fan of a similar, but non-looped version of this in pipx), but I have concerns I'm voicing.

gaborbernat commented 3 years ago

What about my suggestion that we not integrate the loop, but have an explicit --virtualenv=<spec> flag?

I'm happy with that part.

Actually, I realized one can already do this today via environment variables, but it doesn't translate nicely to usage within tox:

env VIRTUALENV_PYTHON=py38 python -m build .
env VIRTUALENV_PYTHON=py37 python -m build .

Yes, and this is exactly what you get today with --wheel. But you are trying to add a way to target a bunch of Python versions easily, which makes it look like it builds general wheels and not "current environment" wheels.

Ignore cibuildwheel, which is unusable in some environments that can't use those CI environments it supports. The first step to generate auditwheels is to build a wheel, and I think build should be used for that.

FFY00 commented 3 years ago

Sorry for the short reply, it is late and I am on my phone. I am remodeling at home, so my setup will be dismantled until Wednesday, I should still be able to jump on my laptop if needed.

I do not agree with this interface; I think #104 is the correct way forward. We can make it so that the interpreter argument can receive multiple values. The only practical difference from what you propose is that the wheels and sdist will have to be built in different commands, which I think is fine.

python -m build --sdist
python -m build --wheel --python 3.6,3.7,3.8,3.9
gaborbernat commented 3 years ago

I see no actual proposal in #104; can you clarify what you meant?

henryiii commented 3 years ago

Ignore cibuildwheel, which is unusable in some environments that can't use those CI environments it supports. The first step to generate auditwheels is to build a wheel, and I think build should be used for that

This was not related to cibuildwheel. My point is that making a list of Python versions looks like you are building for general Python versions, which you are not with build; build just builds a Python wheel with an existing environment, which I think is much clearer when you list it explicitly and run build once per wheel. Also, which environment does the sdist build in? It shouldn't matter (though I could customize my sdist command to store information in the SDist about which Python it came from ;) ), but a loop logically has to do something with the SDist; if you don't make it a loop, then it's explicit and up to the programmer.

I really like the environment variable method, actually. It seems odd to add a command line argument that only works when virtualenv is installed; I'm guessing most of the time it is not installed, since it is not needed on Python 3. You can control other aspects of the build via PIP_ environment variables, too, so it's consistent with what we currently have, say for PIP_ONLY_BINARY. An environment variable also works in the current version of build. :)

doesn't translate that nicely to usage within tox

Isn't this tox's fault then, not build? ;)

I do not agree with this interface, I think #104 is the correct way forward.

I'm also not sure what's different in that vs. the proposal here? I think @gaborbernat's suggestion of using the virtualenv python specification strings is necessary; you have to be able to select 32-bit vs. 64-bit Python, PyPy, sometimes an exact path, etc. A simple number is not enough for all cases (though it reduces to exactly the same thing in the simplest case, so I don't think this is actually any different from the proposal above). (This is also part of why I don't like looping: you are not building a "3.8" wheel, you are building a wheel with a specific Python, which is 32- or 64-bit, which has some minimum version of macOS support, etc. One-at-a-time specifications make this clearer, IMO.)

If it's the name, --virtualenv vs. --python, that would be open for discussion; --virtualenv helps highlight it's only available if you have the virtualenv extra installed, and that the specification comes from virtualenv. --python clearly highlights what you are doing, selecting the base Python interpreter. However, I currently like the environment variable; it works today, it's clearly virtualenv-only, and it's a standard name from virtualenv itself, just like the pip settings.

henryiii commented 3 years ago

Keep in mind, manylinux will not support this, as it does not have virtualenv installed, and it does not touch the "base" Python, so you need to be using one of the specific Python interpreters anyway. And this will not work correctly for redistributable wheels on macOS unless someone manually installs all the official CPython downloads. So its usefulness is mostly limited to specific environments and maybe Windows. And if someone is targeting a specific environment, they will very likely be in said environment, not building for lots of Pythons at once.

It's really easy to loop on the command line, or manually list N lines for N interpreters (N is pretty small).

python -m build --sdist .
env VIRTUALENV_PYTHON=py3.8-32 python -m build --wheel .
env VIRTUALENV_PYTHON=py3.8-64 python -m build --wheel .
env VIRTUALENV_PYTHON=py3.7-32 python -m build --wheel .
env VIRTUALENV_PYTHON=py3.7-64 python -m build --wheel .
env VIRTUALENV_PYTHON=pypy3.7 python -m build --wheel .
henryiii commented 3 years ago

Ignore cibuildwheel, which is unusable in some environments that can't use those CI environments it supports

Final quick comment, not related to the discussion here, but addressing the comment above since I'm involved in cibuildwheel and can't let it slide by: cibuildwheel does not "support specific CI environments"; it has a list of "tested" CI environments. You can run it (and it is run) on custom company machines, etc. You can even run it locally, though if you target macOS or Windows, you have to be okay with it installing official, global copies of Python on your system, making it better for "CI-like" deployments. It's perfectly okay to target Linux on any system due to the use of Docker, though. :) You should think of cibuildwheel more like a special tox or nox that handles Python-for-wheel installing and is customized for building. It is not a collection of random CI scripts that just happens to work on CI. It's a regular Python package.

It was just an example above, though. The same holds true for multibuild, the Azure-template-based solutions, or any other system. Build should be seen as a piece of a pipeline (building wheels) and should not try to take on replacing these tools, because there are a lot of components to getting it right; building for multiple versions of Python in a loop is just one of them. To users this is often seen as the most important one, but there's really a lot more to it.

gaborbernat commented 3 years ago

Isn't this tox's fault then, not build? ;)

It's highly unusual for tools to provide an env-var-only interface.

cibuildwheel does not "support specific CI environments", it has a list of "tested" ci environments. You can (and it is) run on custom company machines, etc

As said above, cibuildwheel is an orthogonal project to the build --wheel command (they have different use cases and scenarios for when you want to use them), so I consider this entire thread off-topic. Let's not dwell on it any further. As said, there are plenty of use cases where you absolutely don't want auditwheel/delocate/etc.

layday commented 3 years ago

This seems like a minor gain in circumstances where "cross-compilation" is feasible. For manylinux, it offers no actual benefit - the build dependencies are installed under each Python version available.

We cannot reuse --wheel for this because it's a flag.

henryiii commented 3 years ago

There actually is a general place for "tools", and pyproject-build could go there. It might not be a good idea, because you want to build with specific interpreters, and not accidentally from the "tools" environment (then you'd get the same Python 3.7 wheel each time), but it could be done. Just noticed this, thought I should clarify.

(in manylinux, 2010+ anyway)

gaborbernat commented 3 years ago

This seems like a minor gain in circumstances where "cross-compilation" is feasible. For manylinux, it offers no actual benefit - the build dependencies are installed under each Python version available.

Why is it a minor gain to not have to install build project n times in n virtual environments?

We cannot reuse --wheel for this because it's a flag.

Turning a flag into an option taking arguments is backwards compatible. So I think we can.

There actually is a general place for "tools", and pyproject-build could go there. It might not be a good idea, because you want to build with specific interpreters, and not accidentally from the "tools" environment (then you'd get the same Python 3.7 wheel each time), but it could be done. Just noticed this, thought I should clarify.

Against what I asked, you keep dragging back manylinux into this discussion, even though it's not related to this request at all. For one final time: there are many circumstances when you don't want manylinux wheels but still want wheels.

henryiii commented 3 years ago

Against what I asked, you keep dragging back manylinux into this discussion

No, you asked not to talk about cibuildwheel. I fully understand that there are places to use this other than broadly redistributable cases, but building redistributable wheels is a huge part of binary wheel building, and I don't think we should totally ignore it if there's a way to use it. My point was that I thought this would be useless in manylinux, but I was wrong; there is a way we could integrate this if we really wanted to (though it would likely be best to keep to the current usage in manylinux).

Turning a flag into an option taking arguments is backwards compatible

Not that I'm aware of with argparse.

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('file')
parser.add_argument('--wheel', nargs='?', const=True, default=None)

args = parser.parse_args()
print(args)
$ python3 tmp.py --wheel .
usage: tmp.py [-h] [--wheel [WHEEL]] file
tmp.py: error: the following arguments are required: file

CLI11 can do a back solve and support this, but to the best of my knowledge, argparse cannot. And this is a really weird argument, since it doesn't work unless you install virtualenv. Other than on Python 2, virtualenv is not part of the normal install, but an extra.

Why is it a minor gain to not have to install build project n times in n virtual environments?

I think it's a major gain to be able to target another environment, but if that's what this issue is about, it is a duplicate of #104 with a little more detail in the proposal. If it's about the interface, --wheel is not backward compatible; --target would be better, or --python, etc. Honestly, since this only works if virtualenv is present, simply documenting the environment variable might cover this usage. And finally, if it's about looping, I'm mildly against that; in most cases where you want to target a non-redistributable environment, you don't need to loop that many times. And you can't use a , as the loop separator, because the argument could be a filename, and filenames can contain commas. And looping in the shell or listing a few lines is not that hard, while this would be new behavior to learn and extra complication inside build, like the interaction with --sdist.

henryiii commented 3 years ago

PS: my favorite name for this would be --virtualenv, as that makes it very clear it requires virtualenv to work. It's the virtualenv target python. Otherwise we will get a lot of questions as to why this doesn't work for most people who just install build.

gaborbernat commented 3 years ago

Not that I'm aware of with argparse.

import argparse

parser = argparse.ArgumentParser()
def a(value):
    return value.split(',')
parser.add_argument('--wheel', nargs='?', default=[], type=a)

print(parser.parse_args([]))
print(parser.parse_args(['--wheel']))
print(parser.parse_args(['--wheel', 'a']))
print(parser.parse_args(['--wheel', 'a,b']))
Namespace(wheel=[])
Namespace(wheel=None)
Namespace(wheel=['a'])
Namespace(wheel=['a', 'b'])

Honestly, since this only works if virtualenv is present, simply documenting the environment variable might cover this usage.

I view #104 more as an "invoke a Python 2 env from 3" request, but I suppose it can cover this too. #104 has no suggestion for solving it yet, so I guess you can consider this a suggested solution to #104.

The fact that we use virtualenv is an implementation detail. One day perhaps venv will also allow this, and then choosing the name virtualenv would be bad. Or maybe we make virtualenv a mandated dependency, and again naming it virtualenv would be bad. We can raise a human-readable error when virtualenv is not present to make clear to users what underlying thing they need.

layday commented 3 years ago

And what happens when --wheel is followed by srcdir?

python -m build --wheel .
gaborbernat commented 3 years ago

And what happens when --wheel is followed by srcdir?

That's indeed unfortunate. 🤷🏻 I guess scrap making it part of the --wheel flag; we can have it then as --wheel-python or something. I'm ok with making it non-looping (though ideally I'd prefer looping, but I'm not too picky about that part).

henryiii commented 3 years ago

I am perfectly aware you can split a string with a comma. I am also aware that a comma is a perfectly valid character in a filename on some systems (not Windows, IIRC). We use commas in our paths in HEP all the time (not my choice). How would you pass /usr/my,dir/python to this without it trying to split it?

My problem with argparse is you can't back solve, so python -m build --wheel <something> feeds <something> to wheel, rather than recognizing that there is no srcdir and giving <something> to srcdir and leaving this an argumentless flag. I've implemented this before in CLI11, so I know it's possible, I just don't think argparse does it.
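For illustration, a naive comma split does mangle such a path (the helper name is hypothetical, mirroring the `type=` callable in the earlier argparse sketch):

```python
def split_spec(value):
    # Naive comma splitting, as in the earlier argparse example.
    return value.split(",")

# A single interpreter path containing a comma is wrongly split
# into two bogus "specs":
print(split_spec("/usr/my,dir/python"))  # ['/usr/my', 'dir/python']
```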

gaborbernat commented 3 years ago

I am perfectly aware you can split a string with a comma. I am also aware that a comma is a perfectly valid character in a filename on some systems (not Windows, IIRC). We use commas in our paths in HEP all the time (not my choice). How would you pass /usr/my,dir/python to this without it trying to split it?

My bad, you'd use os.pathsep as the separator instead. But if we make it a new flag you can just allow multiple occurrences: --wheel-python a --wheel-python b. I'm ok with --wheel-python defaulting to sys.executable (the current behavior).
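Repeated occurrences of a flag map naturally onto argparse's `action="append"`, which sidesteps separator-character issues entirely (the flag name here is the hypothetical one from this discussion, not an existing build option):

```python
import argparse

parser = argparse.ArgumentParser()
# Each occurrence of the flag appends its value to a list; no separator
# character is needed, so paths containing commas (or os.pathsep) are safe.
parser.add_argument("--wheel-python", action="append")

args = parser.parse_args(
    ["--wheel-python", "3.7", "--wheel-python", "/usr/my,dir/python"]
)
print(args.wheel_python)  # ['3.7', '/usr/my,dir/python']
```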

henryiii commented 3 years ago

python -m build --wheel-python 3.7 --wheel-python 3.8 . is not that much better than python -m build --wheel-python 3.7 . && python -m build --wheel-python 3.8 ., while the latter is general and doesn't require any special docs, etc. And you can always enable looping later, after this has shipped initially; that's a backward-compatible addition.

I think I like --wheel --target 3.7 best, or --wheel --python 3.7; combining the already odd (#249) but useful --wheel and --sdist default interaction with yet another option that can replace the default --wheel --sdist seems non-ideal. But then again, if you are targeting a Python environment, you would rarely be building the sdist too, so I'm not strongly against it either. Pretty close to 50/50, I think.

gaborbernat commented 3 years ago

python -m build --wheel-python 3.7 --wheel-python 3.8 . is not that much better than python -m build --wheel-python 3.7 . && python -m build --wheel-python 3.8 ., while the latter is general

I'd argue my form is much better and much more general. Your suggestion is a bash-ism, which IMHO we should not assume. Users are free to use whatever shell they like, and we should not assume a bash-like shell (while having a cross-shell-compatible invocation form). Your proposed form is not guaranteed to work in all shells. The latter also does not work if you're doing subprocess.call, and while you could do two subprocess.call invocations, on Windows creating a process is expensive (relatively speaking, compared to UNIX). I personally like --target because it says nothing about how it's implemented. Maybe --target-python? Also, would --target-python be respected for sdist too? If not, we should add wheel to the name.
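For what it's worth, the env-var form can already be driven shell-agnostically via the subprocess module; a sketch (the helper name and the `py3.X` specs are illustrative):

```python
import os
import sys

def build_invocation(python_spec, srcdir="."):
    """Return (argv, env) for one shell-agnostic `python -m build --wheel`
    run targeting `python_spec` via virtualenv's VIRTUALENV_PYTHON variable."""
    env = {**os.environ, "VIRTUALENV_PYTHON": python_spec}
    argv = [sys.executable, "-m", "build", "--wheel", srcdir]
    return argv, env

for spec in ("py3.7", "py3.8"):
    argv, env = build_invocation(spec)
    print(env["VIRTUALENV_PYTHON"], argv[1:])
    # subprocess.run(argv, env=env, check=True)  # the actual build call
```

Because the variable is passed through the `env` mapping rather than shell syntax, this works identically from any shell or from another Python process.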

henryiii commented 3 years ago

Using os.pathsep would also not be cross-platform, by the way. I'd actually just make it multiple calls rather than chaining them with &&:

python -m build --sdist
python -m build --wheel --target 3.7 . 
python -m build --wheel --target 3.8 .

on Windows creating a process is expensive

You are worried about subprocess call cost when you are building a compiled extension? The build has to cost multiple orders of magnitude more, even for the simplest possible extension. In some shells, you could even run this in parallel pretty easily.

--target-python be respected for sdist too?

If not a loop, yes. You can build an sdist in a target Python environment. If looped, probably not (which one would it build with? Technically it should build each time if looped, but it would likely produce the same file each time. A wheel will too, though, if it has no extensions, so maybe this is just a normal generalization and is fine?).

gaborbernat commented 3 years ago

You are worried about a subprocess call cost when you are building a compiled extension?

There are other scenarios too where you'd want different targets, not just C extensions. For example, imagine a pure Python lib that vendors its dependencies during the build. The dependencies to vendor would depend on the target Python environment. Good solutions generally degrade gracefully, so yes, in the simple situations this would not add overhead unless it must. Anyway, I'm happy with no loops for now and something like:

python -m build --wheel --wheel-target-python 3.7 .

Where --wheel-target-python would default to sys.executable, thus preserving what we do today.

henryiii commented 3 years ago

Being able to target a different Python may be useful for the SDist in some cases: #269 for example.

FFY00 commented 3 years ago

Sorry for the delay.

I will clarify my comment. I think this is a valid use case, though not a necessary one, so I am okay with supporting it as long as we can come up with a fairly simple CLI interface.

I don't think your proposed CLI interface is great. We already want to support external Python interpreters (see #104), so we can design that CLI interface in a way that also works for this use case, instead of tying it to the --wheel argument.

My proposal: have a --python argument which specifies the Python interpreter to use (resolving #104), but make it capable of receiving multiple values. An example run building an sdist and Python 3.6, 3.7, 3.8 and 3.9 wheels:

python -m build --sdist
python -m build --wheel --python 3.6,3.7,3.8,3.9

Maybe we don't even need the requirement of doing this in just one call; if so, simply resolving #104 would work here.

FYI: I have not read the full thread; I am still a bit busy, as I need to catch up with the work of the last few days.

FFY00 commented 3 years ago

Getting back to this, I think allowing --python to receive a list may introduce some complexity, so I'd like to understand the use case where it would be wanted/needed.

If there isn't anything in particular justifying that, I think it's pretty safe to say we should go with the following.

python -m build --sdist
python -m build --wheel --python 3.6
python -m build --wheel --python 3.7
python -m build --wheel --python 3.8
python -m build --wheel --python 3.9

This should be fairly easy to automate anyway.
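For example, with a shell loop over the versions (shown as a dry run with `echo`; drop the `echo` to actually run, and note that `--python` here is the flag proposed above, not an existing build option):

```shell
# Print one per-version build command per line; remove `echo` to execute.
for py in 3.6 3.7 3.8 3.9; do
    echo python -m build --wheel --python "$py"
done
```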

dHannasch commented 2 years ago

A note for anyone else who comes here searching for this functionality (building wheels for multiple Python versions, without Docker): it's possible to hack this together using tox, pyenv, tox-pyenv-install, and tox-wheel. It's complicated to set up for any given package, but it's built into the gitlab-ci-yml branch of the cookiecutter-pylibrary, so you can set up a package to do this using the cookiecutter.

https://gitlab.com/library-cookiecutters/python-nameless/-/jobs/2232489975/artifacts/browse/dist/

dist/nameless-0.1.dev50-cp310-cp310-linux_x86_64.whl
dist/nameless-0.1.dev50-cp36-cp36m-linux_x86_64.whl

This is in a very rough state at the moment; in particular, it uses a hacked branch of tox-wheel that breaks tox-wheel's core functionality (and I'm not immediately sure how to not do that), so be aware if you use tox-wheel for anything else. You can see the mess that's currently needed to make this work at https://gitlab.com/pythonpackagesalpine/python-extensions-alpine/-/blob/tox-wheel-builder-alpine/Dockerfile#L23.

tox-pyenv-install will actually install a Python version on the fly if it's missing.

Of course, just like everything discussed here, this can't build macOS wheels or such; there's no way to do that without containers. It can only build wheels for different Python versions on the same kind of operating system.

(Why the seemingly-random involvement of tox? Tying wheel-building to tox ensures that the list of Python versions wheels are built for is always the same as the list of Python versions tests are run for. There might be a better way.)

agirault commented 1 year ago

@henryiii @FFY00 this is an old issue: is your design recommendation (--python) still under consideration? Is there a timeline for this, or - if this won't happen - could you explain why and should the issue be closed? Ty