Open Eric-Arellano opened 3 years ago
I'm interested in working on this support. Has anyone started on it yet? I'd hate to duplicate work here βΊοΈ
Hi! Thanks for looking into this! No one has started work on this. I can give some guidance once you've familiarized yourself with the code.
@benjyw thoughts on the user API for this? Should it be an option whether we generate setup.py vs pyproject.toml, or always do pyproject.toml?
At a minimum, we should generate pyproject.toml that says the build backend is setup.py. That's the path of least resistance: we don't need to change any of our setup.py code, including the plugin hook for users to add custom values. We only need to additionally generate pyproject.toml
π Thanks for your comments!
In terms of the user API, this is a little bit tricky I think. I think the build backend should at least be configurable in some way; the project for which I want to use this feature has flit as the backend. Whether this is done on the user side by pre-configured selection (pdm
, flit
, setuptools
, hatch
, poetry
all spring to mind as options), by users pasting their desired build-system
table into the BUILD
file, or whether this is even generated at all (see the next paragraph) is an important question to resolve, I think.
Aside from the user API in BUILD
files, I think there are some unresolved questions for the implementation. Projects are quite likely to have existing pyproject.toml
files that have information beyond what used to be stored in setup.py
and setup.cfg
. In that case, they'll only want some information added by pants. For example, black
and pytest
configuration could be pre-existing in the pyproject.toml
files. So I think it's likely that fully overwriting the pyproject.toml
in the isolated build environment is not a great idea.
In addition, to the extent that PEP 621 project.dependencies
and project.optional-dependencies
are contained in the pyproject.toml
instead of in requirements.txt
, I think for the most part users aren't going to want to have pants change that too much. The one exception (which happens to be my use case π ) is a monorepo with multiple, interdependent, packages. In this case, it'd be super useful to have pants add the dependency dynamically, but that might be a down the road idea.
The last thing I can think of are the other tables in PEP 621, like entry points and scripts. Tools like flit require these to be in the project
table in pyproject.toml
, whereas setuptools
(if I understand correctly) has either partial support or dual support for setup.py
kwargs.
Anyhow, my first plan is to add PEP 621 support for python_requirements
specifications, as discussed in https://github.com/pantsbuild/pants/issues/16289. I think that implementation can help inform some of these decisions as well.
Ah, that is interesting. So before proceeding it would be good to understand the underlying need.
[I should mention (in case it wasn't clear) that Pants already supports flit and any other PEP-517 backend, with user-authored config. But it will only generate config for you in the form of setup.py
, for the legacy setuptools
backend.]
So am I right in assuming that the need has to do with mixing generated and handwritten config? You want to use flit, and maybe you already have handwritten config for it, but you don't want to check in and maintain the parts of the config that Pants already knows about and is capable of generating? (e.g., 3rdparty requirements, figuring out which firstparty sources go in which distribution, etc.)
I ask because if Pants is generating 100% of the config, then it seems, naively, less important to require one PEP-517 backend over another? Since in that case the user isn't interacting with the backend anyway?
[I should mention (in case it wasn't clear) that Pants already supports flit and any other PEP-517 backend, with user-authored config. But it will only generate config for you in the form of
setup.py
, for the legacysetuptools
backend.]
Yep, I've got it working with flit π I had to disable the setup.py
generation, since I didn't want that file, but that was fine.
So am I right in assuming that the need has to do with mixing generated and handwritten config? You want to use flit, and maybe you already have handwritten config for it, but you don't want to check in and maintain the parts of the config that Pants already knows about and is capable of generating? (e.g., 3rdparty requirements, figuring out which firstparty sources go in which distribution, etc.)
I ask because if Pants is generating 100% of the config, then it seems, naively, less important to require one PEP-517 backend over another? Since in that case the user isn't interacting with the backend anyway?
That's a good reframing of the question, thanks! I agree that if pants is generating 100% of the config, then the user can be agnostic to the specific choice of backend. Some users will want to control that anyways, but if pants can support mixing generated and handwritten config, that should be fine.
That said, I think that there is always going to be some mix of generated and handwritten config anyways, because if you specify [project.dependencies]
to handle 3rdparty requirements, then ideally I think that would go in a pyproject.toml
that's checked in to the repo (and read into pants via #16289). So, except in the very simplest of cases, I don't think it will be possible to avoid mixing generated and handwritten config.
Yeah, that makes total sense. In practice people end up writing plugins to customize setup.py generation, and that's a bit heavyweight for my liking.
So I think what we actually maybe want is templating? We already use janky templating for generating setup.py: https://github.com/pantsbuild/pants/blob/f6b2832539e26ddffb19a7d665b9cef697abd0da/src/python/pants/backend/python/goals/setup_py.py#L623
So a good approach might be to generalize that.
So we can provide some standard templates, but you can configure them if you want to modify any static config, and we can figure out a simplified plugin mechanism for adding dynamic ones.
Then you could, e.g., use your existing config as starting points for templates
Hi @benjyw, I was finally able to get some time to work on this. I have a working although very simple/dumb pyproject.toml
generator that basically copies the existing setup.py
generator as you suggested. Happy to push a branch/make a PR (to my fork or the upstream here) if it's helpful!
I do have a question about the existing setup.py
support. Setuptools, at least as of 61.0.0, fully supports PEP 621 in pyproject.toml
files, with beta support for other setuptools-specific metadata (like packages
, zip_safe
, etc.). That version was released in March of 2022.
So my question is, does it make sense at this point to drop support for generating the setup.py
file at all and move everything into pyproject.toml
? pants
can use setuptools
as the default backend for some consistency with previous behavior. I imagine there will be a long deprecation period for setup.py
generation, as people will have written plugins to customize behavior as you noted.
This is also a scope question, because adding simple support for generating pyproject.toml
files is rather easy (I did it in a day or two, most of which was spent trying to track down the rule graph for the package
goal for my own understanding). On the other hand, thinking about how to consistently support packaging workflows seems like a much bigger question... or maybe I'm just overthinking it π
My second question is regarding a templating approach more generally. Presuming that people use pyproject.toml
to store things like dependencies
and more, how can I correctly merge information from pyproject.toml
and what might be specified as kwargs to python_artifact
? It seems like there's already a lot of code to correctly manage this for the existing setup.py
support, so I don't want to break anything π
Hmm, I don't think we'd want to deprecate setup.py generation. Or at least not until we had a lot of experience with the alternatives. And also, I think that what I'm suggesting re templating would obviate that question.
Here's roughly what I had in mind:
Right now, there is a lot of code that generates data for a setup.py, and then a relatively small amount of code that combines that data with some user-provided data on the target, and then actually jams that combined data into a setup.py, via a janky template.
What I'm proposing is that we keep all that code that generates data, but factor out the setup.py-generating part, and replace it with a layer that takes that data and knows how to generate a few different things from it, using "proper" templates.
Then, we'd provide a handful of useful templates (pyproject.toml, setup.py to start with), but users can override those. We can use a simple, standard template language like mustache.
So now users have two easy ways to affect the output: they can modify the template, and they can add data in the target. When a field is set on a target, it would override (or extend, where relevant) the generated value for that field.
Does that make sense?
What I'm proposing is that we keep all that code that generates data, but factor out the setup.py-generating part, and replace it with a layer that takes that data and knows how to generate a few different things from it, using "proper" templates.
At a high level, this makes sense. At the level of implementation, less so π Let me try and ~talk~ write through some of the details, so I can see if it makes sense.
Right now, generate_setup_py
fills in the string SETUP_BOILERPLATE
template using str.format
based on the kwarg
values calculated in determine_finalized_setup_kwargs
(which itself calls a few other functions to find things like first and 3rd party dependencies, package sources, and such). Whether or not the setup.py
file is generated or copied from the working directory is determined by the generate_setup
kwarg to python_distribution()
.
What you're proposing is to add a pair of new fields (?) to the python_distribution()
target (? or the python_artifact
?), something like generate_build_config
and build_config_template
. The former kwarg, if True
, fills in a template using kwargs from the python_artifact
alias and the computed kwargs from determine_finalized_setup_kwargs
. Then, build_config_template
is a string which is either setup_py
, pyproject
, or the path to a user-provided template file (or a resource
/file
target?). If build_config_template
is setup_py
, the existing logic to generate setup.py
runs. If build_config_template
is pyproject
, the kwarg names are normalized to their PEP 621 equivalents and filled in to a pyproject.toml
file which is written to the chroot for packaging. In the final case, the user-provided template is parsed so that pants can update any relevant values (adding first-party dependencies comes to mind), but otherwise the file is written out as-is to the chroot.
Edit to add: I suppose that the default template should be from an existing pyproject.toml
file, if pyproject
is configured? That'd be what I'd expect to happen, anyways.
Then, generate_build_config
and generate_setup
are mutually exclusive, with the former defaulting to False
, possibly changing to default True
in the future.
Does that track what you're thinking?
Yes, that is pretty much exactly what I had in mind!
Possibly we don't need generate_build_config
at all: if build_config_template
is set then we use it, if not, we don't.
Also, we wouldn't need to parse the user-provided template. I was thinking that it would be a standard mustache template, and so would the built-in templates. So all we'd do is take the "finalized setup kwargs" (under a more generic name) and let the template reference them via mustache {{...}}
placeholders.
Or we can use jinja2 instead of mustache, if we want to get fancy. But we already use mustache in the pants repo, and it's very simple, so it's not a bad place to start.
I think we would always normalize the kwargs to their PEP-621 names, and just change how we do the setup.py case a bit, to use those names. So instead of the lazy str.format thing we do today, we'd properly slot each field into its appropriate setup.py place.
Or, we can provide a special shortcut field whose value is "all the other fields, serialized as a python dict", and another that is "all the other fields, serialized as YAML", because those might be useful, rather than having to template this field by field in the simple case.
Started writing in the thread; but posting my message here instead. My assumptions are/were essentially:
pyproject.toml
in my repositoryThen as a user I'd expect/want a generated package to be built approximately as follows:
It'd maybe even make sense to put most PEP621 metadata in my BUILD file; and have that also generated. I don't think any tools should depend on PEP621 outside of building, and in my experience they don't. I also don't have enough experience with different PEP517 backends to note any major differences that I care about.
Thanks @tgolsson! That's super helpful and more-or-less matches my user expectation as well. I have one question, with regard to this point:
- Pants adds build-system block following PEP518 to select a PEP517 backend
Do you care what backend is selected? If you did, how would you want to tell pants about that? The concern I have is that each backend has a different way to be configured, and pants shouldn't (IMO) need to store all that data (plus what happens for a new backend, etc.).
What @benjyw is suggesting is that if you have a pyproject.toml
file in your repository, it will be treated as a template. So the user would provide something like:
[project]
name = "{{name}}"
version = "{{version}}"
dependencies = ["requests"]
In that vein, you (the user) could add, if you wanted to select a backend:
[build-system]
requires = ["flit_core"]
build-backend = "flit_core.buildapi"
[tool.flit.module]
name = {{packages[0]}} # because flit only allows one package
where pants fills in all the templated variables in double-moustache braces, including modifying the dependencies
array as necessary to include detected first-party dependencies. Would that work? I think for historical reasons, the setuptools
backend would be the one that pants would use, if the user didn't want to specify one.
Do you care what backend is selected? If you did, how would you want to tell pants about that? The concern I have is that each backend has a different way to be configured, and pants shouldn't (IMO) need to store all that data (plus what happens for a new backend, etc.).
I do not care, no. I always pick whatever is suggested by the tool either way. So far I haven't found any major difference in how different backends work - I'd expect the supported feature set to be quite close. It makes more sense for me to optimize for a specific one, and maybe generalize later if someone has a use-case that isn't covered.
I think an unfortunate potential side-effect of templating might be that it doesn't parse as valid toml, too? That'd make other tools cranky which read their config from pyproject.toml.
Edit: Another way of solving that though is to export pyproject.toml into virtualenvs along with tool blocks too. That sounds more like a stretch goal but awesome consistency.
To clarify, I think we'd want to configure which template to use, but I suppose we can default to the "closest" (in a filesystem sense) pyproject.toml/setup.py/setup.cfg we find.
The "a template isn't valid yaml" issue is indeed a problem.
Maybe templating is overkill, now that I think of it. Since almost everything other than setuptools uses pyproject.toml, a mapping of "data field to toml field to write that data to" would suffice, and pants could cleanly augment an existing pyproject.toml without it having to contain placeholders. And we can deal with setup.py specially.
I agree about the valid TOML problem as well... This would actually be a problem for pants itself, if one wanted to use the pyproject.toml
file to store dependencies and read them in as python_requirements
...
To clarify, I think we'd want to configure which template to use
IMO, pants should only support setup.py
and pyproject.toml
as "we will generate this for you" options, so there should be a minimum of configuration, regardless of template or not. I think generating pyproject.toml
and setup.py
are actually mutually exclusive, so the machinery in place to generate setup.py
can mostly stay as it is, while new code is added to generate pyproject.toml that uses the existing code where it makes sense (reading setup kwargs, for example). (See the asides below for more discussion)
but I suppose we can default to the "closest" (in a filesystem sense) pyproject.toml/setup.py/setup.cfg we find.
Would it be possible/make sense to have the user specify a resource
target if they want to add to an existing pyproject.toml
file? Otherwise pants writes out a completely configured pyproject.toml
? I suspect the most common case is that people will want to store at least dependencies
in the relevant pyproject.toml
file to read them into pants via #16932 (although maybe lots of people centralize their dependencies?), so they'll already have the file in their repo...
a mapping of "data field to toml field to write that data to" would suffice, and pants could cleanly augment an existing pyproject.toml
Famous last words, but since PEP 621 has named all the relevant fields, this kind of mapping shouldn't be too hard, and the fields in pyproject.toml
should be fairly straightforward to augment from data in BUILD files. I think the harder part will be figuring out how to handle the PEP 517 backend configuration, if it's required (for example, to specify the import name if it's different from the distribution name).
I think the way to handle that case is:
pyproject.toml
file, default to setuptools
and tell pants how to configure the setuptools build backend, documented here: https://setuptools.pypa.io/en/latest/userguide/pyproject_config.html#setuptools-specific-configurationThat way, pyproject.toml
remains valid TOML syntax, but pants doesn't have to learn how to configure every possible backend. We could even document/link to the documentation for several backends to demonstrate how to do 2.
Yeah, this all sounds about right to me!
Random shower thought: Could a lot of the issues be sidestepped by making pants also be the backend, or at least wrap a backend? Then it's less of a text-management problem and more about building the wheel etc (which could be delegated).
Random shower thought: Could a lot of the issues be sidestepped by making pants also be the backend, or at least wrap a backend? Then it's less of a text-management problem and more about building the wheel etc (which could be delegated).
I don't know if we'd want Pants to implement a backend, but we could treat the backend it choses to use as an internal implementation detail. But I think the challenge would remain of how to let users provide data, merge it with Pants-generated data, and feed that into a backend?
Yes, the problem is the same. It just takes control of more of the stack - the pants build backend could be opinionated enough to use Pants mechanisms for finding files, deps, etc, as opposed to reading it from the pyproject.toml. So all it has to do is ingest the [project]
group, and require e.g. dynamic = ['dependencies']
and whatever other fields that Pants should own completely.
The Python packaging ecosystem has been working hard to bring more standardization, such that you don't "have" to use pip and setuptools.
PEPs 517 and 621 are particularly relevant for metadata for packages. Pants should generate a
pyproject.toml
rather thansetup.py
. This allows Pants to be a better community member.At first, we could have an option that allows users to still generate
setup.py
like before, and then possibly deprecate that such that we always use PEPs 517 and 621.