Open jonathanunderwood opened 5 years ago
I fully agree, but flit’s author does not seem to. A previous related request was turned down and also a PR implementing this (not because of how it was implemented, but purely the idea of it). :disappointed:
Wow, that's very sad indeed. But it's helpful to know while I'm evaluating flit and poetry.
I'm open to thinking about ways to make it easier to manage version numbers. But the sort of solutions people have proposed so far all involve extra complexity inside every package to look up its version number when it gets imported. That's what I object to. Any extra complexity needed should go in developer tooling, not in the packages' runtime code.
But the sort of solutions people have proposed so far all involve extra complexity inside every package to look up its version number when it gets imported.
I haven't had time to review the previous proposals. But, at least in the case of setuptools_scm, very little is required of the developer: (1) require setuptools_scm (this wouldn't be needed if it were a capability directly in flit); (2) optionally define a file for the version tag to be written to. This is only needed if you don't want to make use of the `pkg_resources` functionality for finding the package version at runtime.
So, I'd hope it was possible for flit to support this something like:
[build-system]
requires = ["flit"]
build-backend = "flit.buildapi"
version-backend = "flit.vcs_version"
Would you regard that as requiring too much of a packager?
I'm not concerned about what it involves for the developer. I'm concerned about what the package winds up doing at runtime. It needs some extra dependency installed, and then it goes hunting for its metadata to figure out its version number. I consider this similar to an object using introspection to look up its variable name in the call stack: you can do it, but you shouldn't. It can also have performance issues, because it winds up having to scan every installed package to find the one that it is. And it definitely violates the principle of 'do as little as possible on import'.
I definitely want version numbers to be 'stamped' into the code of a package somehow so the package doesn't need to go looking for them. I tend to do that manually. I'm aware of tools like bumpversion to partially automate it, but so far I haven't really taken to any of them.
Just to add my voice that this is stopping me from using flit for my build tool chain. A pity really. versioneer (sadly no longer supported) installs a module into the package to do the parsing whereas setuptools-scm use `pkg_resources` (maybe a bit heavy). Ultimately, both write the version string at the time of distribution into `PKG-INFO` and parse it from there for installed packages.
Do you consider this behavior unacceptable? Especially the versioneer way of installing a module into the package?
Just to add my voice that this is stopping me from using flit for my build tool chain. A pity really. versioneer (sadly no longer supported) installs a module into the package to do the parsing whereas setuptools-scm use pkg_resources (maybe a bit heavy).
Just to note: setuptools_scm doesn't have to use pkg_resources. It can simply write the version to a file that is then included in the package.
Concerns about using pkg_resources are orthogonal to the issue here.
My bad, indeed setuptools-scm writes the version to `PKG-INFO`, but the recommended way (from the README) for accessing the version in the installed package is to use pkg_resources:
from pkg_resources import get_distribution, DistributionNotFound
try:
    __version__ = get_distribution(__name__).version
except DistributionNotFound:
    # package is not installed
    pass
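For comparison, the same runtime lookup can be sketched with the standard library's `importlib.metadata`, which comes up later in this thread. This is only an illustration of the pattern being debated, not something the setuptools-scm README is quoted as recommending here; `lookup_version` and the distribution name `mypackage` are placeholders.

```python
from importlib.metadata import PackageNotFoundError, version


def lookup_version(dist_name):
    """Return the installed version of dist_name, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution has no installed metadata (e.g. a bare checkout).
        return None


# "mypackage" is a placeholder distribution name for this sketch.
__version__ = lookup_version("mypackage")
```

Either way, the package still goes looking for its own metadata at import time, which is the behaviour objected to above.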
Concerns about using pkg_resources are orthogonal to the issue here.
While I tend to agree with your statement, I interpret @takluyver's comments in a way that these are exactly the concerns that he has. Specifically, I'm referring to this part:
But the sort of solutions people have proposed so far all involve extra complexity inside every package to look up its version number when it gets imported. That's what I object to. Any extra complexity needed should go in developer tooling, not in the packages' runtime code.
Random idea by a passer-by: Would it be a good idea to have such a configuration:
# mypackage/__init__.py
__version__ = "__flit_scm_version__"
When flit builds the sdist and wheel, it would detect this special version token. If it is detected, flit would parse `__init__.py` with `ast` and replace the sources in the artifacts with the correct version. This would solve the distribution workflow problem in the way setuptools-scm users prefer, but without compromising the version detection mechanism for end users.
Some downsides I can think of:
* Users get a nonsensical value if the package version is read without being installed. This is probably minor though. setuptools-scm’s suggested solution is only marginally better IMO. Or maybe we can go with `__version__ = __flit_scm_version__`, which would raise a `NameError` at runtime, but requires some extra configuration to keep linters happy.
* This won’t work with `flit install -s` (while setuptools-scm works well with `setup.py develop`). Maybe pth would work with some hacks though?
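The placeholder idea above can be sketched roughly as follows. This is a minimal illustration of build-time stamping with `ast`, not anything flit implements; `stamp_version` is a hypothetical helper, and it assumes the placeholder is a single-line string literal assigned to `__version__` at module level.

```python
import ast

PLACEHOLDER = "__flit_scm_version__"  # token proposed in the comment above


def stamp_version(source, version):
    """Return source with the placeholder __version__ string replaced by version."""
    tree = ast.parse(source)
    for node in tree.body:
        if not isinstance(node, ast.Assign):
            continue
        names = [t.id for t in node.targets if isinstance(t, ast.Name)]
        value = node.value
        if (
            "__version__" in names
            and isinstance(value, ast.Constant)
            and value.value == PLACEHOLDER
        ):
            lines = source.splitlines(keepends=True)
            # ast column offsets count UTF-8 bytes, so splice on the encoded line.
            raw = lines[value.lineno - 1].encode("utf-8")
            new_literal = repr(version).encode("utf-8")
            raw = raw[: value.col_offset] + new_literal + raw[value.end_col_offset :]
            lines[value.lineno - 1] = raw.decode("utf-8")
            return "".join(lines)
    # No placeholder found: leave the source untouched.
    return source
```

A build tool would run something like this on the copy of `__init__.py` that goes into the sdist or wheel, leaving the file in the repository unchanged.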
The `__version__` could be made to be stamped at install time (either in `__init__` or rather in a separate file) but looked up dynamically (e.g. by calling `git describe`) at runtime. The package author would write:
# __flit_version__.py
import some_package_to_get_versions
__version__ = some_package_to_get_versions.get_version()
And
#__init__.py
from . __flit_version__ import __version__
...
At install time flit would rewrite `__flit_version__.py` to contain only the value that `__version__` has on import. It wouldn't have to know how it works inside.
#__flit_version__.py (rewritten)
__version__ = "<String with the actual version>"
While the version would be resolved dynamically from SCM (or whatever) when the package is symlinked for development.
One workflow where it is not trivial to change versions manually every time is if one wants to publish to PyPI every time a change is made to the master branch, using some CI service. The packages are then rejected because their version numbers are duplicates.
I am using setuptools_scm, and the way that works is basically this:
In `setup.py`:
setup(
    use_scm_version={'write_to': 'MODULE/_version.py'},
    # [...]
)
Then in the module's `__init__.py`:
try:
    # relative import, since _version.py is written inside the package
    from ._version import version
except ImportError:
    try:
        from setuptools_scm import get_version
        version = get_version()
    except (ImportError, LookupError):
        version = '???'
__version__ = version
It seems to me `flit` could build a wheel the same way, by dropping a magic file at build time so that, at runtime, the version would be available.
The alternative is to load the version through the `entrypoint`. That has traditionally been slow (because of pkg_resources, see https://github.com/pypa/setuptools/issues/510) so maybe that should be avoided.
In general, this smells like PEP material to me: there should be a standard way for Python packages to fetch their own version, and a standard way for Python packaging tools to write that version somewhere.
In any case, it seems to me that having to maintain that version in two places is counter-intuitive and something I would like to avoid, because it's bound to create releases with the wrong version number metadata.
It is also the primary reason why I haven't used flit in this new project.
Thanks for your work!
FWIW, you can accomplish this rather trivially, without involving flit, by using an in-tree backend to pre-generate a version file with setuptools-scm. One downside of this is that you'll have to forgo `flit build`. I've put together a demo here.
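A rough sketch of what such an in-tree backend might look like is below. It is not the code from the linked demo: the file path `mypackage/_version.py` and the helper `write_version_file` are hypothetical, and it assumes both `flit_core` and `setuptools_scm` are listed in the build requirements. Whether the stamped value ends up in the metadata also depends on how the package exposes `__version__`, which this sketch does not settle.

```python
from pathlib import Path

# Hypothetical location of the generated version file inside the package.
VERSION_FILE = Path("mypackage/_version.py")


def write_version_file(version, path=VERSION_FILE):
    """Write a one-line module holding the version as a plain string."""
    Path(path).write_text(f"__version__ = {version!r}\n", encoding="utf-8")


def _scm_version():
    # Imported lazily so the helper above stays usable without setuptools_scm.
    from setuptools_scm import get_version

    return get_version()


def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    """PEP 517 hook: stamp the version, then delegate to flit_core."""
    from flit_core import buildapi

    write_version_file(_scm_version())
    return buildapi.build_wheel(wheel_directory, config_settings, metadata_directory)


def build_sdist(sdist_directory, config_settings=None):
    """PEP 517 hook: stamp the version, then delegate to flit_core."""
    from flit_core import buildapi

    write_version_file(_scm_version())
    return buildapi.build_sdist(sdist_directory, config_settings)
```

The project's `pyproject.toml` would then name this module as its `build-backend`, which is why `flit build` itself no longer applies.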
Chiming in to say that a pluggable mechanism to obtain the version would be very welcome. Obtaining the version from SCM when building and publishing in CI is also core to my workflows.
@takluyver tools like setuptools_scm by default invoke the tools only at build time not runtime
I'm quite annoyed that flit seems to actively prevent inclusion of support for using setuptools_scm as a source of version numbers, as it's practically the only thing preventing me from migrating a lot of my own packages + a lot of pytest-dev to flit
I've been thinking again about how to incorporate something like setuptools_scm, and I'm afraid I still don't see a good option. I can see the attraction of getting the version from git, but I have a set of priorities which I consider more important, and which as far as I can see are (together) fundamentally incompatible with this model:
* [...] `__version__` attribute accessible at runtime, containing their version number.
* [...] `pkg.__version__` should involve running `git` commands, looking for metadata outside the package directory, or anything like that.
* [...] `if i_am_installed:`, no try/except. Just the same code, behaving the same way. I use development installs a lot, and I like that there are no special tricks involved in loading & running the code.
* sdists are essentially just an archive of the source files, not a special halfway state. You can use the result of `git archive` like an sdist if you want.
I don't see a way to do something like setuptools_scm without compromising on at least one of those. I confess I haven't studied setuptools_scm closely, but I believe it's a logical incompatibility which no technical measure can get around.
You may say that I should allow people to choose different trade-offs to the ones I want. That's an entirely reasonable argument. But this goes pretty deep into the design of Flit - any way that I would address this means a build step, a difference between the code in the repository and the code that you install, even if it's a tiny difference. It's fairly central to the design of Flit that pure Python packages don't need a build step, that what is in the repo is precisely what gets installed. It's also a pretty important design goal to be simple, which means not having too many different ways to do things.
For me, the best answer to all this is something like bump2version, which deals with updating version numbers as a developer tool, so the changes it makes get committed. I know it's not so elegant to have multiple copies of the version number, and I know it's useful to have the extra information from `git describe`, but for my boring, old fashioned way of working, this works well enough, and doesn't compromise any of the points above. I am using this for Flit itself.
it's practically the only thing preventing me from migrating a lot of my own packages + a lot of pytest-dev to flit
The outcome of making a simple tool, drawing a line around a subset of possible features is that there must be use cases just beyond the line, which would be supported if the simple tool just added a little bit more complexity. If you move that line, add the extra piece, then someone else's use case will be just beyond the new line. So at some point we either have to refuse to move the line, or let the simple thing become another complex thing with every feature anyone wants. For me, at the moment, this feels like the point to stop - I'm sorry that that leaves your use case just out of reach, but any point does that to someone.
I don't want to be a dictator and say 'never'. I'll keep thinking about how it might be doable. But at present, this idea feels like a really bad fit with Flit. Much more so than namespace packages, which I held out on for years but actually fitted in conceptually quite easily.
@takluyver with your argument from ignorance right after i lined out that setuptools_scm by default is build time only, you just elevate the issue
the only time to ever invoke git would be a git hook to update metadata on editable installed packages using the hooks of the pep
how am i to trust your judgement if you make up an entirely different set of issues right after i line up one that would actually fit your narrative
For what it's worth, I'd also like setuptools-scm-like feature support for flit. I'm personally fine with differing on this one case from the code as source vs the code as installed. Though I gather @takluyver isn't keen on that. This is also the biggest gap keeping me from adopting flit widely across my projects. I consider `bump2version` too odd and awkward to use to be considered simple, so I'm back to manually editing stuff...
I'm in favor of `flit`'s continued lack of support for `setuptools_scm`-style metadata, mainly for this reason that @takluyver mentioned:
any way that I would address this means... a difference between the code in the repository and the code that you install
Reproducibility hasn't been mentioned in this thread yet, likely because it's not something the wider Python packaging ecosystem particularly cares about. However, I predict it will become more significant in the near future, and as that happens, projects that aren't built in a reproducible way are likely going to find that such features will become less desirable.
Right now, `flit` is the best tool to build reproducible Python packages because it was designed with that in mind, and I'd hate to see that change.
What makes `setuptools-scm` lack reproducibility?
I maintain a few packages using `setuptools-scm` and they're reproducible.
There's nothing specific about it that would make a released package non-reproducible. Only packages built from uncommitted source have a timestamp attached to them, but I don't think it's very common to publish a package with source that hasn't been committed. Quite the contrary: the usual thing is to create a tag first, then publish.
What makes `setuptools-scm` lack reproducibility?
I guess, you could argue that since you can change git tags, there is no guarantee of identical source code. Then again, PyPI does not allow you to replace a specific version so I also don't see it as problematic.
I'd like to pull this discussion away from what functionality `setuptools-scm` provides, how it is designed, or how it interacts with parts of the ecosystem.
And... with that, I'll highlight an important part from Thomas' post above:
this goes pretty deep into the design of Flit - any way that I would address this means a build step, a difference between the code in the repository and the code that you install, even if it's a tiny difference. It's fairly central to the design of Flit that pure Python packages don't need a build step, that what is in the repo is precisely what gets installed.
The version is also in the repository, so unless you mean on the disk and excluding the git repository I don't see a conflict here. After all a git tag is just a file on the disk under .git...
It's fairly central to the design of Flit that pure Python packages don't need a build step [...]
I understand not wanting to require a build step for packages that don't need one, but does that also mean excluding any functionality for those that opt into it?
While avoiding a build step keeps flit itself simpler, it just pushes that complexity to every single project out there, requiring everyone to write their own build script to handle updating the version each time a release is made.
Flit doesn't support projects with build steps (see #119). Not even optionally, not even if you want to supply all the logic of the build step yourself. Flit is built just to package up static files and add the necessary metadata. The way to opt into a build step is to use another build backend that does support that. Thanks to PEP 621, most of the metadata you specify for Flit should be portable to other build backends, so it hopefully won't be a lot of work to move to another backend if you outgrow Flit.
(This is of limited use right now, since there aren't many viable backends to move to. setuptools doesn't support PEP 621 metadata, and I don't know if Poetry supports build steps either. But I believe that more build backends will emerge for projects with build steps - I know there are people interested in using Meson & CMake to build Python packages)
'No build steps' isn't absolutely set in stone, but in the wake of PEP 621 I closed issue #119 to indicate that it's as decided as it's likely to get for the foreseeable future.
This is of limited use right now, since there aren't many viable backends to move to.
Yea.
FWIW, there's a grant proposal submitted for https://github.com/scikit-build/scikit-build to work on making it a pyproject.toml-based backend. That should be viable for certain usecases. I also think that there's a reasonable case to be made for writing custom dedicated build backends for other use cases (like https://github.com/pradyunsg/sphinx-theme-builder).
No, Poetry doesn't support build steps nor does it support PEP 621.
For setuptools, the relevant issues are https://github.com/psf/fundable-packaging-improvements/issues/25 and https://github.com/pypa/setuptools/issues/1688. 🤷🏽
any way that I would address this means a build step, a difference between the code in the repository and the code that you install,
Why would the code change? Wouldn't the change be in how the `Version` in `PKG-INFO` is generated?
(Also, isn't the generation of `PKG-INFO` technically a "build step"?)
Version info is metadata, but `flit` is trying to treat it like program data... My mind keeps jumping back to the old RCS headers (that had to be in the files) before modern VCSes like `git` were a thing.
`__version__` at the package root is just a convenience. Really, other packages that want to know the version should be looking for that info from the source of truth directly (e.g. via `importlib.metadata`). The package doesn't actually need to know its own version at all. That info is needed solely for other programs (e.g. `pip`/`apt`/downstreams/etc.) to deduce compatibility.
PDM supports PEP 621, and is usable as PEP 517 backend (it's also a poetry-like tool, but you don't need to even install it any more than you need to install flit to use it). There's also Trampolim and Whey, both smaller projects but still showing that PEP 621 works, and you should be able to transition quite easily based on what is important to you - one feature of PEP 621. Both PDM and Trampolim support Git versioning, by the way. Scikit-HEP/cookie supports all four via PEP 621: https://github.com/scikit-hep/cookie/blob/main/%7B%7Bcookiecutter.project_name%7D%7D/pyproject-flit621%2Cpdm%2Ctrampolim%2Cwhey.toml
sdists are essentially just an archive of the source files, not a special halfway state. You can use the result of git archive like an sdist if you want.
Actually, you can use git archives. There's a `.gitattributes` file that can be used to have git include the version in the archive. In a pretty recent Git version, there's enough information there to compute everything setuptools-scm needs. I think the only reason it's not available directly in setuptools-scm now is due to @RonnyPfannschmidt needing some time to work on it. (External plugin https://pypi.org/project/setuptools-scm-git-archive/ )
No comment on build step vs. not, design of Flit, usefulness of using Git for versions, etc. Just pointing out these two things. I guess I will say I think using Git for versions is very useful, I love using setuptools-scm and would like to be able to use it setuptools-free for scikit-build eventually. But otherwise, understandable if that's not something in Flit's philosophy or design direction.
I made versioneer work with flit and it was only a minor hack https://github.com/pypa/flit/issues/271#issuecomment-575765013, and apparently, there are some changes that could be made to flit without compromising @takluyver's philosophy to make it work even better https://github.com/pypa/flit/pull/382
Nowadays there is a non-vendored version https://github.com/python-versioneer/versioneer-518 and a new maintainer gave the repository some love.
I guess folks here might have opinions on setuptools-scm vs versioneer, but even if versioneer is certainly not super maintained, it gets the job done.
Side note: Neither solution, to my knowledge, supports trunk-based versioning.
The package doesn't actually need to know its own version at all.
Any package that provides a program with a `--version` flag (e.g.: a huge amount of cli tools) need to know their version.
Many networked applications also expose their current version.
`importlib.metadata` is rather slow (far slower than a string coded into `__version__`).
It seems to add about 150ms on my setup (i7-8650U), which starts to become noticeable in a cli tool (150ms isn't much, but when something already takes 400ms, jumping to 550ms feels noticeably more sluggish).
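A quick way to get a feel for this cost on a given machine is to time a single lookup. The sketch below is only an illustration: `time_metadata_lookup` is a hypothetical helper, `pip` is just an example distribution name, and the measurement covers the lookup itself, not the time spent importing `importlib.metadata`, so it will not necessarily reproduce the figure quoted above.

```python
import time
from importlib.metadata import PackageNotFoundError, version


def time_metadata_lookup(dist_name):
    """Return (version_or_None, elapsed_seconds) for one metadata lookup."""
    start = time.perf_counter()
    try:
        found = version(dist_name)
    except PackageNotFoundError:
        found = None
    return found, time.perf_counter() - start


# "pip" is only an example; any installed distribution name works.
found, elapsed = time_metadata_lookup("pip")
print(f"{found!r} looked up in {elapsed * 1000:.1f} ms")
```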
Any package that provides a program with a --version flag (e.g.: a huge amount of cli tools) need to know their version.
The approach that versioneer and most tools take is to package the version when you create your sdist or wheel. So an installed package will always have it at the ready. I don't think that's an issue .
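With a version string already stamped into the package, a `--version` flag needs no metadata lookup at all. A minimal sketch with `argparse` follows; the program name `mytool` and the version value are placeholders, and `build_parser` is a hypothetical helper.

```python
import argparse

# Placeholder: in a built package this would be the stamped version string.
__version__ = "1.2.3"


def build_parser():
    """Create a parser whose --version flag prints the stamped version."""
    parser = argparse.ArgumentParser(prog="mytool")
    parser.add_argument(
        "--version",
        action="version",
        version=f"%(prog)s {__version__}",
    )
    return parser
```

Calling `build_parser().parse_args(["--version"])` prints the program name and version, then exits.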
Any package that provides a program with a `--version` flag (e.g.: a huge amount of cli tools) need to know their version.
Again... that's a convenience feature. Any situation someone could run that, they can most likely run `python -m pip show thepkg` (or `apt`/`yum`/`dnf`/etc.) instead. Don't get me wrong; I'm all for convenience for the user, but that's not the point...
The point is we could get rid of `__version__`, but we definitely cannot get rid of version metadata.
Many networked applications also expose their current version.
I absolutely don't deny there are situations where it's useful for the program to know its own version...
People find it weird that a program wouldn't already have this information about itself, but it's much more natural when thinking about it from the perspective of other programs (where it matters most). That's the perspective I'm trying to highlight.
`importlib.metadata` is rather slow (far slower than a string coded into `__version__`).
I would agree that being able to pull the version from Git shouldn't preclude someone from hard-coding if they are that concerned about performance. I, for example, still use `list`s sometimes even when `tuple`s would technically be more performant, so... that person is not me :sweat_smile:
Structurally, the main cost there seems to be finding the dist-info, which is getting more performant.
As multi-version installation is no longer the case, I believe this can be sped up way more, but that's off-topic here.
The problem here, I believe, has nothing to do with providing a `__version__`. That would be nice to be able to do, but ultimately doesn't matter beyond performance; you could add that value using `importlib.metadata` to your own package if you really wanted to. The problem as I understand it is that building an SDist is no longer identical to a git archive tarball; one of them captures version metadata (even if just in the dist-info), while the other one loses it. There has to be an extra "step" that captures this version info and builds a special SDist instead of just the simple packing that's done now.
Technically, there is a dist-info file already in the SDist that's not in the tarball, so a better way to put the current design is source -> SDist is identical to source -> git archive -> source -> SDist. Making this change would affect this design (even if only for those that opt-in), and that's the holdup (both technical and directional).
And, as I mentioned, if it's really only git archive that you are worried about, there's a way to encode the version into a git archive (which is what GitHub uses for release tarballs).
@henryiii starting with git 2.24, all metadata setuptools_scm needs will be available given the correct setup of an archival file.
The feature is still in development.
I just finished the initial migration of one of my projects to flit and kept admiring every design decision, the level of maturity, and the simplicity of the build tool (very different from a previous experience with poetry!). But then came the issue of versioning, currently handled by setuptools_scm. After reading this thread and related issues, I now wonder if I should roll back to setuptools. The case against this functionality does not sound convincing to me, but I guess ultimately it's a decision to be made by the project's developers and contributors. Very disappointing, but apparently flit won't be the tool I can use for my projects, despite everything else about it being just perfect 😞
I'm going to migrate to hatch; it already has setuptools_scm integration, and after Easter it will be first class.
As @anarcat said,
In general, this smells like PEP material to me: there should be a standard way for Python packages to fetch their own version, and a standard way for Python packaging tools to write that version somewhere.
I think this is correct. Also on point, @dmtucker said:
isn't the generation of PKG-INFO technically a "build step"?
I guess it is because it falls under the onus of a build backend, but would argue that it shouldn't be considered a build step.
Although this is a PEP aside, I mention it here because the concept of a "snapshot" step as distinct from build steps could allow flit to set `__version__` from scm without opening the door to more complex build steps. Similar to @pradyunsg's suggestion here
Forgive me if this has already been thought through, as these are all just ideas from a packaging layman, but here's my thoughts on how to fit a "snapshot" step into PEP 517 compliance:
* [...] `prepare_metadata_for_build_wheel` hook from [...] `prepare_metadata_for_sdist` hook, which could be called by `prepare_metadata_for_build_wheel` if building from the original source tree.
* [...] "A build back-end MUST raise an error if the metadata specifies the `name` in `dynamic`." which would allow a build backend to be PEP 517 compliant but still allow `version` in `dynamic`.
Very late to the party, but after going through this discussion and after a quick search around, I think I found the ideal solution that would satisfy many of the prospective flit users while maintaining Flit's philosophy: write your own micro-backend that extends `flit_core` with `setuptools_scm`. Here is a minimal example of accomplishing exactly this: https://github.com/LukasGelbmann/adventkit/blob/main/packaging/adventkit_backend.py
A motivated person could make this into a separate backend package that depends on `flit_core` and `setuptools_scm`, which would be easily maintainable since this "extension" would be tiny. From Flit's side, I think providing an explicit extension/hooking mechanism would make it easy for others to extend Flit with more bells and whistles. But that's a discussion for another thread.
Anyway, I do believe brutally sticking to the minimalist philosophy that Flit originally started with is a great idea! This would also enable people to easily and confidently extend its functionality without fear of upstream API changing drastically. And just to highlight to future TLDR-readers, here is Thomas' comment about why the feature in discussion is not compatible with Flit's design philosophy:
this goes pretty deep into the design of Flit - any way that I would address this means a build step, a difference between the code in the repository and the code that you install, even if it's a tiny difference. It's fairly central to the design of Flit that pure Python packages don't need a build step, that what is in the repo is precisely what gets installed.
I'm currently working on the successor of setuptools-scm that won't depend on setuptools.
I still hope to find some common ground to allow scm metadata to be considered a source of truth.
The setuptools world has a module, `setuptools_scm`, which allows setting the version directly from the SCM in use for the project. For example, if a commit is tagged with a version identifier, that becomes the value of version, whereas if the commit is not tagged, the hash and the distance from the last tag may be used.
This is super useful, and often critical, functionality for automation. It would be great if flit had machinery for doing this kind of thing.
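To make the tag/distance/hash idea concrete, here is a small sketch that turns `git describe`-style output into a version string. It is an illustration only and does not reproduce setuptools_scm's actual version scheme; `version_from_describe` is a hypothetical helper, and the `.postN+gHASH` format is an assumption chosen for the example.

```python
import re

# Matches "v1.2.3" (exactly on a tag) or "v1.2.3-4-gabc1234" (4 commits past the tag).
_DESCRIBE = re.compile(
    r"^v?(?P<tag>.+?)(?:-(?P<distance>\d+)-g(?P<node>[0-9a-f]+))?$"
)


def version_from_describe(described):
    """Derive a version string from `git describe --tags` style output."""
    match = _DESCRIBE.match(described.strip())
    if match is None:
        raise ValueError(f"unrecognised describe output: {described!r}")
    tag = match["tag"]
    distance = int(match["distance"] or 0)
    if distance == 0:
        # The commit itself is tagged: the tag is the version.
        return tag
    # Untagged commit: encode the distance from the last tag and the short hash.
    return f"{tag}.post{distance}+g{match['node']}"
```

For example, `version_from_describe("v1.2.3")` gives `"1.2.3"`, while `version_from_describe("v1.2.3-4-gabc1234")` gives `"1.2.3.post4+gabc1234"`.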