tarsius closed this 1 year ago.
Ok. Commenting here after my first read-through. I need to re-read it, but this seems mostly good as far as I can tell.
Is it assumed we will use the blame for the version line to determine the count?
I'm curious as well how merges will affect this.
Very nice work on this @tarsius
> Is it assumed we will use the blame for the version line to determine the count?

First we use `git log --first-parent ... :(glob) ...` to determine the commit that last touched a relevant file (see `package-build--select-commit` and #67). Additionally, we determine the last release tag. Then we use `git rev-list --count COMMIT ^TAG` to get the count.

It's a bit more complicated than that, of course, but that's the gist of it. `git blame` is not used.
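A minimal sketch of just the final counting step, assuming COMMIT and TAG have already been determined as described. The `my-` helper names are hypothetical and not package-build's API; only the `git rev-list --count COMMIT ^TAG` invocation comes from the comment above.

```elisp
;;; Hypothetical sketch; not package-build's actual implementation.

(defun my-rev-list-count-args (commit tag)
  "Return git arguments that count commits reachable from COMMIT but not TAG."
  (list "rev-list" "--count" commit (concat "^" tag)))

(defun my-commit-count-since-tag (commit tag)
  "Return the number of commits since TAG up to COMMIT, as an integer.
Must be called with `default-directory' inside the package's git repository."
  ;; `process-lines' returns the program's output as a list of lines;
  ;; `git rev-list --count' prints a single number on one line.
  (string-to-number
   (car (apply #'process-lines "git" (my-rev-list-count-args commit tag)))))
```

For example, `(let ((default-directory "/path/to/checkout/")) (my-commit-count-since-tag "HEAD" "v1.0"))` would return the count for a hypothetical tag `v1.0`.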
> I'm curious as well how merges will affect this.

Either the merged branch didn't touch any relevant file, in which case the merge commit also doesn't, and we continue to look for a commit that does along the `--first-parent` line; or the merged branch, and thus the merge commit, do touch a relevant file, in which case the merge commit is the last relevant commit and we build the snapshot from that.

Or in other words, merge commits aren't even a special case.
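To illustrate why merges need no special case, here is a toy model of the first-parent walk. The data and `my-last-relevant-commit` are hypothetical; the actual selection is delegated to `git log --first-parent` as described above.

```elisp
;;; Hypothetical model of the first-parent line, newest commit first.
;;; :relevant is non-nil if the commit touches a relevant file; for a
;;; merge commit that is the case exactly when the merged branch did.
(require 'cl-lib)

(defun my-last-relevant-commit (first-parent-line)
  "Return the id of the newest relevant commit in FIRST-PARENT-LINE.
Merge commits are treated like any other commit."
  (plist-get (cl-find-if (lambda (commit) (plist-get commit :relevant))
                         first-parent-line)
             :id))

;; Merge "m2" only brought in irrelevant changes, so the walk continues;
;; merge "m1" brought in relevant changes, so it is selected.
(defvar my-example-line
  '((:id "c3" :relevant nil)
    (:id "m2" :merge t :relevant nil)
    (:id "m1" :merge t :relevant t)
    (:id "c1" :relevant t)))
```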
> Very nice work on this @tarsius
Thanks! :smile:
@tarsius what can I do here to help? Is this an all or nothing thing or it's configurable? I.e., I'd be happy to set up a new subdomain and we could start testing this.
> the COUNT of commits since the last VERSION.

But this can change unpredictably, and even reduce, right, leaving folks stuck with an outdated but higher-versioned package? I think that was why I'd suggested the `-snapshotTIMESTAMP` scheme in https://github.com/melpa/melpa/issues/2955.
> > the COUNT of commits since the last VERSION.
>
> But this can change unpredictably, and even reduce, right, leaving folks stuck with an outdated but higher-versioned package?

Yes, but in the paragraphs that follow, I explain how that is addressed.
(I have to drop the dependency on `elx` before we can move forward.)
> Is this an all or nothing thing or it's configurable?
It's highly configurable, more so than before, and that makes it necessary to make some changes to the interface.
The boolean `package-build-stable` (aka `$STABLE`) is no longer enough to decide which "flavor of melpa" to use, once there are three of them. Edit: I am still working out the details.
Option `package-build-get-version-function` also isn't enough anymore. I replaced it with `package-build-release-version-function` and `package-build-snapshot-version-function`. Again, the new default value of the old variable should be nil, and if it is something else, then that should override the new variables for backward compatibility.
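The intended precedence between the old option and the new ones can be sketched as follows. This is only an illustration under assumptions: the `my-` variables are hypothetical stand-ins for `package-build-get-version-function` and `package-build-snapshot-version-function`, and the version functions are assumed to take a single recipe argument.

```elisp
;;; Hypothetical sketch of the backward-compatibility rule.
(defvar my-get-version-function nil
  "Stand-in for the old option: nil by default, otherwise it wins.")
(defvar my-snapshot-version-function nil
  "Stand-in for the new option that computes snapshot versions.")

(defun my-snapshot-version (rcp)
  "Return the snapshot version for RCP, honoring the old option if set."
  (if my-get-version-function
      ;; A non-nil old option overrides the new one.
      (funcall my-get-version-function rcp)
    (funcall my-snapshot-version-function rcp)))
```

The same check would presumably apply to the release channel.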
I haven't done it yet, but `package-build-release-version-function` should actually be replaced with `...-functions`, for even more flexibility (and less boilerplate).
We need two variables because (1) the decision of how the current release is determined and (2) the decision of what suffix to use for snapshots are two independent decisions, and I would not want to create a new function for each possible combination.
The source and suffix I had in mind would be accomplished by this:
```elisp
(setq package-build-release-version-functions
      (list #'package-build-get-tag-version))
(setq package-build-snapshot-version-function
      #'package-build-release+count-version)
```
Note that functions intended for `package-build-snapshot-version-function` internally use `package-build-release-version-functions` to determine the release part. To instead use the latest tag or, if there is no such tag, the version header, for both the release and the snapshot channel, we can just do:
```diff
 (setq package-build-release-version-functions
-      (list #'package-build-get-tag-version))
+      (list #'package-build-get-tag-version
+            #'package-build-get-header-version))
 (setq package-build-snapshot-version-function
       #'package-build-release+count-version)
```
Or the order of the two release-version functions could be switched.
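Switching the order means preferring the version header and falling back to the latest tag. Using the variable and function names from this discussion (the interface was still being worked out at this point), that would presumably read:

```elisp
(setq package-build-release-version-functions
      (list #'package-build-get-header-version
            #'package-build-get-tag-version))
(setq package-build-snapshot-version-function
      #'package-build-release+count-version)
```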
Or, to stick to just using the tag but append a timestamp instead of a count for snapshots:

```diff
 (setq package-build-release-version-functions
       (list #'package-build-get-tag-version))
 (setq package-build-snapshot-version-function
-      #'package-build-release+count-version)
+      #'package-build-release+timestamp-version)
```
This is 95% done.
(I should be able to finish it in a day or two, but I am very busy with life right now, so probably next week.)
> what can I do here to help?
Let's decide which functions to use initially.
I still think we should stick to what I originally implemented and suggested here for the time being at least.
I.e., use `(setq release tag)` and `(setq snapshot count)`, and maybe later switch to `(setq release (or tag header))` (or `(setq release (or header tag))`). Going in that direction is easier than going the other way, because then there is no risk of having to "increase" from 0.1.0 to 0.0.0, and because we can delay deciding whether to use `(or tag header)` or `(or header tag)`.
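How `(or tag header)` and `(or header tag)` differ can be shown with an ordered list of functions where the first non-nil result wins. That list semantics is an assumption about `package-build-release-version-functions`, suggested by the "latest tag or, if there is no such tag, the version header" example; all `my-` names are hypothetical.

```elisp
;;; Hypothetical sketch: an ordered list of release-version functions
;;; that behaves like `or' (the first non-nil result wins).
(defvar my-release-version-functions nil
  "Hypothetical stand-in for `package-build-release-version-functions'.")

(defun my-release-version (tag header)
  "Return the release version, given the TAG and HEADER versions (or nil)."
  (run-hook-with-args-until-success 'my-release-version-functions tag header))

;; Each function returns its version, or nil if it has none.
(defun my-tag-version (tag _header) tag)
(defun my-header-version (_tag header) header)
```

The two orders only disagree when a package has both a tag and a header and they differ, e.g. `(let ((my-release-version-functions (list #'my-header-version #'my-tag-version))) (my-release-version "1.2" "1.3"))` prefers the header. Going the other way, from a header-derived 0.1.0 to 0.0.0, would be a decrease, since `(version< "0.1.0" "0.0.0")` is nil.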
Also, does it seem reasonable to you to deprecate `$STABLE` and `package-build-stable`, and to replace them with just `$MELPA_FLAVOR`? (There is no need for `package-build-flavor`, I believe.)
A lot has been said about why the version string format that MELPA uses for its snapshot channel has to be replaced. I won't repeat all that here, but consider:
Some of the related conversations:
Add three new functions to generate version strings for the snapshot channel of an Emacs Lisp package archive:

- `package-build-get-tag+timestamp-version` creates version strings using the format `VERSION.0.TIMESTAMP`, where VERSION derives from the largest version tag. TIMESTAMP is the COMMITTER-DATE of the identified last relevant commit, using the format `%Y%m%d.%H%M`.
- `package-build-get-tag+count-unsafe-version` creates version strings using the format `VERSION.0.COUNT`, where VERSION derives from the largest version tag. COUNT is the number of commits since that tag until the identified last relevant commit.
- `package-build-get-tag+count-version` creates version strings using the same format `VERSION.0.COUNT`, but if upstream rewrites history, then COUNT may consist of multiple version parts. This is what we should use on MELPA.

The `.0` separator between the version (based on the tag) and the part that identifies the commit for which a snapshot is built is necessary because Emacs only supports "pre-releases", not "post-releases". If "post-releases" were supported, then we could use something like "1.0-42" or "1.0-20230413.123", and those snapshot versions would be both larger than "1.0" and smaller than "1.0.1".
But `version<` et al. actually treat these version strings (as well as "1.0-git42" and "1.0-snapshot") as smaller than "1.0", i.e., they are "pre-releases", not "post-releases". Simply injecting an additional `.0` part doesn't change that.

So we have to give up on being able to tell with absolute certainty whether a given version string identifies a release or a snapshot.
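This behavior can be checked in any Emacs with the built-in `version-to-list` and `version<`. The results in the comments are what these functions return under the default `version-regexp-alist`, which maps a bare "-" separator (as well as "-git" and "-snapshot") to -4, i.e. below the release itself:

```elisp
;; "-" style separators become -4, so these sort below "1.0":
(version-to-list "1.0-42")           ; => (1 0 -4 42)
(version< "1.0-42" "1.0")            ; => t
(version< "1.0-20230413.123" "1.0")  ; => t
(version< "1.0-git42" "1.0")         ; => t
(version< "1.0-snapshot" "1.0")      ; => t
;; Injecting an additional ".0" part does not change that:
(version< "1.0.0-42" "1.0")          ; => t
;; Using only "." makes the snapshot larger than the release, but then
;; "1.0.42" is indistinguishable from an actual 1.0.42 release.
(version< "1.0" "1.0.42")            ; => t
```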
But this is problematic. Just because the version string for the current release has two parts (`1.0`), that does not guarantee that the next release will have two parts too (either `1.1` or `2.0`). It might also be `1.0.1`. We get around that by injecting an additional `.0`.

Of course the next release after `1.0` could also be `1.0.0.1`, but that is much less likely, so we stick with just one separator `.0`. (We are stuck between a rock and a hard place: Emacs' unfortunate version comparison implementation and maintainers potentially doing weird things; and there is only so much we can do to cope.)

So we go with version strings of the format `VERSION.0.SNAPSHOT`, and now the question is how we determine `SNAPSHOT`.

We can just use the committer date of the commit. That is what GNU-devel ELPA and NonGNU-devel ELPA do. The main problem with that approach is that it leads to very long version strings. Additionally, it cannot guarantee that version strings increase, in case upstream rewrites history.
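The ordering argument for the extra `.0` can again be checked with `version<`. The helper `my-timestamp-snapshot-version` is hypothetical (not package-build's function); it builds a `VERSION.0.TIMESTAMP` string with the `%Y%m%d.%H%M` format, and formatting the time in UTC is an assumption.

```elisp
;; Without the extra ".0", a snapshot of 1.0 would sort above 1.0.1:
(version< "1.0.42" "1.0.1")      ; => nil
;; With it, the snapshot sorts above 1.0 and below likely next releases:
(version< "1.0" "1.0.0.42")      ; => t
(version< "1.0.0.42" "1.0.1")    ; => t
(version< "1.0.0.42" "1.1")      ; => t
;; ...but not below the (unlikely) next release 1.0.0.1:
(version< "1.0.0.42" "1.0.0.1")  ; => nil

(defun my-timestamp-snapshot-version (version time)
  "Return VERSION.0.TIMESTAMP for TIME, formatted as %Y%m%d.%H%M.
Hypothetical helper; formatting TIME in UTC is an assumption."
  (concat version ".0." (format-time-string "%Y%m%d.%H%M" time t)))

;; Returns a string of the form "1.0.0.YYYYMMDD.HHMM":
(my-timestamp-snapshot-version "1.0" (current-time))
```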
Simply starting with `1` and increasing it by `1` every time we build a new snapshot is also an option. The resulting version string is short and we can be sure versions keep going up. But `N` is not particularly meaningful.

Fortunately there is an alternative: use the COUNT of commits since the last VERSION.

Like TIMESTAMP, COUNT communicates some information beyond "this is a snapshot, not a release". It isn't the same information, though; IMO it is more useful information. For example, if TIMESTAMP is fairly long ago, then we have no way of knowing whether the maintainer just fixed a single inconsequential typo, or whether there are many changes. A commit count like `123`, however, is a clear indication that a lot has happened since the last release.

Like TIMESTAMP, using COUNT is problematic when upstream rewrites history. Rewriting history itself is problematic; some would say wrong. While I would not go that far, as someone who at times amends or drops HEAD, even on the main branch, I am aware that doing so can have negative consequences. I therefore accept that it is me who should suffer the consequences, not those who do the right thing, or the users of my packages.
In this context, this means that we should optimize for repositories that do not get rewritten. We do that by just using the new count by default. If it is larger than the last count, then everything is peachy, and that is what we optimize for.
When the count does not increase due to history rewriting, we should make an effort to relieve the users' suffering. The maintainer has to pay the price, by getting a version string for their snapshot that is uglier than the normal version string of well-behaved repositories.
Doing that is simple. If the new count is smaller than the old count, then we don't replace the old count; instead we append the new count. For example, if upstream drops HEAD, the new count is appended to the old one, so the version string still increases.
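A deliberately simplified model of the "append when the count does not increase" rule. The `my-` functions are hypothetical, the real implementation handles more edge cases, and COUNT is assumed to be at least 1.

```elisp
;;; Hypothetical, simplified model of the append rule.

(defun my-next-count-parts (old-parts new-count)
  "Return the COUNT parts for the next snapshot.
OLD-PARTS are the COUNT parts of the previous snapshot, e.g. (5) or (5 4);
NEW-COUNT is the freshly computed commit count (assumed to be >= 1)."
  (if (or (null old-parts) (> new-count (car old-parts)))
      ;; Normal case: the count went up and simply replaces the old parts.
      (list new-count)
    ;; History was rewritten and the count did not go up: keep the old
    ;; parts and append the new count, so the version still increases.
    (append old-parts (list new-count))))

(defun my-count-snapshot-version (version parts)
  "Return VERSION.0.COUNT, where COUNT consists of one or more PARTS."
  (concat version ".0." (mapconcat #'number-to-string parts ".")))

;; Upstream drops HEAD: previous snapshot 1.0.0.5, new count 4.
(my-count-snapshot-version "1.0" (my-next-count-parts '(5) 4))
;; => "1.0.0.5.4", which `version<' considers larger than "1.0.0.5".
```

Once the count exceeds the first part again (e.g. 6 after `(5 4)`), the short form `1.0.0.6` returns, and it is still larger than `1.0.0.5.4`.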
There are various edge cases that need to be considered. I have written tests for those, and have additionally added a make target named "demo", which extends the tests to output documentation and git logs, to demonstrate what those edge cases are and how we deal with them.
For convenience, I am including its output here: