Implement new version scheme for snapshot channel

tarsius commented 1 year ago

A lot has been said about why the version string format that Melpa uses for its snapshot channel, has to be replaced. I won't repeat all that here, but consider:

(version< "109.0" "19700101.0") => t

Some of the related conversations:

2015-07 melpa/melpa#2944
2015-07 melpa/melpa#2955
2019-06 melpa/melpa#6319
2020-01 melpa/melpa#6656
2022-07 melpa/melpa#8135
2022-11 melpa/package-build#65

Add three new functions to generate version strings for the snapshot channel of an Emacs Lisp package archive:

package-build-get-tag+timestamp-version creates version string using the format VERSION.0.TIMESTAMP, where VERSION derives from the largest version tag. TIMESTAMP is the COMMITTER-DATE for the identified last relevant commit, using the format %Y%m%d.%H%M.
package-build-get-tag+count-unsafe-version creates version strings using the format VERSION.0.COUNT, where VERSION derives from the largest version tag. COUNT is the number of commits since that tag until the identified last relevant commit.
package-build-get-tag+count-version creates version strings using the same format VERSION.0.COUNT, but if upstream rewrites history, then COUNT may consist of multiple version parts. This is what we should use on MELPA.

The .0 separator between the version, based on the tag, and the part that identifies the commit for which a snapshot is build, is necessary because Emacs only supports "pre-releases" but not "post-releases".

If "post-releases" were supported, then we could use something like "1.0-42" or "1.0-20230413.123", and those snapshot versions would be both larger than "1.0" and smaller than "1.0.1".

But version< et al. actually treat these version strings (as well as "1.0-git42" and "1.0-snapshot") as smaller than "1.0", i.e., they are "pre-releases", not "post-releases".

(version< "1.0-42" "1.0") => t

Simply injecting an additional .0 part doesn't change that:

(version< "1.0.0-42" "1.0") => t

So we have to give up on being able to tell with absolute certainty whether a given version strings identifies a release or a snapshot:

(version< "1.0" "1.0.42") => t

But this is problematic. Just because the version string for the current release has two parts (1.0), that does not guarantee that the next release will have two parts too (either 1.1 or 2.0). It might also be 1.0.1. We get around that by injecting an additional .0.

1.0 < 1.0.0.42 < 1.0.1

Of course the next release after 1.0 could also be 1.0.0.1, but that is much less likely, so we stick with just one separator .0. (We are stuck between a rock and a hard place, Emacs' unfortunate version comparison implementation and maintainers potentially doing weird things; and there is only so much we can do to cope.)

So we go with version strings of the format VERSION.0.SNAPSHOT, and now the question is how we determine SNAPSHOT.

We can just use the committer date of the commit. That is what GNU-devel ELPA and NonGNU-devel ELPA do. The main problem with that approach is that it leads to very long version strings. Additionally it cannot guarantee that version strings increase, in case upstream rewrites history.

VERSION.0.TIMESTAMP

Simply starting with 1 and increasing it by 1 every time we build a new snapshot, is also an option. The resulting version string is short and we can be sure versions keep going up.

VERSION.0.N

But N is not particularly meaningful.

Fortunately there is an alternative; use the COUNT of commits since the last VERSION.

VERSION.0.COUNT

Like TIMESTAMP, COUNT communicates some information beyond "this is a snapshot, not a release". It isn't the same information though; IMO it is more useful information. For example, if TIMESTAMP is a fairly long time ago, then we have no way of knowing whether the maintainer just fixed a single inconsequential typo, or whether there are many changes. A commit count like 123 however, is a clear indication that a lot has happened since the last release.

Like TIMESTAMP, using COUNT is problematic when upstream rewrites history. Rewriting history itself is problematic -- some would say wrong. While I would not go that far, as someone who does at times amend to HEAD or drop HEAD even on the main branch, I am aware that that can have negative consequences. I therefore accept that it is me who should suffer the consequences -- not those who do the right thing, or the users of my packages.

In this context this means that we should optimize for repositories that do not get rewritten. We do that by just using the new count by default. If it is larger than the last count and everything is peachy, and that is what we optimize for.

When the count does not increase due to history rewriting, then we should make an effort to relieve the users' suffering. And the maintainer has to pay the price, by getting a version string for their snapshot, which is uglier than the normal version string of well-behaved repositories.

Doing that is simple. If the new count is smaller than the old count, then we don't replace the old count, instead we append the new count. For example, if upstream drops HEAD, then the version string gets increase like so:

1.0.0.42  -->  1.0.0.42.41

There are various edge-cases that need to be considered. I have written tests for those, and have additionally added a make target named "demo", which extends the tests to output documentation and git logs, to demonstrate what those edge-cases are and how we deal with them.

make demo

For convenience, I am including its output here:

``` make[1]: Entering directory '/home/jonas/.config/emacs/lib/package-build/test' Loading /home/jonas/.config/emacs/lib/package-build/test/package-build-tests.el (source)... Running 9 tests (2023-04-18 10:46:45+0200, selector ‘t’) === 001 use-latest-relevant-commit === Base the snapshot version string on the greatest release tag. Determine the last reachable commit that touches a file that we want to include in packages. Skip over commits that touch only files that are not included. By doing so, we avoid building new snapshots that are identical to the previous snapshot, except for the version string. Then determine the greatest release tag, while ignoring whether that is an ancestor of the selected commit. (Below we will look into what it means, if that is not the case, but for now assume it is.) The snapshot version string has the format "VERSION.0.COUNT". VERSION is the version string derived from the selected tag and COUNT is the number of commits from that tag to the selected commit. Inject the "separator" ".0" inbetween the VERSION and the COUNT to sufficiently decrease the odds that the version for a future release is smaller than the version for this snapshot. (Ideally Emacs would not only support "pre-releases" but also "post-releases". If that were the case, we could use something like "1.0-git42", for a commit that comes 42 commits after the tag "1.0". But as it is implemented in `version<' et al., "1.0-git" is actually smaller than "1.0".) Injecting one ".0" is both necessary and sufficient. Just because the last release is "1.0", we cannot be sure that the next release will be either "2.0" or "1.1". It might also be "1.0.1". By using "1.0.0.COUNT" instead of just "1.0.COUNT", we nearly ensure that that any potential future release is smaller than the snapshot. Of course the next release after "1.0" could also be "1.0.0.1" (or "1.0.0.0.0.0.1" for that matter) but that is much less likely. (We could use the separator ".1-snapshot" to be absolutely sure that a future release is always greater than a snapshot, but that is disgustingly long. Note that we could not use the shorter ".0-git" because `version-to-list' encodes both -snapshot and -git as -4, and `package-version-join' turns -4 into -snapshot.) Building from /tmp/tmp.PjqoU8zNCg/working/001/pkg Built pkg in 0.043s, finished at 2023-04-18T08:46:45+0000 * 899b661 (origin/main) Add other * 5cae135 (tag: elpa_A__1.0.0.0.1) Edit pkg.el * ded4918 (tag: 1.0.0) Initial import passed 1/9 package-build-test-001-use-latest-relevant-commit (0.214680 sec) === 002 dont-append-count-when-using-tag === When the latest relevant commit is a tagged release, then use that version without appending a commit count. Building from /tmp/tmp.PjqoU8zNCg/working/002/pkg Built pkg in 0.105s, finished at 2023-04-18T08:46:45+0000 * fbc9cd6 (origin/main) Add other * ded4918 (tag: elpa_A__1.0.0, tag: 1.0.0) Initial import passed 2/9 package-build-test-002-dont-append-count-when-using-tag (0.231762 sec) === 003 dont-go-past-tag === If the latest commit that modifies a relevant file is an ancestor of the latest release tag (as opposed to the other way around), then use the release commit, because the version on the snapshot channel should never be older than the version on the release channel. Building from /tmp/tmp.PjqoU8zNCg/working/003/pkg Built pkg in 0.035s, finished at 2023-04-18T08:46:45+0000 * f482a81 (origin/main) Add other * b208dc6 (tag: elpa_A__1.0.1, tag: 1.0.1) Release 1.0.1 without changes * 5cae135 Edit pkg.el * ded4918 (tag: 1.0.0) Initial import passed 3/9 package-build-test-003-dont-go-past-tag (0.122652 sec) === 004 use-zeros-if-no-tag === If there is no release tag, use two zero parts before the separator zero part and count part. In this case the count is the total number of commit, reachable from the selected commit. So, if there are no tags, then the count part is always prefixed with exactly three zero parts. If the package author later uses "0.0.1" as the first tag, then that is larger than existing snapshots. However if they begin with "0.0.0.1", then that is not the case. We have to draw the line somewhere and, IMO, if it matters at all whether a certain release is used or not, then the first bumped part of the version string should not be the sub-sub-minor-part (this applies to the first release, but also future version bumps). Building from /tmp/tmp.PjqoU8zNCg/working/004/pkg Built pkg in 0.024s, finished at 2023-04-18T08:46:45+0000 * 3ce2b15 (tag: elpa_A__0.0.0.3, origin/main) Edit pkg.el * fbc9cd6 Add other * ded4918 Initial import passed 4/9 package-build-test-004-use-zeros-if-no-tag (0.083377 sec) === 005 count-since-merge-base === Sometimes release are made on dedicated release branches, and sometimes these release branches are not merged back into the development branch, immediately or at all. If that happens, we cannot count the commits from the tag to the latest relevant commit. Instead we count the commits from the the merge-base of the selected commit and the selected tag. The merge-base is the last common ancestor of two commits. Building from /tmp/tmp.PjqoU8zNCg/working/005/pkg Built pkg in 0.036s, finished at 2023-04-18T08:46:45+0000 * 3ce2b15 (tag: elpa_A__1.0.0.0.2, origin/main) Edit pkg.el * fbc9cd6 Add other | * c58e60c (tag: 1.0.0, releases) Release 1.0.0 |/ * ded4918 Initial import passed 5/9 package-build-test-005-count-since-merge-base (0.116674 sec) === 006 merge-base-required === Due to extremely sloppy history rewritting, it is possible that tags share no history at all with the branches that still exist in the repository. This is the result of an upstream mistake, and it is not our job to fix it. Again, we have to draw the line somewhere. So, if the greatest release tag shares no history with the tracked branch, then we pretend there are no tags at all. (We could instead ignore that tag and use the next greatest release tag, which actually shares some history, if any, but IMO that would be even more confusing to users.) Building from /tmp/tmp.PjqoU8zNCg/working/006/pkg Built pkg in 0.036s, finished at 2023-04-18T08:46:45+0000 * 6663edd (tag: 6.0.1, detached) A root commit * 5cae135 (tag: elpa_A__0.0.0.2, origin/main) Edit pkg.el * ded4918 (tag: 6.0.0) Initial import passed 6/9 package-build-test-006-merge-base-required (0.106321 sec) === 007 amending-adds-count-part === If HEAD is amended, add an additional count version part. Building from /tmp/tmp.PjqoU8zNCg/working/007/pkg Built pkg in 0.036s, finished at 2023-04-18T08:46:45+0000 Building from /tmp/tmp.PjqoU8zNCg/working/007/pkg Built pkg in 0.036s, finished at 2023-04-18T08:46:46+0000 * 6d26f69 (tag: elpa_B__7.0.0.1.1, origin/main) Edit pkg.el (modified) | * 5cae135 (tag: elpa_A__7.0.0.1) Edit pkg.el |/ * ded4918 (tag: 7.0) Initial import passed 7/9 package-build-test-007-amending-adds-count-part (0.138854 sec) === 008 dropping-adds-count-part === If HEAD is removed, add an additional count version part. Building from /tmp/tmp.PjqoU8zNCg/working/008/pkg Built pkg in 0.036s, finished at 2023-04-18T08:46:46+0000 Building from /tmp/tmp.PjqoU8zNCg/working/008/pkg Built pkg in 0.036s, finished at 2023-04-18T08:46:46+0000 * f138089 (tag: elpa_A__8.0.0.0.2) Edit pkg.el (will be dropped) * 5cae135 (tag: elpa_B__8.0.0.0.2.1, origin/main) Edit pkg.el * ded4918 (tag: 8.0.0) Initial import passed 8/9 package-build-test-008-dropping-adds-count-part (0.146229 sec) === 009 dropping-adds-count-part === Accumulate and shed smaller count parts. The previous snapshot may have multiple count parts. Compare the new count with the last of these. If the new count is smaller than the last count, then append the new count. Otherwise remove the old count part. Continue this process for preceeding count parts until there are none left, or it is larger than the new count. Building from /tmp/tmp.PjqoU8zNCg/working/009/pkg Built pkg in 0.036s, finished at 2023-04-18T08:46:46+0000 Building from /tmp/tmp.PjqoU8zNCg/working/009/pkg Built pkg in 0.036s, finished at 2023-04-18T08:46:46+0000 Building from /tmp/tmp.PjqoU8zNCg/working/009/pkg Built pkg in 0.036s, finished at 2023-04-18T08:46:46+0000 Building from /tmp/tmp.PjqoU8zNCg/working/009/pkg Built pkg in 0.036s, finished at 2023-04-18T08:46:46+0000 Building from /tmp/tmp.PjqoU8zNCg/working/009/pkg Built pkg in 0.036s, finished at 2023-04-18T08:46:46+0000 * 5f141ce (tag: elpa_E__1.0.0.4, origin/main) A * 2ac5f45 9 * 5ea956f 8 | * 313e86c (tag: elpa_A__1.0.0.3) 3 | | * 433ebf7 (tag: elpa_B__1.0.0.3.3) 4 | |/ | * d61b8f9 2 | * 54bab6d 1 | | * 323cbd4 (tag: elpa_C__1.0.0.3.3.1) 5 | |/ | | * 43d78a4 (tag: elpa_D__1.0.0.3.3.2) 7 | |/ |/| * | 296e97d 6 |/ * ded4918 (tag: 1.0) Initial import passed 9/9 package-build-test-009-dropping-adds-count-part (0.402664 sec) Ran 9 tests, 9 results as expected, 0 unexpected (2023-04-18 10:46:46+0200, 1.563899 sec) make[1]: Leaving directory '/home/jonas/.config/emacs/lib/package-build/test' ```

milkypostman commented 1 year ago

Ok. Commenting here after my first read through. I need to re-read again but this seems mostly good as far as I can tell.

Is it assumed we will use the blame for the version line to determine the count?

I'm curious as well how merges will affect this.

Very nice work on this @tarsius

tarsius commented 1 year ago

Is it assumed we will use the blame for the version line to determine the count?

First we use git log --first-parent ... :(glob) ... to determine the commit that last touched a relevant file. (See package-build--select-commit and #67). Additionally we determine the last release tag.

Then we use git rev-list --count COMMIT ^TAG to get the count.

It's a bit more complicated than that of course, but that's the gist of it.

git blame is not used.

I'm curious as well how merges will affect this.

Either the merged branch didn't touch any relevant file, in which case the merge commit also doesn't do that, and we continue to look for a commit that does along the --first-parent line; or the merged branch and thus the merge commit do touch a relevant file, in which case the merge commit is the last relevant commit and we build the snapshot from that.

Or in other words, merge commits aren't even a special case.

Very nice work on this @tarsius

Thanks! :smile:

milkypostman commented 1 year ago

@tarsius what can I do here to help? Is this an all or nothing thing or it's configurable? I.e., I'd be happy to set up a new subdomain and we could start testing this.

purcell commented 1 year ago

the COUNT of commits since the last VERSION.

But this can change unpredictably, and even reduce, right, leaving folks stuck with an outdated but higher-versioned package? I think that was why I'd suggested the -snapshotTIMESTAMP scheme in https://github.com/melpa/melpa/issues/2955.

tarsius commented 1 year ago

the COUNT of commits since the last VERSION.

But this can change unpredictably, and even reduce, right, leaving folks stuck with an outdated but higher-versioned package?

Yes, but in the paragraphs that follow, I explain how that is addressed.

tarsius commented 1 year ago

(I have to drop the dependency on elx before we can move forward.)

Is this an all or nothing thing or it's configurable?

It's highly configurable, more so than before, and that makes it necessary to make some changes to the interface.

The boolean package-build-stable (aka $STABLE) is no longer enough to decide which "flavor of melpa" to use, once there are three of them. Edit: I am still working out the details.

Option package-build-get-version-function also isn't enough anymore. I replaced it with package-build-release-version-function and package-build-snapshot-version-function. Again the new default value of the old variable should be nil, and if it is something else then that should override the new variables for backward compatibility.

I haven't done it yet, but package-build-release-version-function should actually be replaced with ...-functions, for even more flexibility (and less boilerplate).

We need two variables because (1) the decision how the current release is determined, and (2) the decision what suffix to use for snapshots, are two independent decisions, and I would not want to create a new function for each possible combination.

The source and suffix I had in mind would be accomplished by this:

(setq package-build-release-version-functions
      (list #'package-build-get-tag-version))
(setq package-build-snapshot-version-function
      #'package-build-release+count-version)

Note that functions intended for package-build-snapshot-version-function internally use package-build-release-version-functions to determine the release part. To instead use the latest tag or, if there is no such tag, the version header for both the release and the snapshot channel we can just:

 (setq package-build-release-version-functions
-      (list #'package-build-get-tag-version))
+      (list #'package-build-get-tag-version)
+            #'package-build-get-header-version))
 (setq package-build-snapshot-version-function
       #'package-build-release+count-version)

Or the order of the two release-version functions could be switched.

Or to stick to just using the tag, but append a timestamp instead of count for snapshots:

 (setq package-build-release-version-functions
       (list #'package-build-get-tag-version))
 (setq package-build-snapshot-version-function
-      #'package-build-release+count-version)
+      #'package-build-release+timestamp-version)

This is 95% done.

(I should be able to finish it in a day or two, but I am very busy with life right now, so probably next week.)

what can I do here to help?

Let's decide which functions to use initially.

I still think we should stick to what I originally implemented and suggested here for the time being at least.

I.e., use (setq release tag) and (setq snapshot count), and maybe later switching to (setq release (or tag header)) (or (setq release (or header tag))). Going in that direction is easier than going the other way (because then there is no risk of having to "increasing" from 0.1.0 to 0.0.0, and because we can delay deciding whether to use (or tag header) or (or header tag).

Also does it seem reasonable to you to deprecate $STABLE and package-build-stable, and to replace them with just $MELPA_FLAVOR. (There is no need for package-build-flavor, I believe.)

melpa / package-build

Implement new version scheme for snapshot channel #81