melpa / package-build

Tools for assembling a package archive
https://github.com/melpa/melpa
25 stars 34 forks source link

Synchronize commit and timestamp information. #61

Closed houseofsuns closed 2 years ago

houseofsuns commented 2 years ago

The functions package-build--get-commit and package-build--get-timestamp are used to determine what snapshot is used for building a package archive. However they use deviating methods to determine this information resulting in conflicting information. Here we transplant the mechanics from package-build--get-timestamp to package-build--get-commit to establish a coherent state.

This is a step towards reproducible distfiles.

This should make no functional difference.

tarsius commented 2 years ago

Well, I did want to fix all these issues in one go, but...

I have ran into some other issues while looking at surrounding code and probably won't merge this into master until I have resolved those too. (And that might take a bit because of the nice whether.)

Enjoy the sun. As long as this does not turn into the perfect being the enemy of the good everything is fine (maybe I'll prod a bit if nothing happens for a while).

This here would be such an issue, but it is a bit more complex than that. See the first commit on the work branch.

And now I am off to feed my mom's cat and then I will spend the night there.

houseofsuns commented 2 years ago

This here would be such an issue, but it is a bit more complex than that. See the first commit on the work branch.

Ah, I see that you used basically the same mechanics to solve the same part of the problem I covered too. But deleted files are indeed tricky.

However I think I found a nice way to solve this: both git and mercurial support handing glob patterns to their log commands. So instead of expanding the files spec in elips it can simply be handed to the version control tool and this will do the right thing (modulo differences in the expansion, but hopefully that will turn out nicely).

With git it seems to be totally straightforward git log '*.rst' will show a log of everything touching any rst file (even inside directories). For git this is somewhat undocumented as I did not find a spot in the documentation where this was explained.

With hg it seems to be a bit more fiddly, here I think the -I / --include option is required. So the invocation would be hg log -I '**.rst' (yes that's two stars for recursive lookup).

tarsius commented 2 years ago

However I think I found a nice way to solve this: both git and mercurial support handing glob patterns to their log commands. So instead of expanding the files spec in elips it can simply be handed to the version control tool and this will do the right thing (modulo differences in the expansion, but hopefully that will turn out nicely).

I like the idea. Of course the crux is how significant the differences are. We will have to test for every recipe and then of course update the documentation.

With git it seems to be totally straightforward git log '*.rst' will show a log of everything touching any rst file (even inside directories). For git this is somewhat undocumented as I did not find a spot in the documentation where this was explained.

Its documented in the gitglossary man page.

Looks like we have to prepend :(glob).

The bigger problem though is how exclude rules are handled. Currently package-build only applies them to earlier include rules, while git first processes all include rules and then applies all exclude rules to everything that was included. I have looked at existing Melpa recipes and almost all that use an exclude rule also place it last. So while this would be a change in behavior, most existing recipes would not be affected.

For two recipes (slime and magit) I moved :exclude last because it makes no difference (see https://github.com/melpa/melpa/pull/8072). After that a single recipe remains that doesn't place :exclude last (it now also is the only one with multiple such elements). I think the order doesn't matter in this case either, but haven't checked carefully.

A bigger problem than individual recipes is that the default spec contains an :exclude entry; individual recipes won't be able to override those!

(defconst package-build-default-files-spec
  '("*.el" "lisp/*.el"
    "dir" "*.info" "*.texi" "*.texinfo"
    "doc/dir" "doc/*.info" "doc/*.texi" "doc/*.texinfo"
    "docs/dir" "docs/*.info" "docs/*.texi" "docs/*.texinfo"
    (:exclude
     ".dir-locals.el" "lisp/.dir-locals.el"
     "test.el" "tests.el" "*-test.el" "*-tests.el"
     "lisp/test.el" "lisp/tests.el" "lisp/*-test.el" "lisp/*-tests.el"))
  "Default value for :files attribute in recipes.")

On the other hand, it is unlikely that a recipe has to override any of these excludes -- unless it's name contains "test" maybe, but that should be fairly rare and it might be okay if such recipes could not derive their files spec from the default.

With hg it seems to be a bit more fiddly, here I think the -I / --include option is required. So the invocation would be hg log -I '**.rst' (yes that's two stars for recursive lookup).

I haven't looked into hg at all yet. It is possible that we cannot get exactly the same behavior as for git, IMO that would be okay but would have to be documented.

houseofsuns commented 2 years ago

With git it seems to be totally straightforward git log '*.rst' will show a log of everything touching any rst file (even inside directories). For git this is somewhat undocumented as I did not find a spot in the documentation where this was explained.

Its documented in the gitglossary man page.

Indeed, however the git-log manpage only mentions "pathspec" once in a rather unrelated place, so no wonder that I did not find it.

Looks like we have to prepend :(glob).

I think this needs a terminating colon :(glob): in the general case, but otherwise looks exactly like the thing we want.

The bigger problem though is how exclude rules are handled. Currently package-build only applies them to earlier include rules, while git first processes all include rules and then applies all exclude rules to everything that was included. I have looked at existing Melpa recipes and almost all that use an exclude rule also place it last. So while this would be a change in behavior, most existing recipes would not be affected.

For two recipes (slime and magit) I moved :exclude last because it makes no difference (see melpa/melpa#8072). After that a single recipe remains that doesn't place :exclude last (it now also is the only one with multiple such elements). I think the order doesn't matter in this case either, but haven't checked carefully.

One remaining recipe sounds like a manageable amount to empirically test and fix if necessary.

A bigger problem than individual recipes is that the default spec contains an :exclude entry; individual recipes won't be able to override those!

(defconst package-build-default-files-spec
  '("*.el" "lisp/*.el"
    "dir" "*.info" "*.texi" "*.texinfo"
    "doc/dir" "doc/*.info" "doc/*.texi" "doc/*.texinfo"
    "docs/dir" "docs/*.info" "docs/*.texi" "docs/*.texinfo"
    (:exclude
     ".dir-locals.el" "lisp/.dir-locals.el"
     "test.el" "tests.el" "*-test.el" "*-tests.el"
     "lisp/test.el" "lisp/tests.el" "lisp/*-test.el" "lisp/*-tests.el"))
  "Default value for :files attribute in recipes.")

On the other hand, it is unlikely that a recipe has to override any of these excludes -- unless it's name contains "test" maybe, but that should be fairly rare and it might be okay if such recipes could not derive their files spec from the default.

Having to roll your own and not be able to inherit from the default value sounds reasonable for the (so far theoretical if I understand correctly) recipes that want to override the excludes.

With hg it seems to be a bit more fiddly, here I think the -I / --include option is required. So the invocation would be hg log -I '**.rst' (yes that's two stars for recursive lookup).

I haven't looked into hg at all yet. It is possible that we cannot get exactly the same behavior as for git, IMO that would be okay but would have to be documented.

I took a look and the invocation hg help patterns supplies the information, that hg has a very similar glob: option that can be prepended to a pathspec. This looks pretty similar to the :(glob): of git, so this seems to justify some optimism.

tarsius commented 2 years ago

I've been busy with other things, but I'll get back to this next week.

tarsius commented 2 years ago

Unfortunately this is cascading out of control a bit, see https://github.com/melpa/package-build/issues/62, and I have a few more urgent things to attend to, so this likely won't get done until next week.

houseofsuns commented 2 years ago

Oh boy, this entails a lot of effort. If I can help with anything specific feel free to ping me, however I feel like I would be more of a hindrance with most of the work due to the intricacies.

tarsius commented 2 years ago

[let git/hg determine the appropriate commit]

This causes the hash to change for 1531 of 5207 packages. That's more than I expected, but test samples indicate this is what we want.

The version changes for only six packages:


("graphql"
 ("5ca5f50b5e6f3883f1138453a356d59a1c002120" . "20180912.31")   ;old
 ("b64ab8d5859a41668f2b67bd962e4a4ae48b8e93" . "20180912.1232"));new
("html-script-src"
 ("66460f8ab1b24656e6f3ce5bd50cff6a81be8422" . "20120403.1815")
 ("ed5e686ab604c81222c7e50b27c5d874c5687db7" . "20130807.918"))
("keepass-mode"
 ("be190a86fd82337fe5280c1833f92d1f9997bced" . "20211030.948")
 ("be190a86fd82337fe5280c1833f92d1f9997bced" . "20211030.1003"))
("live-py-mode"
 ("ff74ae818044c635ec157cda76a2a14c77c9d12e" . "20220518.204")
 ("26d51013e75ddedd5eb8600a0a3dd035319f9d3f" . "20210413.205"))
("ob-fsharp" ; just because of *-pkg.el
 ("0b2fdd9bb4f38af8b5cf4914627af52f5b43d9f7" . "20170618.1429")
 ("0b2fdd9bb4f38af8b5cf4914627af52f5b43d9f7" . "20180217.1731"))
("playonline"
 ("c75da1fdc1dfbd5d9aa274dc4e90ff631ea08e70" . "20200317.642")
 ("463a94fc01112817d1e6e0209ea85385efcb1329" . "20200318.758"))

I checked these six; in each case the change is what we want.

tarsius commented 2 years ago

If you want to help, you could double-check these findings.

Here's a few from the first group:

 ("0x0"
  ("63cd5eccc85e527f28e1acc89502a53245000428" . "20210701.839")
  ("ad9f84e6d39c620da381313b160667864a702fd6" . "20210701.839"))
 ("0xc"
  ("eec4fb10b9288c0852f751cfb05d638664fa2411" . "20201025.2105")
  ("5bd6c0c901d03d1f24a3ddcf3a62d3b6d2428c80" . "20201025.2105"))
 ("a"
  ("93e5ed8c495794d1ba3c04b43041b95ce01079b1" . "20210929.1510")
  ("9ad2d18252b729174fe22ed0b2b7670c88f60c31" . "20210929.1510"))
 ("abgaben"
  ("20d14830f07d66e2a04bcad1498a4a6fbf4b4451" . "20171119.646")
  ("966bfcfdd3b2e288576ffe363d676ad282902090" . "20171119.646"))
 ("abl-mode"
  ("9c8c26c0de493f21cf4a68abad49f944d427bd88" . "20210923.950")
  ("7f692cf9bb263b26fda51bb56a58f6ac61febe3b" . "20210923.950"))
 ("ac-cider"
  ("fa13e067dd9c8c76151c7d140a2803da1d109b84" . "20161006.719")
  ("d8670939bbf88079263d5ace2b8bc04cf325be36" . "20161006.719"))
 ("ac-emacs-eclim"
  ("222ddd48fcf0ee01592dec77c58e0cf3f2ea1100" . "20180911.1121")
  ("edff7e0e30c87036710d88fb0b7a4644750858e8" . "20180911.1121"))
 ("ac-emoji"
  ("40a639764eb654f1b4bb705c817b66032a26ff2b" . "20150823.711")
  ("53677f754929ead403ccde64b714ebb6b8fc808e" . "20150823.711"))
houseofsuns commented 2 years ago

[let git/hg determine the appropriate commit]

This causes the hash to change for 1531 of 5207 packages. That's more than I expected, but test samples indicate this is what we want.

Did you push the code for this to somewhere (I did not find it)? Alternatively, can you dump the complete list with changed hashes somewhere?

tarsius commented 2 years ago

I've just pushed to the fix-snapshots branches of both melpa and package-build. Here's some code I used for testing:

(defun pkgb-compare (old new)
  (cl-mapcan (lambda (a b)
               (and (not (equal a b))
                    (list (list (car a)
                (cadr a)
                (cadr b)))))
         old new))

(defun pkgb-collect-versions (&optional pkgs)
  (mapcar (lambda (name)
        (message "%s..." name)
        (with-demoted-errors "ERROR: %S"
          (let ((rcp (package-recipe-lookup name)))
        (list name (package-build-get-timestamp-version rcp)))))
      (or pkgs (package-recipe-recipes))))

(setq pkg-versions-old (pkgb-collect-versions)) ; using e8c58ec
(setq pkg-versions-new (pkgb-collect-versions)) ; using fix-snapshots
(pkgb-compare pkg-versions-old pkg-versions-new)
(pkgb-compare
 (mapcar (pcase-lambda (`(,n (,_ . ,time))) (list n time)) pkg-versions-old)
 (mapcar (pcase-lambda (`(,n (,_ . ,time))) (list n time)) pkg-versions-new))
tarsius commented 2 years ago

Cloning all repositories would take a day, so here is the requested list of all changes: https://gist.github.com/tarsius/0b51f2964209f93c5de9ddba5bd64dff.

houseofsuns commented 2 years ago

I rolled the dice checked the following 52 random packages (see checked.txt. I observed no unexpected behavior. Most changes where either documentation (e.g. readme) updates with occasional test and ci additions or alternatively unrelated commits due to multiple packages/projects being located in the same repository. Also several times the build system and its recorded dependencies changed which however should not impact the outcome as the relevant elisp-files were unchanged. Some specific remarks:

I'll check some more packages when I find the time.

tarsius commented 2 years ago

I pushed another update. It is no longer necessary to checkout to determine the version, and excludes are now consistently processes after includes. We have no say in the latter because that's how git and hg do it -- but of course that calls into question this hole change and all the work that went into it already.

tarsius commented 2 years ago

I am taking a mental health break of at least a month, taking effect immediately. When I get back, working on this likely won't be a priority. Sorry about this, but I have to take care of myself right now.

milkypostman commented 2 years ago

@tarsius appreciate you letting us know and glad you are taking the time.

houseofsuns commented 2 years ago

As good as it is to see progress in the software realm it's far better to see happy humans that make it happen. Have a good time.

purcell commented 2 years ago

Good job looking after yourself, @tarsius, be well!

tarsius commented 2 years ago

Thanks.

tarsius commented 2 years ago

I am doing better. It helps ignoring the daily requests for help and features for a while.

But I am having more problems putting those things out of my mind that actually interest me. As a result I have been working on a few things the last two weeks.

I'll try to get back to my break soon, but since this is one of the topics that keeps sticking around in the back of my mind, I'll post a follow up issue in a bit and would appreciate your feedback on that. If we could agree on how to proceed, that would help me forget about it until I am ready to actually work on it.

houseofsuns commented 2 years ago

As I see it you posted a draft already containing most of the desired functional changes (in fix-snapshots). If this approach is the one we want to utilize, then here is what remains from my point of view:

This will introduce the following changes:

So this is how I would propose to proceed and hopefully this summary can give you some peace of mind. I will try to get some progress on the submodule thing and investigate the glob syntax change (both of which should be pretty independent of all other points), however I'll be mostly on vacation for the next three weeks.

Feel free to add/comment anything I overlooked, but also feel free to simply abstain for now. :)

tarsius commented 2 years ago

That is mostly right, except:

The determination of the commit is now done through a single git/hg log invocation which means that exclusions are more limited (i.e. only one exclude stanza which must be at the end).

The change in how exclusion works is indeed the crux. But there could theoretically still be multiple excludes and they don't have to appear at the end. (I think. If not, such a change could be reverted).

The main difference is that excludes are processed after all includes, regardless of order. To avoid confusion, the convention should be to put the exclude(s) at the end. We could enforce it and error out otherwise, but that is not strictly necessary.

The file inclusion patterns will now use glob syntax as that is what git/hg expand (instead of the elisp expansion used previously)

Glob syntax was already used before. The git, hg and new elisp implementations should all conform to https://man7.org/linux/man-pages/man7/glob.7.html, and the old elisp implementation did as well.

(The reason that we still need an elisp implementation is that when the git and hg implementations match a directory, they do not output that directory but all files that it contains. That makes sense from their perspective and works for the first step when we only have to determine the last commit that touched any relevant file/directory. But it is not what we need when we copy matched files/directory.)

The default value for file inclusions then has a non-overridable exclude portion. Recipes needing the override will have to specify the entire files configuration value (e.g. by first copying the default value and then augmenting their copy).

We might still come up with alternative approaches such as also providing a :default-includes, but my preference would indeed be that recipes, that need to include files that are excluded by :default, specify all files from scratch. In practice this affects very few packages. I would guess a dozen or less.