melpa / package-build

Tools for assembling a package archive
https://github.com/melpa/melpa
25 stars 34 forks source link

Stop populating submodules #62

Closed tarsius closed 2 years ago

tarsius commented 2 years ago

@purcell 13 years ago we started populating submodules for the benefit of yasnippet and five years ago yasnippet stopped tracking the snippets as a submodule. Instead the documentation now instructs users to install the yasnippet-snippets package separately.

I was just trying to build all the packages on Melpa locally. Building stack-mode failed, because its repository contains a module using a git:// url for a repository on Github, but Github stopped supporting that protocol for security reasons a while ago. (And so did we, but we don't check the urls used for modules.) I recently had to ask the maintainers of another mostly abandoned project to update the urls in .gitmodules to deal with the same issue.

IMO we should stop fetching modules altogether. It is a bit of a security issue. We did it for yasnippet, which no longer needs it. If any other package needs this, then I would consider that a mistake and would encourage its maintainers to change that, likely the same way as yasnippet ultimately did it. But I suspect no other package needs it, so we needlessly fetch modules, and some of them can be large and we have no control over it once a package has been added. Someone might add llvm as a module. There are 109 packages with submodules.

Are you okay with removing support for fetching modules? If it turns out some package actually needs a module to be checked out to build properly, then we could restore support but require that the package's recipe specifies that it needs modules to be checked out.

purcell commented 2 years ago

Are you okay with removing support for fetching modules? If it turns out some package actually needs a module to be checked out to build properly, then we could restore support but require that the package's recipe specifies that it needs modules to be checked out.

This would broadly be okay with me, though I like the simplicity of just doing one thing consistently, and having less configuration. For example, what happens in a pre-existing working directory when a package flips from being configured with :submodules t to :submodules nil and then gets rebuilt?

But if there are 109 packages with submodules, I'd guess that something among them will be affected.

tarsius commented 2 years ago

I'm back with data.

But if there are 109 packages with submodules, I'd guess that something among them will be affected.

I have compared the results of both package-build-expand-files-spec and resulting tarballs, with and without the submodules present. Removing the submodules only affects four packages.

I like the simplicity of just doing one thing consistently, and having less configuration.

Agreed -- though I would like to achieve that by removing the feature completely. :grin:

But it's not that important.

If I keep running into issues with modules that no longer cannot be fetched and also don't have to be fetched, then would like to wrap the calls to git submodule with with-demoted-errors though.

(I will open a pull-request to update the protocol used for the module in stack-mode. That module has a module of its own, which also uses git://, so I will have to open a pr for that too. It might take a while, or forever, for those to get merged, as the haven't been touched in years. Actually, I am first going to ask the author if this should be removed.)

For example, what happens in a pre-existing working directory when a package flips from being configured with :submodules t to :submodules nil and then gets rebuilt?

It depends. The same commit that removed :submodules t should also adjust :files accordingly. If files from the submodule were included because of the default files spec, then they would continue to be included, but only if the modules lives at doc, docs or lisp.

My test above implicitly tested this too, so in practice: no package would be affected.


Currently support for submodules is poor, and I have no intentions of improving that. It might get worse over time.

For example, I am currently fixing how we determine the appropriate commit by asking git log/hg log to do it (as suggested here: https://github.com/melpa/package-build/pull/61#issuecomment-1152981374). This won't look inside modules. If :files contains an entry for the directory that is the submodule, then we will build a new release if that changes, but if we only include some of the files from the module, then no release will be build, unless other files from outside the module are also touched, either in the same or a later commit.

milkypostman commented 2 years ago

I can imagine a couple things would happen,

I'd be happy to help resolve issues caused by this change just to get rid of the submodule support. And then just tell packages to either do what package-build does in melpa, or to package them separate which is how things should be anyways.

@tarsius thank you for looking into this.

houseofsuns commented 2 years ago

I filed pull requests for pinyin.el and swift-playground-mode to remove their submodule usage.

tarsius commented 2 years ago

I filed pull requests for pinyin.el and swift-playground-mode to remove their submodule usage.

Thanks a lot! If you could do that for libgit as well, that would be great, but don't feel pressured to do it. I ended up as its maintainer, so ultimately it is my responsibility. Also removing the submodule is probably more complicated in this case.

houseofsuns commented 2 years ago

Thanks a lot! If you could do that for libgit as well, that would be great, but don't feel pressured to do it. I ended up as its maintainer, so ultimately it is my responsibility. Also removing the submodule is probably more complicated in this case.

I'll be on the road the next week, but will take a look afterwards.

tarsius commented 2 years ago

Sounds good.

I think it would be a good idea to use the system libgit2.

tarsius commented 2 years ago

I filed pull requests for pinyin.el and swift-playground-mode to remove their submodule usage.

I should have actually looked at these pull-requests, but because I am trying to take a break, I have not. Since you had not discussed it further, I assumed you would do something along the lines of what I suggested above.

Generally speaking I am opposed to including third-party files in a repository, and when it comes to elisp libraries, I have been pushing back against it for years. The obvious drawback is that the bundled files are not going to be updated when then they are updated upstream. The maintainer of the downstream might have every intention to do it, but in practice it doesn't happen.

This is a big problem when it comes to bundled testing utilities implemented in elisp. For example, until I fixed that, many packages used to bundle ancient copies of ert.el that never got updated but did still end up earlier on the load-path than the heavily improved version that comes with Emacs.

The situation isn't quite the same for data files. IMO in the case of pinyin it would actually be better to keep a copy of pinyin.txt directly in the repository. But only that file, not the whole upstream tree that contains that file. Also there should be a make target or some script, that fetches and checks in updated versions of that file, to increase the odds that actually happens occasionally.

In the case of swift-playground-mode more than one file is required and it is code. But it is code implemented in another language than elisp, so the risk of shadowing the upstream versions of these files, doesn't exist here. Still, copying the upstream repository as-is, is not the right way to go.

What has to happen instead is that the maintainer has to decide whether they agree with my assessment that the upstream project won't see any updates anymore. If that is so, the best course of action would be to absorb the necessary upstream files into the downstream project, and to maintain them there. Probably only the swift files are needed. The vim interface is not, and should not be imported. The documentation would probably have to be merged into the top documentation, the runner script might not be relevant anymore, the license has to be checked for compatibility and it might be possible to remove that file.

An alternative would be to remove the submodule and instead add elisp code that offers to download the swift code (likely by cloning the upstream repository somewhere) at load time. Or maybe the vim and emacs packages should just be maintained in the same repository, along-side the shared swift code. (Such a merge can be done using --allow-unrelated-histories; of course after removing traces of the submodule from the old elisp history. And maybe some other reshuffling is in order after or before the merge.)

In the case of libgit the upstream is far to large and actively maintained for it to make sense to track it directly in the downstream repository. Instead I have decided to remove libgit from Melpa for the time being. There are no packages that use it except for magit-libgit, which is only a proof-of-concept, which I should not have added to Melpa in the first place. Eventually libgit will have to be taught to use the system libgit2 and/or to clone the libgit2 repository at load time, if it isn't present yet. Since nothing really uses libgit at this point, I am not in a hurry to do so.

This leaves us with swift-playground-mode. It only has 406 downloads, which isn't a lot for a package that has been added three years ago. I have decided to remove it from Melpa for the time being, so that I can complete the removal of submodule support. One (okay, two) package(s) is not enough to justify keeping this feature around.

@msanders I would be happy to add swift-playground-mode to Melpa again, if you want to do that. Please don't feel pressured to find a solution asap, but if you want this back on Melpa, then you will have to eventually find a way to not require a submodule. If you want to do that by merging @houseofsuns pr, then that is fine by me too.

tarsius commented 2 years ago

I've now remove magit-libgit, libgit and swift-playground-mode. I hope we can add the latter two back eventually. In the case of libgit that might actually happen earlier than planned because I just saw that there is a pr that does part of the work necessary.

/cc @progfolio @conao3 @jcs090218 @dickmao @adisbladis

tarsius commented 2 years ago

Support in package-build.el is also gone now and that has been merged into the melpa/melpa repository.

houseofsuns commented 2 years ago

I should have actually looked at these pull-requests, but because I am trying to take a break, I have not. Since you had not discussed it further, I assumed you would do something along the lines of what I suggested above.

I actually remembered that there was some such remark, but failed to find it in the moment. But I think that my branch accidentally did use the system libgit2 as I'm somewhat of a novice at packaging and retrying by removing all of the submodule directory did not change the outcome. If you're interested I can take another look as it feels like the same code with different documentation will do the trick.

Generally speaking I am opposed to including third-party files in a repository, and when it comes to elisp libraries, I have been pushing back against it for years. The obvious drawback is that the bundled files are not going to be updated when then they are updated upstream. The maintainer of the downstream might have every intention to do it, but in practice it doesn't happen.

I'm totally with you there. However I think I retrospectively would choose the same approach for pinyin and swift-playground-mode. First reason is my inexperience which makes implementing a more complex solution more error-prone. Second reason is that both packages seemed rather on the inactive side (both submodule references were quite old, proving your point that downstream updates are flaky) so that a complex solution may exceed the bandwidth the respective maintainer is willing to put in. So as always in software it's a tradeoff between the perfect and the pragmatic.