schierlm / DebianBootstrapSuggestions


message from the botch author :) #1

Open josch opened 2 years ago

josch commented 2 years ago

Hi, I was just made aware of this repo by Paul Wise. It seems that you came to the same conclusion as we did 10 years ago when I wrote botch. :)

An analysis of the dependency graph is also generated weekly here: https://bootstrap.debian.net/botch-native/amd64/stats.html

Since then, a lot has happened in Debian with respect to bootstrappability. Most notably we now have the rebootstrap tool by Helmut Grohne which is currently used to bootstrap Debian for new architectures and generally as a QA tool to find out when changes to packages introduce regressions.

Relatedly, there are also efforts to reduce the set of Essential:yes packages: https://wiki.debian.org/Proposals/EssentialOnDiet and https://salsa.debian.org/debian/grow-your-ideas/-/issues/20

The biggest blocker for cross-compilation is probably gcc-for-host: https://bugs.debian.org/666743

Feel free to drop by our IRC channel #debian-bootstrap on OFTC or our mailing list debian-cross@lists.debian.org -- whatever your interest in the realm of bootstrappability, we'd love to collaborate. :)

schierlm commented 2 years ago

Thanks for the feedback and thanks for writing botch. To be honest, I was not really sure if I can get botch to output what I want (i.e. treat alternative packages or virtual packages as alternatives and not as "all of them needed"), so I went for the slower route and ran the resolver.

The first link definitely looks great. And it mostly confirms the "Big ball of mud" findings.

The background of my analysis was (completely unrelated to Debian) to find "evil" packages like GNAT, which cannot be compiled without having that exact same package already compiled (possibly for a different architecture), or GNU Autogen (which partially generates its pregenerated source code using itself). Such packages are therefore a risk for Ken Thompson's Trusting Trust attack. That is, my interest is more in theoretical bootstrappability than in actual real-world implications (although fixing them would also be good).

Debian seemed to be a good choice for me, as it is a large distro with lots of tools, and there were already some tools to analyze its dependency graph. Yet the big ball of mud makes it hard to actually find those problematic packages (e.g. dependencies on cdbs or some Debian helper tools are totally irrelevant for me).

From my understanding, the debian-cross mailing list focuses more on "real-world" implications when cross-compiling Debian for a new architecture, i.e. GNAT is no problem as you can build a cross-GNAT for the target architecture and then compile GNAT for it. GCC is no problem for us either, since there exist bootstrap paths from GNU Mes as well as from the M2-Planet C compiler to build gcc.

If you are interested in those goals as well, you are also welcome to join #bootstrappable on Libera.Chat (logs at http://logs.guix.gnu.org/bootstrappable) or the bootstrappable@freelists.org mailing list (https://www.freelists.org/list/bootstrappable).

josch commented 2 years ago

> Thanks for the feedback and thanks for writing botch. To be honest, I was not really sure if I can get botch to output what I want (i.e. treat alternative packages or virtual packages as alternatives and not as "all of them needed"), so I went for the slower route and ran the resolver.

You can get botch to not treat virtual packages and alternatives as "all of them needed" (aka --closuretype) by choosing --strongtype (only consider strong dependencies) or --optgraph (find the smallest installation set possible). There is also a tool called botch-optuniv that uses aspcud to find the minimal graph size.
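To make the difference between these three notions concrete, here is a toy sketch in Python. It does not call botch or reproduce its implementation; the package names and the `DEPENDS` table are made up, and the brute-force enumeration only works for a handful of packages without conflicts.

```python
from itertools import combinations

# Hypothetical toy universe: each package maps to a list of dependency
# clauses; a clause is a list of alternatives ("a | b" in Debian syntax).
DEPENDS = {
    "src-foo": [["libc"], ["mta-a", "mta-b"]],
    "libc": [],
    "mta-a": [["libssl"]],
    "mta-b": [],
    "libssl": [["libc"]],
}


def closure(pkg):
    """Follow every alternative of every clause ("all of them needed")."""
    seen, todo = set(), [pkg]
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        for clause in DEPENDS[current]:
            todo.extend(clause)
    return seen


def installation_sets(pkg):
    """Brute force: yield every subset that contains pkg and in which each
    clause of each member is satisfied by at least one member."""
    names = sorted(DEPENDS)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            chosen = set(subset)
            if pkg in chosen and all(
                any(alt in chosen for alt in clause)
                for member in chosen
                for clause in DEPENDS[member]
            ):
                yield chosen


def strong_dependencies(pkg):
    """Packages that appear in every installation set of pkg."""
    return set.intersection(*installation_sets(pkg)) - {pkg}


def smallest_installation_set(pkg):
    """Subsets are tried by increasing size, so the first hit is smallest."""
    return next(installation_sets(pkg))
```

In this toy universe, the closure of `src-foo` pulls in both `mta-a` and `mta-b` (and `libssl` through `mta-a`), the only strong dependency is `libc`, and the smallest installation set picks `mta-b` because it needs nothing else.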

> The first link definitely looks great. And it mostly confirms the "Big ball of mud" findings.

Yes, that big hairball has been there since the beginning (here is my talk about botch at DebConf 2013: http://meetings-archive.debian.net/pub/debian-meetings/2013/debconf13/webm-high/984_An_introduction_to_the_BootstrapBuild_Ordering_Toolchain.webm ) and you probably already know about https://salsa.debian.org/debian-bootstrap-team/botch/-/wikis/home which has a lot more resources about the theory behind dependency graphs in Debian.

We also have a page that keeps track of the size of the "big ball of mud" since 2005: https://bootstrap.debian.net/history.html -- as you can see it keeps growing.
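As a rough illustration of how the size of such a hairball can be measured, the sketch below assumes that the "big ball of mud" corresponds to the largest strongly connected component of the dependency graph and computes it with Kosaraju's algorithm. The graph `NEEDS` and its package names are hypothetical, and this is not how botch itself is implemented.

```python
# Hypothetical toy graph: an edge a -> b means "a needs b to be built".
NEEDS = {
    "src-a": ["src-b", "src-c"],
    "src-b": ["src-a"],
    "src-c": ["src-a"],
    "src-d": ["src-a"],
    "src-e": [],
}


def strongly_connected_components(graph):
    """Kosaraju's algorithm on a dict mapping node -> list of successors."""
    order, seen = [], set()

    def visit(node):
        # First pass: depth-first search recording nodes in post-order.
        seen.add(node)
        for succ in graph[node]:
            if succ not in seen:
                visit(succ)
        order.append(node)

    for node in graph:
        if node not in seen:
            visit(node)

    # Build the reversed graph for the second pass.
    reverse = {node: [] for node in graph}
    for node, succs in graph.items():
        for succ in succs:
            reverse[succ].append(node)

    # Second pass: walk the reversed graph in decreasing finish order.
    components, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        component, todo = set(), [root]
        while todo:
            node = todo.pop()
            if node in assigned:
                continue
            assigned.add(node)
            component.add(node)
            todo.extend(reverse[node])
        components.append(component)
    return components


def ball_of_mud(graph):
    """Return the largest strongly connected component."""
    return max(strongly_connected_components(graph), key=len)
```

Here `ball_of_mud(NEEDS)` returns the three packages that all need each other, while `src-d` and `src-e` stay outside the cycle. The recursive first pass is fine for a toy graph but would need an explicit stack for a graph the size of Debian's.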

> The background of my analysis was (completely unrelated to Debian) to find "evil" packages like GNAT, which cannot be compiled without having that exact same package already compiled (possibly for a different architecture), or GNU Autogen (which partially generates its pregenerated source code using itself). Such packages are therefore a risk for Ken Thompson's Trusting Trust attack. That is, my interest is more in theoretical bootstrappability than in actual real-world implications (although fixing them would also be good).

That would be one of the three categories of "self cycles" that botch computes on the stats.html page.

> Debian seemed to be a good choice for me, as it is a large distro with lots of tools, and there were already some tools to analyze its dependency graph. Yet the big ball of mud makes it hard to actually find those problematic packages (e.g. dependencies on cdbs or some Debian helper tools are totally irrelevant for me).

We've found that having the theoretical foundations done doesn't actually help us much with bootstrapping in practice. The stats.html page shows a lot of points in the graph where dropping a build dependency would greatly reduce the graph size (strong bridges) or where cross-compiling a package would do the same (strong articulation points). But more often than not, those build dependencies are hard dependencies and cannot be dropped, or the source package cannot be cross-compiled due to the gcc-for-host issue.
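The idea behind ranking build dependencies by how much dropping them would shrink the graph can be sketched by brute force: remove each edge in turn and see how far the largest strongly connected component shrinks. This is only a toy under stated assumptions; the graph `NEEDS` is hypothetical, the quadratic reachability check is far too slow for a real archive, and botch presumably uses a proper strong-bridge algorithm instead.

```python
# Hypothetical toy graph: an edge a -> b means "a build-depends on b".
NEEDS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}


def reachable(graph, start):
    """All nodes reachable from start, including start itself."""
    seen, todo = set(), [start]
    while todo:
        node = todo.pop()
        if node not in seen:
            seen.add(node)
            todo.extend(graph[node])
    return seen


def largest_scc_size(graph):
    """Size of the largest set of mutually reachable nodes."""
    reach = {node: reachable(graph, node) for node in graph}
    return max(
        sum(1 for other in reach[node] if node in reach[other])
        for node in graph
    )


def edges_ranked_by_effect(graph):
    """For each edge, how much the largest SCC shrinks if it is dropped."""
    base = largest_scc_size(graph)
    results = []
    for node, succs in graph.items():
        for succ in succs:
            trimmed = {
                n: [s for s in ss if (n, s) != (node, succ)]
                for n, ss in graph.items()
            }
            results.append(((node, succ), base - largest_scc_size(trimmed)))
    # Largest reduction first; sorted() keeps ties in discovery order.
    return sorted(results, key=lambda item: -item[1])
```

In this toy graph, dropping `c -> a` dissolves the cycle completely, whereas dropping `a -> c` changes nothing because `a` still reaches `c` through `b`. An edge with a non-zero reduction is a strong bridge in the graph-theoretic sense.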

> From my understanding, the debian-cross mailing list focuses more on "real-world" implications when cross-compiling Debian for a new architecture, i.e. GNAT is no problem as you can build a cross-GNAT for the target architecture and then compile GNAT for it. GCC is no problem for us either, since there exist bootstrap paths from GNU Mes as well as from the M2-Planet C compiler to build gcc.

Yes, most people on the IRC channel or the mailing list I mentioned work on making Debian actually bootstrappable. The theory part was done by me 10 years ago and I've honestly felt little motivation to continue with that because I don't see how such work could actually help with bootstrapping the actual thing. Which...

> If you are interested in those goals as well, you are also welcome to join #bootstrappable on Libera.Chat (logs at http://logs.guix.gnu.org/bootstrappable) or the bootstrappable@freelists.org mailing list (https://www.freelists.org/list/bootstrappable).

...does not mean that I'm trying to diss your work. I'm happy that this is what you are having fun with! But that's my own reason why I don't do much more theory stuff anymore, I'm afraid. Anyways, feel free to drop me a mail if you have any botch questions whenever you like or send me a patch if you have some improvements for it. Thanks!

schierlm commented 2 years ago

> You can get botch to not treat virtual packages and alternatives as "all of them needed" (aka --closuretype) by choosing --strongtype (only consider strong dependencies) or --optgraph (find the smallest installation set possible). There is also a tool called botch-optuniv that uses aspcud to find the minimal graph size.

I think I played with these options but was not able to replicate the result of apt-get build-dep. But never mind, I got what (I believe) I wanted, and probably it does not matter that much anyway.

> We've found that having the theoretical foundations done doesn't actually help us much with bootstrapping in practice.

And I believe there are too many distro-agnostic bootstrapping problems left to make a distro truly bootstrappable. GNU Guix has been trying very hard lately (even by resurrecting and maintaining old versions of packages to get a working bootstrap build), yet some packages (like GNAT, Autogen or fpc) cheat in doing so by depending on precompiled blobs.

> those build dependencies are hard dependencies and cannot be dropped

At least not with reasonable effort. There is a way to break the bison -> bison dependency cycle, but it involves heirloom yacc, which is not actually maintained or packaged in any mainstream distro (as far as I know). And Debian decided to "cheat" by just using the precompiled grammars included in the upstream tarball (so you don't even see this dependency in botch's graphs).