sabotage-linux / sabotage

a radical and experimental distribution based on musl libc and busybox
http://sabo.xyz

Reduce the stage0 bootstrap binary seed to under a megabyte #742

Closed davidar closed 3 months ago

davidar commented 4 months ago

The stage0 bootstrap currently involves building around 45MB of binaries on the host system to initialise the chroot. This requires some degree of trust that the host build tools aren't broken or otherwise compromised. I've been working on reducing this down to a much smaller and simpler binary seed (currently around 600KB) using bootsh. This allows the stage0 build to happen entirely inside a minimal chroot, minimising the influence of the host system and hopefully improving the reproducibility of the bootstrap process.

The steps to run this alternative bootstrap path are:

This is still somewhat experimental, and only supports x86_64 at the moment, but I'm keen to hear any feedback you have.

rofl0r commented 4 months ago

first of all: an interesting and laudable effort.

the existing system for establishing trust in the generated stage1 we currently have in place goes as follows:

for any given sabotage checkout, once you have built stage1 and run utils/rebuild-stage1.sh (which removes any leftover influence of the host's toolchain, unless that toolchain was specifically backdoored to create malicious toolchains), you can run butch checksum on the installed packages and compare the hashes to those obtained on a different host os. the hashes should be identical, which means either that all hosts are infected with the exact same backdoor (and revision), or that nothing from the host leaked into the generated binaries.
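A minimal sketch of that comparison step, assuming the output of butch checksum from each host has been saved to a file with one "hash name" pair per line. The file layout and the helper name compare_checksums are assumptions for illustration, not part of butch:

```sh
#!/bin/sh
# Hypothetical helper: compare two saved checksum listings regardless of
# line order. Assumes one "<hash> <package>" pair per line; the real
# output format of butch checksum may differ.
compare_checksums() {
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }
    sort "$1" > "$tmp_a"
    sort "$2" > "$tmp_b"
    status=0
    # diff prints the lines that differ and exits non-zero on a mismatch
    diff "$tmp_a" "$tmp_b" || status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}
```

Calling compare_checksums host-a.sums host-b.sums (both file names hypothetical) prints nothing and returns 0 when the two hosts produced the same hashes.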

having bootsh as an additional vector to verify the generated stage1 binaries seems beneficial.

that said, we should definitely keep the existing stage0 build infrastructure around, as it allows bootstrapping all target archs supported by gcc 3.4.6 and 4.2.4, respectively. that would be i386, mips, powerpc, and pre-hardfloat arm. but that's anyway what you seem to have in mind.

there's also the cross-compiling approach which can be used to bootstrap stage1 directly for any target combination supported by musl and mcm, without going through a stage0 build. though if it's possible to use stage0, it's imo more convenient, and faster than to bootstrap a modern cross-compiler toolchain first.

another interesting approach i've seen is the one used by oasis linux. cproc is used as the bootstrap compiler, which then compiles a patched gcc 4.7.4 (the last version written in C). however cproc only supports a few 64 bit archs at the moment due to its use of the qbe backend.

so it appears that tinycc is now capable of building musl, including complex math and inline asm ?

since you marked this as draft, what further changes do you have in mind? it would certainly be welcome to support more architectures, but then i don't have an idea what tinycc actually supports besides x86.

davidar commented 4 months ago

Thanks!

Yeah, I was intending this as an alternative bootstrap path rather than a replacement, for the reasons you mention.

cproc and oasis are indeed quite interesting. I've been meaning to spend some more time playing with them, especially since the cproc codebase appears rather more approachable than tinycc (I spent a while trying to write a new backend for it but eventually got tired of it feeling more like reverse engineering than coding). The main reason I went with tcc for this is because it seems to have the most comprehensive support for C extensions found in the wild of all the small C compilers, but cproc looks quite promising if things can be patched to strictly follow the standard.

TCC still isn't quite capable of building an unmodified musl - I apply a couple of patches during the bootstrap (https://github.com/davidar/bootsh/blob/v0.1.0/boot.sh#L62-L159). In particular, removing complex number support and the x86_64-optimised math library (just falling back to the pure C implementations). Inline assembly is mostly supported, except for some extended asm in the syscall interface, which I replace with a simpler implementation.

I might look into adding support for i386, I think the main change that will require is just updating the musl bootstrap to not assume x86_64. There are also tcc backends for arm/arm64, but I don't have much experience with them so I can't say what state they're in. I know the riscv64 backend is being developed quite actively at the moment, mostly due to guix's bootstrapping efforts, but that's not supported by gcc3 anyway (I think the backporting efforts have been mainly focused on gcc4). I have some other ideas for bootstrapping in a less architecture-dependent way, but that's more of a longer term thing.

I mainly put this as a draft since I haven't documented anything yet, and wasn't really sure where the best place to put it would be - I could add a brief description to the readme perhaps? Also it could do with a bit more testing, I've only verified it works on my own machine so far.

rofl0r commented 4 months ago

oh, you added i386, nice.

when i last read through your scripts i noticed you're building gawk instead of one of the several way more lightweight alternatives. any particular reason for that ? i'm not very fond of gawk; in fact, whenever i see it as a hard dependency i patch it out so things work with POSIX awk - an example is dosemu2.

another thing i noticed when updating musl to 1.2.5 is that musl >= 1.2.2 no longer builds with gcc3, so i planned to use 1.2.1 (instead of 1.1.24) for stage0 in order to have time64 compatibility. unless we find a way to make fork.c compatible with both gcc3 and tinycc (or find the right patch to make gcc3 do the right thing, as the error it throws seems to be a bug).

generally the convention for "tarball-only" packages is to name them "name-tarball". i also pondered whether we can use more of the existing build infrastructure rather than hardcoding all steps in a shell script.

davidar commented 4 months ago

I generally use wak as a lightweight awk implementation, which works for most of the build scripts, but I ran into some compatibility issues with butch itself. I had mistakenly assumed it required gawk since it uses some gawk regex extensions (\s in https://github.com/sabotage-linux/sabotage/blob/master/KEEP/bin/butch-deps#L76). Unfortunately patching that out still results in butch hanging without any specific error, so there may be another compatibility issue on wak's end, I'll see if I can figure out what's going wrong.
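To illustrate the portability point, here is a small sketch of whitespace matching that avoids the gawk-only \s escape. The trim helper is hypothetical and is not the expression used in butch-deps; it only shows a bracket expression that a POSIX awk should accept:

```sh
#!/bin/sh
# Hypothetical helper: strip leading and trailing blanks from a string.
# Uses the bracket expression [ \t] rather than \s, which is a gawk
# extension and not part of POSIX awk regular expressions.
trim() {
    printf '%s\n' "$1" | awk '{ gsub(/^[ \t]+|[ \t]+$/, ""); print }'
}
```

For example, trim "  foo bar  " prints the string without the surrounding spaces, on gawk and on lighter awk implementations alike.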

I had some issues with time64 in the i386 build of bootsh, so ended up patching it out of the bootstrap musl (prior to stage0). Not sure if it's a tcc issue or just a problem with how I was building it, I could have another look. Otherwise I haven't had any issues building musl 1.2.5 with tcc, other than the patches I mentioned earlier.

Most of the boot-stage0 script is just getting the rootfs setup to be compatible with butch and the host dependencies of stage0, but I'd be happy to push more of that into a butch package if possible. I guess there could be a pre-stage0 package for building binutils, bzip2, curl, and linux-headers? Curl is the only tricky one, as bootsh has only a very simple HTTP-only wget command builtin, so it needs to download curl and bearssl over HTTP (currently using the tar.sabo.xyz mirror which is very handy) to bootstrap HTTPS support. Though I guess the fallback behaviour in butch's download template should handle this automatically?

rofl0r commented 4 months ago

> I had mistakenly assumed it required gawk since it uses some gawk regex extensions (\s in https://github.com/sabotage-linux/sabotage/blob/master/KEEP/bin/butch-deps#L76).

ouch, i never noticed, thanks for pointing out.

> patching that out still results in butch hanging without any specific error

that sounds like it could be line 23 in butch-core-helper - see also https://github.com/ThomasDickey/original-mawk/issues/41

> Though I guess the fallback behaviour in butch's download template should handle this automatically?

it should, but i recall last time it was like stuck forever instead of falling back. gotta figure out how to fix it.
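a rough sketch of the fallback idea, for illustration only: this is not butch's download template. the function name fetch_with_fallback and the FETCH variable are made up here, and the default command assumes a wget that accepts -O. it tries each url in order and stops at the first success; it does not address the hang mentioned above, which would additionally need some kind of timeout.

```sh
#!/bin/sh
# Hypothetical sketch: fetch_with_fallback OUTPUT URL...
# Tries each URL in turn and returns 0 as soon as one download succeeds.
# FETCH is an assumed knob naming the download command; it is invoked as
#   $FETCH OUTPUT URL
fetch_with_fallback() {
    out=$1
    shift
    for url in "$@"; do
        if ${FETCH:-wget -O} "$out" "$url"; then
            return 0
        fi
        # drop any partial file before trying the next mirror
        rm -f "$out"
    done
    return 1
}
```

the primary url would be listed first and the mirror last, so the mirror is only contacted when everything before it fails.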

davidar commented 4 months ago

Yep, it seems to be the same input buffering issue, I've raised the issue upstream to see if there's any way to get that working

davidar commented 4 months ago

Marking as draft again as those commits rely on some minor changes to bootsh that I haven't tagged for release yet, will do so once I get the issues with awk resolved

rofl0r commented 4 months ago

nice work. i'm in the process of trying to get gcc3 fixed so it can compile musl 1.2.5, so we can use the same musl-tarball package for all consumers. currently we have 1.1.24 for stage0, 1.2.4 for bootsh and 1.2.5 for stage1, which is a bit of a mess. i'll notify you as soon as i know whether we can get this fixed, or not.

rofl0r commented 4 months ago

ok, gcc3 is fixed. is it ok with you if i add the musl-tarball pkg myself and change musl and stage0-musl to use it ?

davidar commented 4 months ago

Yeah, no worries

rofl0r commented 4 months ago

done. i still have a minor nitpick - could we use the existing make package, and run the bootstrap script depending on either an env var in conjunction with STAGE=0 or by detecting the bootsh environment automatically? also i'm unclear about the benefit of adding the sabo.xyz mirror to the mirrors section, when it should be automatically picked up in case of a download error - basically it should work like a hidden additional mirror provided in the mirrors section.

davidar commented 4 months ago

Possibly, the reason I ended up adding another make package was that I was having trouble getting make3 to build without kernel-headers installed (which itself requires make to install). For some reason make4 doesn't seem to have this dependency. Would it be possible to upgrade the main make package? Otherwise I can try to debug the build issues to see if there's a configure option to avoid it or something. Edit: Oh I see, the dependency is being introduced by the sabotage patches, let me see what I can do.

I added the mirror to curl-bearssl as otherwise it was trying to download from tar.sabo.xyz/curl-bearssl and 404ing, but I guess I didn't need to add one to bearssl itself too (edit: fixed).

rofl0r commented 4 months ago

> the dependency is being introduced by the sabotage patches, let me see what I can do.

the header referred to provides only 2 macros which we can safely copy instead of including it. i can do that if you like.

// edit: actually, only MAX_ARG_STRLEN is used, which allows editing the patch in-place without adjusting hunk offsets

davidar commented 4 months ago

Yeah, that would be helpful. Though I tried compiling without the patches, and TCC is having trouble accurately compiling that version of make for some reason. Looking at the debugger, it seems to be miscomputing a struct offset somehow, which eventually ends up clobbering the environ pointer and causing make to segfault. It's possibly because bitfields are involved, but I'm still trying to figure out exactly why.

davidar commented 4 months ago

Ok, I figured out why there's a difference between the two versions. Make4 is using posix_spawn, whereas make3 is using vfork in a way that honestly looks like UB according to the posix spec (the child process ends up clobbering some stack space the parent process was using for bookkeeping for the struct offset). Fortunately the workaround seems to be easy enough:

--- a/make-3.82/job.c       2010-07-24 18:27:50.000000000 +1000
+++ b/make-3.82/job.c        2024-05-27 20:42:02.201986913 +1000
@@ -1322,7 +1322,8 @@

 #else  /* !__EMX__ */

-      child->pid = vfork ();
+      pid_t pid = vfork ();
+      child->pid = pid;
       environ = parent_environ;        /* Restore value child may have clobbered.  */
       if (child->pid == 0)
        {

rofl0r commented 4 months ago

nice, you removed the bashisms. that was unexpected - i hope it wasn't a lot of effort.

as for nl, we actually have the source in-tree, so in case it isn't needed before tcc is available it could be compiled straight from there - it's a quite recent addition so most sabotage boxes don't have it installed.

another thing i noticed is that you use which instead of type, the latter being POSIX. i also pondered using [ \t] instead of [[:space:]], since there might be some awk implementation using its own regex engine and choking on it.

as for the curl-bearssl package needing the explicit mirror (because curl-bearssl doesn't exist yet on tar.sabo), it could use the existing curl package with an alternative package provider as explained in the COOKBOOK. currently we usually just switch to a different package when it's used; in the case of curl it would need a "deps.curl.bearssl" section which pulls in bearssl, and the current deps would have to be renamed to "deps.curl.default", plus some DEPS parsing in the build section to see whether to use --with-bearssl or --with-ssl / --with-ca-path (which makes me wonder why your package doesn't need zlib).

these are just some thoughts in the "nice to have" category that occurred while reading the diff; i can do all of the above myself if i'm already stretching your patience to the max.

apart from that, what's still left to do for me is testing the PR, and cleaning up the commit history to remove fixups and merge commits. the 2 make fixes and the posix awk patch definitely need their own commits, the rest seems to belong into a single commit adding the bootsh bootstrap method. don't worry about doing that on your own if you're not familiar with interactive rebase, since the tiniest mistake could lead to the entire work getting lost. let me hear your thoughts on the above and then i'll do my part to have this merged.
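for the which versus type remark, a small sketch of what a lookup without which could look like. the helper name have is made up for illustration and does not come from the PR:

```sh
#!/bin/sh
# Hypothetical helper: succeed if the named command can be found, using
# the shell's own type utility instead of the external which program.
have() {
    type "$1" >/dev/null 2>&1
}
```

a build script could then write something like: have curl || echo "curl not found", without depending on which being installed.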

davidar commented 4 months ago

Nah, wasn't too much effort, I'd been meaning to take care of it anyway.

Building nl with tcc would be a bit of a hassle, as it's used by some of the build scripts (bootsh does provide nl as a builtin, but nl also needs to be available on the host system to build it in the first place). Though I guess I could rewrite the build scripts to not depend on it if it's going to be an issue.

You'd probably have better luck implementing the package provider thing than me. You're welcome to squash my commits however you see fit - usually I'd create a separate branch for that but that doesn't work too well with PRs.

Thanks for all your feedback, looking forward to having this merged :)

davidar commented 4 months ago

I finally got around to cleaning up the build system (GNU Make is no longer strictly required, so in theory you should be able to build it on a pure POSIX system now). I also got rid of the nl build dependency, turns out it wasn't particularly important.

rofl0r commented 4 months ago

great news. as you may have noticed the server currently has some issues, that's why i didn't get around to doing my testing yet (also working on a side-project of my own atm). also even though make isn't required anymore, let's keep the patches for it anyway. if i can get the server fixed, i'll look into getting this finally polished and merged within the week.

rofl0r commented 3 months ago

finally got this merged. it was quite a bit of effort to clean up the commit history, but worth it. for some reason your make-vfork-ub patch was wrongly formatted and didn't apply, but i fixed that.

i compared the checksums for stage1 packages after /src/utils/rebuild-stage1 with a regular build, and noticed a few differences - butch checksum -d revealed at least in one case that it was due to the bootsh stage1 not utilizing gzip to compress manpages for some reason. could you look into that ?

davidar commented 3 months ago

Awesome! Thanks for your help with cleaning this up.

Hmm, that's interesting, I'll have a look into it (not right now as I'm unwell at the moment, but I'll let you know when I get to it)