landley / mkroot

Simple Linux build, bootable under qemu for multiple architectures.

WIP: add gcc module (help wanted) #3

Closed dongcarl closed 6 years ago

dongcarl commented 6 years ago

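Trying to add a gcc module as described here: http://www.linuxfromscratch.org/lfs/view/7.5/chapter05/gcc-pass1.html

Invoking ./mkroot.sh -n gcc gives the following stderr (https://pastebin.com/jbYvSxgn) and stdout (https://www.dropbox.com/s/8mv0l72v6d4menl/mkroot.stdout?dl=0)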

Are the LFS instructions sound? Do I need to do this in multiple passes or is there a simpler alternative?

landley commented 6 years ago

On 07/03/2018 06:38 AM, Carl Dong wrote:

Trying to add a |gcc| module as described here http://www.linuxfromscratch.org/lfs/view/7.5/chapter05/gcc-pass1.html

Invoking |./mkroot.sh -n gcc| gives the following stderr https://pastebin.com/jbYvSxgn and stdout https://www.dropbox.com/s/8mv0l72v6d4menl/mkroot.stdout?dl=0

Are the LFS instructions sound? Do I need to do this in multiple passes or is there a simpler alternative?

Sorry I haven't had time to work on this, day job's been taking up all my energy.

The mcm-buildall.sh script builds native compilers for each target, as well as cross compilers, using Rich Felker's musl-cross-make project. Prebuilt binaries for them are in the page linked from the README.

The reason I haven't integrated them yet is I'm not building a "make" binary yet. (Without which a compiler is noticeably less useful.) Fixing that is third on my mkroot todo list after replacing the two remaining busybox binaries (route and hush), and integrating mkroot into the toybox build scripts.

Rob

dongcarl commented 6 years ago

@landley I'm trying to build this using gcc simply because I want to produce a reproducible build for the gitian process, which currently uses gcc, I'm guessing using musl-cross-make will not produce the same binaries as gcc will? Also, I believe we're not using musl, but perhaps I can bring that up as a possible new target for future releases of bitcoin.

landley commented 6 years ago

On 07/03/2018 11:53 AM, Carl Dong wrote:

@landley https://github.com/landley I'm trying to build this using gcc simply because I want to produce a reproducible build for the gitian process, which currently uses gcc, I'm guessing using musl-cross-make will not produce the same binaries as gcc will? Also, I believe we're not using musl, but perhaps I can bring that up as a possible new target for future releases of bitcoin.

musl-cross-make is a gcc/binutils build script:

https://github.com/richfelker/musl-cross-make

You can select a few different supported gcc/binutils/musl versions in the Makefile variables at the start of:

https://github.com/richfelker/musl-cross-make/blob/master/Makefile

I've been setting GCC_VER=7.2.0, and sometimes setting MUSL_VER=git-master (a special target that tells it to clone musl's current git repo instead of downloading a tarball). But otherwise leaving the other versions alone.
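A minimal sketch of how those overrides might be written down, assuming the settings go in the config.mak file that musl-cross-make reads from its top-level directory. The helper name write_config_mak, the scratch directory and the example target are illustrative only and come from neither project.

```shell
#!/bin/sh
# Hypothetical helper: write a config.mak that pins the versions mentioned above.
# $1 = musl-cross-make checkout directory, $2 = target tuple to build.
write_config_mak() {
  cat > "$1/config.mak" <<EOF
TARGET = $2
GCC_VER = 7.2.0
MUSL_VER = git-master
EOF
}

# Demo against a scratch directory instead of a real checkout.
demo_dir=$(mktemp -d)
write_config_mak "$demo_dir" i686-linux-musl
cat "$demo_dir/config.mak"
```

In a real checkout, running make and then make install in that directory would pick these values up.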

My mcm-buildall.sh script invokes musl-cross-make repeatedly to build every currently supported target, both cross and native compilers. I do so the same way aboriginal linux used to: first I build an i686 compiler, then I use that to build statically linked i686 binaries for the cross-compiler output, then I use the cross compiler I just built for a target to build a native compiler for that target. This way all the binaries should be reproducible and relocatable.

https://github.com/landley/mkroot/blob/master/mcm-buildall.sh
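The build order described above can be sketched as a dry run. This is not taken from mcm-buildall.sh: the function name plan_builds and the example target tuples are assumptions, and nothing is compiled, the stages are only printed in order.

```shell
#!/bin/sh
# plan_builds HOST TARGET...
# Print the order of toolchain builds for the scheme described above.
# Dry run only: no compiler is invoked.
plan_builds() {
  host=$1; shift
  echo "bootstrap $host (dynamically linked against host libc)"
  echo "cross $host (static, built with the bootstrap compiler)"
  echo "native $host (static, built with the $host cross compiler)"
  for target in "$@"; do
    # The host pair was already built first, so skip it here.
    if [ "$target" != "$host" ]; then
      echo "cross $target (static, runs on $host)"
      echo "native $target (static, built with the $target cross compiler)"
    fi
  done
}

plan_builds i686-linux-musl x86_64-linux-musl aarch64-linux-musl
```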

The prebuilt binaries I referred you to earlier (http://b.zv.io/mcm/bin/) are Zach van Rijn in Philadelphia running my mcm-buildall.sh script and putting the results online.

That's my existing strategy for reproducible toolchain builds: foisting it off on a relevant domain expert. (Rich is pretty good at that part.)

(Longer-term I've been looking at Rich Pennington's https://ellcc.org but his build needs serious cleanup, and doesn't support nearly as many targets yet.)

Rob

dongcarl commented 6 years ago

@landley I've been experimenting with mcm all day yesterday and have a preliminary module for it (that I can open a PR for as soon as you merge the checkout functionality).

I'm quite new to this cross compiling thing, so I want to validate a few of my observations and assumptions on running mcm-buildall.sh so I don't go down the wrong path...

My observations:

  1. When we have a directory that says $ARCH-linux-musl-cross, that means the gcc under this directory is an executable runnable on whatever architecture the host compiler was (in mcm-buildall.sh's case, i686), that will in turn produce executables runnable on $ARCH
  2. When we have a directory that says $ARCH-linux-musl-native, that means the gcc under this directory is an executable runnable on $ARCH that was produced using $ARCH-linux-musl-native

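My mental model of mcm-buildall.sh is that it works like so:

  • Create i686-linux-musl bootstrap compiler linked against host libc
    • Create i686-linux-musl-cross from parent
      • Create i686-linux-musl-native from parent
      • Create *-linux-musl-cross from parent
        • Create *-linux-musl-native from parent

Is the above correct?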

Questions:

  1. Are both -cross and -native compilers portable and statically linked? As in, can I copy them to a machine with their runnable architecture and just run them?
  2. For the "i686-linux-musl bootstrap compiler linked against host libc," does this mean that this bootstrap compiler produces musl executables, BUT this compiler itself was compiled using host libc?
  3. Why do we need the "i686-linux-musl bootstrap compiler linked against host libc"? Why not go straight to "i686-linux-musl-cross"?
  4. If I only wanted one tuple (say x86_64), I could change the script to do:
    • Create x86_64-linux-musl bootstrap compiler linked against host libc
      • Create x86_64-linux-musl-cross from parent
        • Create x86_64-linux-musl-native from parent

landley commented 6 years ago

Since the github post is public I'm cc-ing my reply to the toybox mailing list, for reasons explained in the body:

On 07/04/2018 03:38 PM, Carl Dong wrote:

@landley https://github.com/landley I've been experimenting with mcm all day yesterday and have a preliminary module for it (that I can open a PR for as soon as you merge the checkout functionality).

I've been treating the musl-cross-make toolchains (cross and native) as build dependencies of mkroot, I.E. already installed prerequisites.

You seem to want to put the toolchain build back under the mkroot build. That's a design issue we need to work out.

I'm quite new to this cross compiling thing, so I want to validate a few of my observations and assumptions on running |mcm-buildall.sh| so I don't go down the wrong path...

Way back when I wrote an "intro to cross compiling" that really should have been called "why cross compiling sucks", but I was trying to be polite:

http://landley.net/writing/docs/cross-compiling.html

Then I did Aboriginal Linux, with the motto "we cross compile so you don't have to", and wrote a big page of documentation there explaining what it was trying to accomplish:

http://landley.net/aboriginal/about.html

(Before that page, I did training sessions based on https://speakerdeck.com/landley/developing-for-non-x86-targets-using-qemu and if you really want the full context of what I was trying to do I reminisced at http://landley.net/aboriginal/history.html .)

tl;dr the point of Aboriginal Linux was "simplest Linux system capable of rebuilding itself from source code and building Linux From Scratch under the result". I got it down to 7 packages: busybox, uClibc, linux, gcc, binutils, make, and bash. But I did so much work extending busybox to replace the 20+ gnu packages from LFS that I wound up maintaining that project for a bit.

Then I rebased to toybox and musl-libc (and looked for a replacement toolchain for gcc when it went gplv3), but the main design change between aboriginal and mkroot is that aboriginal built its own toolchain and mkroot does not.

By moving the toolchain build out to an external project somebody else maintains, 2/3 of the complexity of aboriginal linux went away, and what was left could be greatly simplified. (I hadn't done so before because nobody who produced cross compilers was willing/able to produce native compilers as well, but Rich Felker was willing to be talked into it when he did mcm.)

Since doing mkroot, I've realized that mkroot doesn't really need to be a standalone project: I can merge the kernel module into the main mkroot.sh file, merge it into the toybox repository, have it build the copy of toybox it's part of, and point to kernel source with a command line argument or an environment variable, so "kernel source" is an environmental prerequisite just like cross compiler toolchain is.

Toybox needs a qemu-based bootable test environment to run root tests in its test suite, automated regression testing on multiple targets is nice, and a builtin simple root filesystem builder in a single file under 1000 lines of shell script isn't a bad thing for toybox to have. Plus my 2013 toybox talk (http://landley.net/talks/celf-2013.txt I.E. http://youtu.be/SGmtP5Lg_t0 ) was about turning AOSP into a self-hosting development environment, and there's AOSP build work to do there (breaking it into orthogonal layers, providing it with a hermetic/reproducible build environment, etc). I designed mkroot with all those goals in mind.

The resulting usage pattern might look something like:

  cd ~/dir
  git clone toybox
  git clone musl-cross-make
  git clone linux
  cd musl-cross-make
  ../toybox/scripts/mcm-buildall.sh
  cd ../toybox
  ln -s ../musl-cross-make/output mcm
  scripts/cross.sh all scripts/mkroot.sh LINUX=~/dir/linux NATIVE=y

(I'm still waffling on how musl-cross-make specific it should be. The "mcm" symlink isn't an ideal UI. And NATIVE=y implies scripts/mkroot.sh in toybox would also be aware of the mcm symlink and look for native compilers under it, which seems wrong. Really that's more a "cross.sh -n" option setting NATIVE_COMPILER to a path the same way it sets CROSS_COMPILE, and then only cross.sh cares about that symlink. As I said, there's design work to do. :)

However, getting even that far implies that I:

A) Add usable versions of the two remaining busybox commands (route and sh) to toybox, so I can yank the busybox download. (I'm not merging something into toybox that depends on busybox.)

B) Add a "make" implementation to toybox (or convince musl-cross-make to build it as part of their build, but android builds with LLVM and will never install GPL tools into its image, so I need to write a new make anyway if the kernel build depends on it.)

My limiting factor in all this has been lack of time: $DAYJOB eats all my energy, no big company's wanted to sponsor me, and "take a year off and live off my savings" is less compelling in one's 40s with a 6 figure mortgage and maybe 20 years to retirement than in one's 30s with a 5 figure mortgage and 30 years to retirement.

My observations:

  1. When we have a directory that says $ARCH-linux-musl-cross, that means the gcc under this directory is an executable runnable on whatever architecture the host compiler was (in mcm-buildall.sh's case, i686), that will in turn produce executables runnable on $ARCH

Close: mcm-buildall.sh is actually currently hardwired to i686 host for the cross compilers. (They run faster, it's sort of a poor man's x32.)

It's easy enough to change: two instances of the tuple in the script, plus the i686-host.txt log name tee writes to, then move the new host arch to the start of the list in the for loop at the end.

(I'd make it a variable you can set except for the part about moving the appropriate static/native build to the start of the for loop. Alas the dynamic -host toolchain has some architecture assumptions that easily confuse it, so we do a proper static build with it and then use that for the other architectures. Easy way to do that is to build that target first. :)
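The "host goes first" reordering could be sketched like this. It is an illustration under assumptions, not code from mcm-buildall.sh: the function name host_first and the short architecture names are made up for the example.

```shell
#!/bin/sh
# host_first HOST TARGET...
# Print the targets one per line with HOST moved to the front,
# so the static host toolchain exists before any other target is built.
host_first() {
  host=$1; shift
  echo "$host"
  for target in "$@"; do
    if [ "$target" != "$host" ]; then
      echo "$target"
    fi
  done
}

for target in $(host_first x86_64 i686 aarch64 x86_64); do
  echo "would build cross and native toolchains for $target"
done
```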

I've made puppy eyes at Rich about taking mcm-buildall.sh into his musl-cross-make repo (it's not really appropriate for mkroot, and full of exactly the kind of black magic I'm trying to foist off on him anyway), but haven't done so loudly yet. :)

  2. When we have a directory that says $ARCH-linux-musl-native, that means the gcc under this directory is an executable runnable on $ARCH that was produced using $ARCH-linux-musl-native

It was produced using $ARCH-linux-musl-cross. It runs on target, and produces binaries for the target. You should be able to extract that tarball on pretty much any system and use it, just like you can with the cross compilers. (In fact i686-linux-cross and i686-linux-native should be pretty similar.)

In practice:

$ strace -F ./gcc --sysroot $(readlink -f ..) hello.c 2>&1 | grep stdio.h
[pid 29064] read(3, "#include <stdio.h>\n\nint main(int"..., 97) = 97
[pid 29064] open("/home/landley/musl-cross-make/bin/i686-linux-musl-native/bin/../lib/gcc/i686-linux-musl/7.2.0/include/stdio.h", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_CLOEXEC|0x200000) = -1 ENOENT (No such file or directory)
[pid 29064] stat64("/home/landley/musl-cross-make/bin/i686-linux-musl-native/bin/../lib/gcc/i686-linux-musl/7.2.0/include/stdio.h.gch", 0xff9f9840) = -1 ENOENT (No such file or directory)
[pid 29064] open("/home/landley/musl-cross-make/bin/i686-linux-musl-native/bin/../lib/gcc/i686-linux-musl/7.2.0/include/stdio.h", O_RDONLY|O_NOCTTY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
[pid 29064] readv(4, [{"#include <stdio.h>\n\nint main(int"..., 4095}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024}], 2) = 97
[pid 29064] writev(2, [{"", 0}, {"hello.c:1:10: fatal error: stdio"..., 102}], 2hello.c:1:10: fatal error: stdio.h: No such file or directory
 #include <stdio.h>

Looks like I need to make more puppy eyes at Rich. I'm pretty sure this worked at one point, and if I add "-I include" it still does. (By default it's only searching the directory where the compiler headers provided by gcc are installed, not the directory where the libc headers from musl are installed.) (And of course the resulting hello world only runs if I --static link it because this isn't a musl host.)
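To make the directory naming convention from the two observations concrete, here is a small sketch that decodes a toolchain directory name. The function name describe_toolchain is hypothetical, and the default host of i686 simply mirrors the hardwired host mentioned earlier.

```shell
#!/bin/sh
# describe_toolchain DIRNAME [HOST]
# Decode a toolchain directory name of the form ARCH-linux-musl*-cross or
# ARCH-linux-musl*-native. HOST defaults to i686 (an assumption matching
# the hardwired cross compiler host discussed above).
describe_toolchain() {
  dir=$1
  host=${2:-i686}
  arch=${dir%%-linux-musl*}
  case $dir in
    *-linux-musl*-cross)
      echo "$dir: runs on $host, targets $arch" ;;
    *-linux-musl*-native)
      echo "$dir: runs on $arch, targets $arch, built with ${dir%-native}-cross" ;;
    *)
      echo "unrecognized toolchain directory: $dir" >&2
      return 1 ;;
  esac
}

describe_toolchain x86_64-linux-musl-cross
describe_toolchain x86_64-linux-musl-native
```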

My mental model of |mcm-buildall.sh| is that it works like so:

  • Create i686-linux-musl bootstrap compiler linked against host libc
    • Create i686-linux-musl-cross from parent
      • Create i686-linux-musl-native from parent
      • Create *-linux-musl-cross from parent
        • Create *-linux-musl-native from parent

Is the above correct?

More or less, yes.

Questions:

  1. Are both -cross and -native compilers portable and statically linked? As in, can I copy them to a machine with their runnable architecture and just run them?

Yes, modulo the header search path glitch I just noticed above.

(There's always some weird regression with new gcc versions. This is probably because I'm building 7.2 instead of 6.4. Back in aboriginal linux I had ccwrap.c that parsed the gcc command line and rewrote it starting with --nostdinc --nostdlib and then added back all the search paths manually, because it was the ONLY WAY to beat gcc into submission. Rich has more faith in the gcc developers. Or possibly more patience.)
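A very reduced sketch of the header half of that idea, not a reconstruction of ccwrap.c: it only prints flags that switch off gcc's built-in header search (spelled -nostdinc here) and add the two header directories back explicitly. The function name cc_header_args, the example root path, and the directory layout (libc headers in ROOT/include, compiler headers in ROOT/lib/gcc/TUPLE/VERSION/include, inferred from the strace paths above) are all assumptions. The library side (-nostdlib plus startup files and libraries) is deliberately left out.

```shell
#!/bin/sh
# cc_header_args ROOT TUPLE GCC_VERSION
# Print, one per line, flags that replace gcc's default header search path
# with an explicit one. Assumed layout: libc headers under ROOT/include,
# compiler headers under ROOT/lib/gcc/TUPLE/GCC_VERSION/include.
cc_header_args() {
  root=$1
  tuple=$2
  ver=$3
  printf '%s\n' \
    -nostdinc \
    -isystem "$root/include" \
    -isystem "$root/lib/gcc/$tuple/$ver/include"
}

# Dry run: show the command line a wrapper would execute (hypothetical root path).
echo gcc $(cc_header_args /opt/i686-linux-musl-native i686-linux-musl 7.2.0) hello.c
```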

  2. For the "i686-linux-musl bootstrap compiler linked against host libc," does this mean that this bootstrap compiler produces musl executables, BUT this compiler itself was compiled using host libc?

Yes.

My old rant about the 6 paths and how a compiler is conceptually no different from a docbook to pdf converter was recorded at a conference 10 years ago, starting at almost exactly the 10 minute mark in http://free-electrons.com/pub/video/2008/ols/ols2008-rob-landley-linux-compiler.ogg . (There's probably a written version somewhere but I can't find it just now.)

The GCC developers have been insanely self-important forever, and do stuff terribly. (That's why it's a rant.)

  3. Why do we need the "i686-linux-musl bootstrap compiler linked against host libc"? Why not go straight to "i686-linux-musl-cross"?

There's a reason I refer to it as my "compiler rant". The short answer is "the gcc developers are insane".

  4. If I only wanted one tuple (say x86_64), I could change the script to do:

    • Create x86_64-linux-musl bootstrap compiler linked against host libc
      • Create x86_64-linux-musl-cross from parent
        • Create x86_64-linux-musl-native from parent

In theory, yes.

(As long as the cross/native pair for the host is the first one you build, it should work. If it's the only one you build, that's the first one. :)

Rob