landley / toybox

toybox
http://landley.net/toybox
BSD Zero Clause License

[Documentation / Question] Can toybox be combined with e.g. mruby? #492

Open rubyFeedback opened 7 months ago

rubyFeedback commented 7 months ago

This is more a question, but perhaps it could also be added to the main README or a FAQ entry (which could be useful IMO).

Can we combine toybox with e.g. mruby to produce one combined binary?

I am quite ok-ish with ruby code, but C is still over my head (I should have learned C properly first ... now ruby spoiled me).

It would be super convenient to be able to combine toybox + mruby into one static binary. I could then use it to bootstrap e.g. small Linux systems or, if not bootstrap, run the many ruby scripts I've written over 20 years or so that help with numerous things. Of course mruby could be used separately too, so strictly speaking toybox doesn't need to be combined with mruby, but I am also curious whether it would be possible to combine the two, and if not, what could be done to allow it.

This could also be extended to other binaries, e.g. kind of meta-combining stuff "into" toybox, or using toybox as the general entry point to bootstrap things. That might also include e.g. tying sysinit into toybox for a custom boot setup, or perhaps grub2 (I am just spawning ideas at this point, but perhaps someone with some C knowledge could answer a few things).

landley commented 7 months ago

Toybox has a small system builder called "mkroot", which is what I'm giving the talk about this weekend:

https://github.com/landley/toybox/tree/master/mkroot

It's a ~400 line bash script that compiles a bootable Linux system: https://github.com/landley/toybox/blob/master/mkroot/mkroot.sh

I've got prebuilt tarballs for over half the architectures QEMU supports: https://landley.net/bin/mkroot/latest/

Grab one, and ./run-qemu.sh should boot you to a shell prompt.

It's interactive by default, but here's a bash script that automatically boots each target, has the emulated system run some commands (things like "date" and "wget" to make sure everything's working ok), and parses the results into a simple status report:

https://github.com/landley/toybox/blob/master/mkroot/testroot.sh

No strong reason you couldn't add a ruby build into https://github.com/landley/toybox/tree/master/mkroot/packages if you wanted to. (It might fill up the kernel's initramfs capacity and need a different filesystem packaging type, though: possibly an extra squashfs, like testroot or the C native compilers in https://landley.net/bin/toolchains/latest/, maybe? Happy to discuss a design for that on the project's mailing list...)

landley commented 7 months ago

As for incorporating https://github.com/mruby/mruby into toybox, it doesn't look impossible, but I'm not sure it makes sense? (Vs just packaging it together into the same filesystem.)

Toybox has incorporated external projects before, such as https://git.tukaani.org/xz-embedded becoming https://github.com/landley/toybox/blob/master/toys/pending/xzcat.c (but see also https://github.com/landley/toybox/blob/master/toys/pending/README about the "pending" directory and https://landley.net/toybox/cleanup.html about what it takes to get OUT of the pending directory).

But the limiting factors are 1) whether it makes design sense, 2) language, and 3) license.

Toybox's tar command should support at least extracting the "tar.xz" file format (as explained in https://landley.net/toybox/roadmap.html; again, sorry your browser can't render plain HTML legibly), so having an "xzcat" command in toybox itself was appealing (as long as we were going to call out to one anyway). There was one written in plain C and available under a public domain equivalent license.
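The "call out to xzcat" approach amounts to a pipeline: decompress the .tar.xz into a plain tar stream and have tar read that stream from stdin. The following is a minimal sketch that uses the standard xzcat and tar command-line interfaces. The helper name extract_txz is hypothetical, and the sketch is not toybox tar's internal code.

```shell
# extract_txz ARCHIVE DEST: hypothetical helper that extracts a .tar.xz archive
# into DEST by decompressing with xzcat and piping the plain tar stream into
# tar, which reads the archive from stdin ("f -").
# Plain sh has no pipefail, so the exit status is tar's, not xzcat's.
extract_txz() {
  archive=$1 dest=$2
  mkdir -p "$dest" &&
  xzcat "$archive" | tar xf - -C "$dest"
}
```

Usage: `extract_txz some-package.tar.xz build/` unpacks into build/.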

From a design perspective, you'll notice we collapsed multiple .c and .h files into a single xzcat.c file for toybox. The result works but needs a lot of cleanup (it doesn't currently use any of toybox's common lib/*.c code, and I estimate that a cleanup process as described on the above page could probably shrink it to less than half its current line count). Note that collating the files like that (even before further work to integrate them with toybox) means keeping up with "upstream" changes has to be done by hand (http://lists.landley.net/pipermail/toybox-landley.net/2024-March/030164.html).

The "plain C" part is also important: toybox is fairly portable C11 code with some gnu extensions, but only those well supported by llvm. (A longstanding todo item is looking at other compilers like tinycc.) It has no mandatory external dependencies (like curses or zlib). It can optionally use a few for acceleration purposes (mostly because the Android guys insisted) and has some optional features, like selinux support, that want to pull in a library, but all of it can be disabled and a version built using built-in implementations of sha3sum, bzip2 decompression, and so on.
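Toybox is configured through a Kconfig-style .config file, the kind `make defconfig` writes. That means optional library-backed features can be switched off by editing that file. The following is a minimal sketch. The helper config_disable is hypothetical. The symbol names in the usage line (TOYBOX_LIBCRYPTO, TOYBOX_LIBZ, TOYBOX_SELINUX) are assumptions about toybox's Config.in and should be checked against your checkout.

```shell
# config_disable CONFIG_FILE SYMBOL...: hypothetical helper that rewrites each
# "CONFIG_SYMBOL=y" line into Kconfig's "# CONFIG_SYMBOL is not set" form.
# It writes to a temp file and renames it, so it doesn't depend on sed -i.
config_disable() {
  cfg=$1
  shift
  expr=
  for sym in "$@"; do
    # anchored at both ends so CONFIG_FOO doesn't also match CONFIG_FOOBAR
    expr="$expr;s/^CONFIG_${sym}=y\$/# CONFIG_${sym} is not set/"
  done
  sed "${expr#;}" "$cfg" > "$cfg.tmp" && mv "$cfg.tmp" "$cfg"
}
```

For example, in a toybox checkout you could run `make defconfig`, then `config_disable .config TOYBOX_LIBCRYPTO TOYBOX_LIBZ TOYBOX_SELINUX`, then `make toybox`. Only symbols currently set to `=y` are touched, and anything already off is left alone.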

Code that incorporates any C++, or links to external libraries, would not make design sense to integrate into toybox. Github thinks mruby is mostly C but finds some C++ in there too, I dunno if that's true or not. Also a large chunk of the project is written in ruby (presumably ruby's "standard library"), and I dunno how you'd integrate that into an ELF executable...?

And then there's the license: we use a public domain equivalent license (zero clause BSD, SPDX short name "0BSD") which doesn't even require you to copy specific license text into derived works:

https://en.wikipedia.org/wiki/Public-domain-equivalent_license

The MIT license is not public domain equivalent; we'd need to start a stuttering list if we incorporated code under it: http://www.youtube.com/watch?v=SGmtP5Lg_t0#t=15m09s

So an external fork that sucked in mruby would be fine (and you could legally distribute it under the MIT license), but it's not going upstream into the main project.

oliverkwebb commented 7 months ago

Code that incorporates any C++, or links to external libraries, would not make design sense to integrate into toybox. Github thinks mruby is mostly C but finds some C++ in there too, I dunno if that's true or not.

As far as I know, incorporating C++ code into vanilla toybox is nearly impossible, since we deliberately have C++ keywords in our variable names to break compatibility with C++. You could of course change those and switch to a C++ compiler if you really wanted C++ code in toybox (eww), but there are likely other things that would need to be worked around because C++ is a mess.

enh-google commented 7 months ago

Github thinks mruby is mostly C but finds some C++ in there too, I dunno if that's true or not.

it's just the fuzzer, based on https://llvm.org/docs/LibFuzzer.html which is C++.

the "real" code does not use C++.

landley commented 6 months ago

FYI I took a stab at doing a mkroot/packages/mruby build, but mruby needs a build tool called "rake" that's implemented in ruby, and the "minirake" script it ships is a ruby script that just does exec "rake", *ARGV.

So you can't build mruby without having first installed ruby on the host. The mruby package has ruby as a build dependency.
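Because of that build dependency, a hypothetical mruby package would want to fail early and explain why, rather than dying partway through the build. The following is a minimal sketch. need_host_cmd is a made-up helper, not part of mkroot, and it only checks whether the named commands exist on the host $PATH.

```shell
# need_host_cmd CMD...: hypothetical helper that returns nonzero, with a
# message on stderr, if any named build-time tool is missing from the host
# $PATH (e.g. ruby and rake for mruby's build).
need_host_cmd() {
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "host is missing '$cmd', needed to build this package" >&2
      return 1
    fi
  done
}
```

A package script could start with `need_host_cmd ruby rake || exit 1`.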

So then I ran "mkroot/record-commands rake all", and the resulting log.txt only had compiler, assembler, and linker commands, and I thought "maybe I can build this with a shell script"... except the assembler commands all consume generated files produced by running locally built binaries, which aren't called out of the $PATH (and thus aren't intercepted and recorded in log.txt). I then ran the rake build under strace to see if I could tally up the exec calls, but haven't had the heart to look at that yet. (Do the C file builds pull in host libraries or headers from the apparently required parent ruby install? Dunno...)
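Logging that works through $PATH only sees programs the build looks up by name. The following is a minimal sketch of the interception idea. It is not how mkroot/record-commands is actually implemented, and the helper names are hypothetical. The approach is to put a directory of wrapper scripts first in $PATH, where each wrapper logs its command line and then runs the real binary. A build step that runs a freshly built tool by explicit path (say ./build/bin/sometool) bypasses the wrappers, which is exactly the blind spot described above. The second helper tallies execve() calls from a log written by `strace -f -e trace=execve -o trace.txt`, which is one way to cover that gap.

```shell
# make_wrappers DIR LOG CMD...: hypothetical helper. For each CMD that resolves
# to a file on the current $PATH, write an executable DIR/CMD that appends
# its command line to LOG and then execs the real binary by absolute path
# (so the wrapper can't end up running itself).
make_wrappers() {
  dir=$1 log=$2
  shift 2
  mkdir -p "$dir" || return 1
  for cmd in "$@"; do
    real=$(command -v "$cmd") || continue
    case $real in
      /*) ;;             # a real file on disk
      *) continue ;;     # builtin, alias or function: nothing to wrap
    esac
    cat > "$dir/$cmd" <<EOF
#!/bin/sh
printf '%s %s\n' "$cmd" "\$*" >> "$log"
exec "$real" "\$@"
EOF
    chmod +x "$dir/$cmd"
  done
}

# tally_execs: hypothetical helper that reads strace output on stdin and counts
# successful execve() calls per program path, most frequent first. Failed
# lookups ("= -1 ENOENT ...") are skipped. Calls that strace splits into
# "unfinished"/"resumed" halves are not handled by this sketch.
tally_execs() {
  sed -n 's/.*execve("\([^"]*\)".* = 0$/\1/p' | sort | uniq -c | sort -rn
}
```

Usage: `make_wrappers /tmp/wrap /tmp/log.txt cc as ld`, then run the build with `PATH=/tmp/wrap:$PATH`. Separately, `tally_execs < trace.txt` summarizes an strace log.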

oliverkwebb commented 6 months ago

Since we're talking about building programming languages as mkroot packages, and since I've personally been learning lua lately, I took a shot at building a lua interpreter under mkroot:

    #!/bin/echo Try "scripts/mkroot.sh lua"

    download 83f41abf92620dd15f022e6f863807b07e318495 \
      http://lua.org/ftp/lua-5.4.6.tar.gz

    setupfor lua
    make MYLDFLAGS=--static CC=cc &&
    cp src/lua{,c} "$ROOT/bin"
    cleanup

The lua people decided to put CC=gcc --std=gnu99 in the makefile for some reason, when it's all ANSI C anyway. Also, this build does require ar (I don't know why), but I didn't go out of my way to take that out of the makefile.

absolutelynothinghere commented 6 months ago

when it's all ANSI C anyways.

Not exactly. Lua can be compiled as ANSI C, but that is not recommended. As per src/Makefile:

    @echo '*** C89 does not guarantee 64-bit integers for Lua.'
    @echo '*** Make sure to compile all external Lua libraries'
    @echo '*** with LUA_USE_C89 to ensure consistency'

Also this build does require ar

It seems to be needed only for liblua.a, which contains the object files linked into both lua and luac... It should be trivial to skip the creation of liblua.a and link the executables to the object files directly.
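The following is a minimal sketch of that suggestion. It assumes Lua 5.4's src/ layout, where lua.c and luac.c hold the two programs' main() and every other src/*.c file is library code. The helper names and compiler flags (-DLUA_USE_POSIX, -static, -lm) are illustrative assumptions, not copied from Lua's Makefile. The sketch does not compile objects once and link them. Instead, it hands the library sources to the compiler for each binary. That recompiles them twice, but it needs no ar and no liblua.a.

```shell
# lua_base_sources SRCDIR: hypothetical helper that prints every .c file in
# SRCDIR except the two program entry points, lua.c and luac.c.
lua_base_sources() {
  for f in "$1"/*.c; do
    [ -e "$f" ] || continue          # glob matched nothing
    case ${f##*/} in
      lua.c|luac.c) ;;               # programs, not library code
      *) printf '%s\n' "$f" ;;
    esac
  done
}

# lua_build_without_ar SRCDIR OUTDIR: hypothetical helper that builds lua and
# luac by passing the library sources straight to the compiler for each
# binary. $base is deliberately unquoted so it splits into one argument per
# file (this assumes no whitespace in the paths). Drop -static if your libc
# has no static libraries.
lua_build_without_ar() {
  src=$1 out=$2
  base=$(lua_base_sources "$src")
  mkdir -p "$out" &&
  ${CC:-cc} -O2 -DLUA_USE_POSIX -static -o "$out/lua" "$src/lua.c" $base -lm &&
  ${CC:-cc} -O2 -DLUA_USE_POSIX -static -o "$out/luac" "$src/luac.c" $base -lm
}
```

Usage, from an unpacked Lua tarball: `lua_build_without_ar src "$ROOT/bin"`.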

oliverkwebb commented 6 months ago

when it's all ANSI C anyways.

Not exactly. Lua can be compiled as ANSI C, however that is not recommended. As per src/Makefile:

  @echo '*** C89 does not guarantee 64-bit integers for Lua.'
  @echo '*** Make sure to compile all external Lua libraries'
  @echo '*** with LUA_USE_C89 to ensure consistency'

Looking through the code, it also affects valid strftime sequences and max integers, along with some other config stuff. But other than that, it's made for plain C89.

The dependence on C89 actually becomes mildly annoying, because there is no way to do things like stat a file without calling external C code (from my understanding, this is the reason toybox wasn't written in Lua).

The toolchains we care to support use C99 (I think toybox actually moved to C11 some time ago) and POSIX (plus some compiler and library extensions from lib/gcc).

Also this build does require ar

It seems to be needed only for liblua.a, which contains the object files linked into both lua and luac... It should be trivial to skip the creation of liblua.a and link the executables to the object files directly.

The way I removed it in http://lists.landley.net/pipermail/toybox-landley.net/2024-April/030311.html was:

    sed -i 's/^LUA_A=.*/LUA_A=$(BASE_O)/; s/$(LUA_A):/notran:/' src/Makefile

That just changes the Makefile rules so they don't build the archive.
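To see what those two sed expressions do, the following applies them to a trimmed excerpt of the relevant Makefile rules. The excerpt is assumed to mirror Lua 5.4's src/Makefile, so check it against the real file. The helper names drop_liblua and sample_makefile are hypothetical.

```shell
# drop_liblua: the two sed expressions from the comment above, applied to stdin.
#  1. Redefine LUA_A as the object list, so every "$(LUA_A)" in the link rules
#     names the .o files instead of liblua.a.
#  2. Rename the target of the rule that runs ar to "notran", so no other rule
#     depends on it and it never runs.
# In a basic regex "(" and ")" are literal, and "$" is only an anchor at the
# end of the pattern, so "$(LUA_A):" matches that text literally.
drop_liblua() {
  sed 's/^LUA_A=.*/LUA_A=$(BASE_O)/; s/$(LUA_A):/notran:/'
}

# sample_makefile: trimmed excerpt assumed to mirror the relevant rules in
# Lua 5.4's src/Makefile (recipe lines start with a tab).
sample_makefile() {
  printf 'LUA_A=\tliblua.a\n'
  printf '%s\n' '$(LUA_A): $(BASE_O)'
  printf '\t%s\n' '$(AR) $@ $(BASE_O)' '$(RANLIB) $@'
  printf '%s\n' '$(LUA_T): $(LUA_O) $(LUA_A)'
  printf '\t%s\n' '$(CC) -o $@ $(LDFLAGS) $(LUA_O) $(LUA_A) $(LIBS)'
}

sample_makefile | drop_liblua
```

The first two output lines become `LUA_A=$(BASE_O)` and `notran: $(BASE_O)`. The link rule itself is unchanged, but its `$(LUA_A)` now expands to the object files, so make builds them as prerequisites and links them directly.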

Building a .a file for lua is understandable, since it's a language designed to be embedded, but the Lua people didn't need to make that necessary.

landley commented 6 months ago

I cycled back to this issue and asked over at https://github.com/mruby/mruby/issues/6258 if mruby has a microperl equivalent where it can build just enough of itself to run a build written in itself.

(That links to the 2018 issue Google coughed up, where people asked about this and were told reviving the Makefile build was a no-go.)