rmyorston / busybox-w32

WIN32 native port of BusyBox.
https://frippery.org/busybox
Other
670 stars 124 forks source link

[Feature Request] support directly execution of αcτµαlly pδrταblε εxεcµταblε (Actually Portable Executable, Cosmopolitan, APE, cosmos) executable file format #424

Closed shunf4 closed 2 months ago

shunf4 commented 2 months ago

αcτµαlly pδrταblε εxεcµταblε (Actually Portable Executable, APE) (https://justine.lol/ape.html) executable file format, along with Cosmopolitan C libc library (https://justine.lol/cosmopolitan/), is cool.

A quick patch to make busybox-w32 support directly executing APE format executables, without changing their file extension to .exe:

diff --git a/win32/mingw.c b/win32/mingw.c
index d3daf8315..152d2b25d 100644
--- a/win32/mingw.c
+++ b/win32/mingw.c
@@ -468,6 +468,12 @@ static int has_exec_format(const char *name)
        return 1;
    }

+   // Actually Portable Executable
+   // https://github.com/jart/cosmopolitan/blob/f9dd5683a4de7e7c69fa54fe85a01a935ab3865b/ape/ape.S#L120
+   if (n >= 9 && memcmp(buf, "MZqFpD='\n", 9) == 0) {
+       return 1;
+   }
+
    /*
     * Poke about in file to see if it's a PE binary.  I've just copied
     * the magic from the file command.

To test:

Download pre-built binaries from superconfigure GitHub repository's release page, or from https://cosmo.zip/pub/cosmos/bin/ .

Then try to execute one — like, bash — in busybox-w32 ash:

~/Portable/cosmo/superconfigure-built $ head -c 64 ./bin/bash | xxd
00000000: 4d5a 7146 7044 3d27 0a0a 0010 00f8 0000  MZqFpD='........
00000010: 0000 0000 0001 0008 4000 0000 0000 0000  ........@.......
00000020: 0000 0000 0000 0000 2720 3c3c 276a 7573  ........' <<'jus
00000030: 7469 6e65 316a 6774 3072 270a c80c 0100  tine1jgt0r'.....
~/Portable/cosmo/superconfigure-built $ ./bin/bash --version
GNU bash, version 5.2.0(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
rmyorston commented 2 months ago

Sure, why not? It's sufficiently amusing to be worth the cost.

~ $ ls -l true busybox.exe
-rwxrwxr-x    1 rmy      rmy         665600 Jun 16 09:05 busybox.exe
-rwxrwxr-x    1 rmy      rmy         680635 Jun 16 07:51 true
~ $ ./true; echo $?
0
~ $ ./busybox true; echo $?
0

Seems to work. Though if all you want is a Windows version of true it's more efficient to download busybox-w32 ;-)

Available in the prerelease binaries (PRE-5375 or above).

avih commented 2 months ago
// https://github.com/jart/cosmopolitan/blob/f9dd5683a4de7e7c69fa54fe85a01a935ab3865b/ape/ape.S#L120

/* Actually Portable Executable */ /* See ape/ape.S at https://github.com/jart/cosmopolitan */

I don't know if these are enough to decide that this string is the APE magic also in future releases. I.e. that it's intended to keep this prefix for the forseeable future. It would have been better if it was documented someplace how it should be identified. Though in practice I think it's OK. Definitely at least for now.

(regardless, I also bumped into this issue of busybox sh failing to invoke extension-less APE binaries, but didn't consider it important enough to spend time on)

As for the implementation details, I think the original suggestion is cleaner, even if it costs two more bytes for the MZ comparison (does it?), because commit 51a4aaa makes it harder to update the file code, if needed, though it's probably unlikely to get changed, ever (again).

shunf4 commented 2 months ago

I don't know if these are enough to decide that this string is the APE magic also in future releases. I.e. that it's intended to keep this prefix for the forseeable future.

From https://justine.lol/ape.html :

You'll notice that execution starts off by treating the Windows PE header as though it were code. For example, the ASCII string "MZqFpD" decodes as pop %r10 ; jno 0x4a ; jo 0x4a and the string "\177ELF" decodes as jg 0x47. It then hops through a mov statement which tells us the program is being run from userspace rather than being booted, and then hops to the entrypoint.

As long as APE wants to support (MBR) BIOS bootup as well as raw VM execution, the magic is hard to change because it's ... pure magic.

Also Cosmopolitan C's README recommends registering APE magic into binfmt_misc as an one-time effort.

https://github.com/jart/cosmopolitan/blob/8e3b361aeba12c47030011bac45badf93eadf611/README.md?plain=1#L197-L217

avih commented 2 months ago

From https://justine.lol/ape.html : ... the ASCII string "MZqFpD" decodes as ...

Well, that's shorter than the string which the code searches. by two chars: = and newline.

So should it be kept as is? or shorten to MZqFpD ?

rmyorston commented 2 months ago

As for the implementation details, I think the original suggestion is cleaner, even if it costs two more bytes for the MZ comparison (does it?)

What was committed saves 16-32 bytes compared to the original suggestion.

Well, that's shorter than the string which the code searches. by two chars: = and newline.

Three characters: there's a single quote in there too. If six characters is enough for binfmt_misc it's probably good enough for us. This saves a further 24-32 bytes:

diff --git a/win32/mingw.c b/win32/mingw.c
index 26b046f1a..39e2b86f7 100644
--- a/win32/mingw.c
+++ b/win32/mingw.c
@@ -475,7 +475,8 @@ static int has_exec_format(const char *name)
    if (buf[0] == 'M' && buf[1] == 'Z') {
        /* Actually Portable Executable */
        /* See ape/ape.S at https://github.com/jart/cosmopolitan */
-       if (n > 9 && memcmp(buf + 2, "qFpD='\n", 7) == 0)
+       offset = (buf[2] << 24) + (buf[3] << 16) + (buf[4] << 8) + buf[5];
+       if (n > 6 && offset == 0x71467044)  // "qFpD"
            return 1;

        if (n > 0x3f) {
avih commented 2 months ago
+       offset = (buf[2] << 24) + (buf[3] << 16) + (buf[4] << 8) + buf[5];
+       if (n > 6 && offset == 0x71467044)  // "qFpD"

Note that this works because q happens to have the 0x80 bit unset (unsigned char is promoted to int, which is 32 bit on windows, and if it had the 0x80 bit set then the result would be integer overflow, which I believe is undefined behavior or implementation-defined, and would likely make it negative, but not sure).

EDIT: it's indeed fine, but if 0x80 was set it would have been UB. From C99 spec 6.5.7.4:

The result of E1 << E2 is E1 left-shifted E2 bit positions... If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo... If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

For completeness, I'd cast buf[2] to unsigned before the 24 bit shift.

rmyorston commented 2 months ago

Surely, not fine? The top bit of a literal 'q' certainly won't be set, but the top bit of buf[2] might. Casting buf[2] is required. There's another similar calculation later.

diff --git a/win32/mingw.c b/win32/mingw.c
index 26b046f1a..98254fdbe 100644
--- a/win32/mingw.c
+++ b/win32/mingw.c
@@ -473,16 +473,20 @@ static int has_exec_format(const char *name)
     * the magic from the file command.
     */
    if (buf[0] == 'M' && buf[1] == 'Z') {
+/* Convert four unsigned bytes to an unsigned int (little-endian) */
+#define LE4(b, o) (((unsigned)b[o+3] << 24) + (b[o+2] << 16) + \
+                       (b[o+1] << 8) + b[o])
+
        /* Actually Portable Executable */
        /* See ape/ape.S at https://github.com/jart/cosmopolitan */
-       if (n > 9 && memcmp(buf + 2, "qFpD='\n", 7) == 0)
+       const unsigned char *qFpD = (unsigned char *)"qFpD";
+       if (n > 6 && LE4(buf, 2) == LE4(qFpD, 0))
            return 1;

        if (n > 0x3f) {
            offset = (buf[0x19] << 8) + buf[0x18];
            if (offset > 0x3f) {
-               offset = (buf[0x3f] << 24) + (buf[0x3e] << 16) +
-                           (buf[0x3d] << 8) + buf[0x3c];
+               offset = LE4(buf, 0x3c);
                if (offset < sizeof(buf)-100) {
                    if (memcmp(buf+offset, "PE\0\0", 4) == 0) {
                        sig = (buf[offset+25] << 8) + buf[offset+24];

This still saves 24-32 bytes and should avoid undefined behaviour.

Previously I was using a big-endian unsigned int so the ASCII codes in the hex constant could be read left-to-right. Since the other calculation requires the bytes to be read as little-endian I've standardised on that. The macro can be used to generate the constant from a string literal.

Using an integer comparison for memcmp(buf+offset, "PE\0\0", 4) increases bloat. Presumably because in that case the offset isn't a constant.

avih commented 2 months ago

Surely, not fine? The top bit of a literal 'q' certainly won't be set, but the top bit of buf[2] might. Casting buf[2] is required.

Indeed, I somehow felt it's the literal q which is being shifted, but obviously that's not the case.

Previously I was using a big-endian unsigned int ... Using an integer comparison for memcmp(buf+offset, "PE\0\0", 4) increases bloat.

My take is that I wouldn't care about bloat at that level. This can be down to compiler optimizations of the specific compiler you tested with etc.

Especially because, for me, at least for windows code which doesn't go upstream, the priority should be code clarity and efficiency where needed, which should typically get the code near the lower bound, though not necessarily the absolute lowest.

Because Windows is not a router. We don't care all that much if the application is 600k or 6M (though obviously we'd take the former over the latter if given the choice), and code clarity makes it much more hackable - quite unlike upstream, and for me the ability to return to the code later is more important than saving few bytes.

This is subjective obviously, and doesn't matter too much in this case, but I'd just use memcmp. It's simple, clear, works, and without hidden gotchas.

rmyorston commented 2 months ago

I've pushed my most recent patch. There are prereleases (PRE-5377 or above).

avih commented 1 month ago

I'm not entirely sure how many bytes the current code checks, but I think it's 6, at which case that would be 2 less than at the offical spec which was published recently - https://github.com/jart/cosmopolitan/blob/master/ape/specification.md .

A 9th byte newline is recommended to exist at the file, but not required.

rmyorston commented 1 month ago

Yes, it's six bytes. That's how many the Cosmopolitan README suggests are supplied to binfmt_misc to identify APE programs on Linux.

Using six rather than eight increases the possibility of a false positive. But the consequences aren't that severe: some random file might be considered executable and a user might try to run it. If it is executable it'll run, if not it won't.

Even checking eight bytes might still result in a false positive if you're very unlucky.