gcc's version of inline assembly for the maths helpers

PoroCYon commented 4 years ago

This one should be usable as-is, but maybe the pow functions should be reworked a bit to work with LTO (cf. the cos impl.)

~~Also, do you want #37 to be included here?~~ rebased on top of that one.

kusma commented 4 years ago

Mind explaining what exactly this helps with? Size? Performance? I'm not against it or anything, just curious what motivated the increased amount of code...

PoroCYon commented 4 years ago

Size of what, the compiled binary, or the source code? The former shouldn't change on Windows, and the extra source code eliminates usage of the libm functions while having a code size very comparable to the Windows version, so it helps with the binary size on Linux a bit.

(glibc libm really likes using IFUNCs, which require extra logic for resolving those, as compared to regular symbols, so not relying on these will hopefully pay off.)
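For readers unfamiliar with the approach, a minimal sketch of libm-free `cos` and `exp2` helpers using GCC extended inline assembly on x87 could look like the following. The names `sketch_fpu_cos` and `sketch_fpu_exp2` are hypothetical and this is not the code from this PR; it only illustrates the general technique, and assumes an x86 or x86_64 target (other targets fall back to the builtins, which may need `-lm`).

```c
/* Sketch only: hypothetical helper names, not the code from this PR. */
#if defined(__i386__) || defined(__x86_64__)

/* cos(x) via the x87 fcos instruction.
   "=t" / "0" tell GCC that input and output both live in st(0). */
double sketch_fpu_cos(double x)
{
    double r;
    __asm__ ("fcos" : "=t" (r) : "0" (x));
    return r;
}

/* 2^x, split as 2^n * 2^f with n = round(x) and f = x - n,
   because f2xm1 only accepts arguments in [-1, 1]. */
double sketch_fpu_exp2(double x)
{
    double frac_pow; /* receives 2^f - 1 from st(0) */
    double whole;    /* receives n from st(1)       */
    double r;

    __asm__ ("fld   %%st(0)\n\t"        /* st0 = x, st1 = x     */
             "frndint\n\t"              /* st0 = n, st1 = x     */
             "fxch\n\t"                 /* st0 = x, st1 = n     */
             "fsub  %%st(1), %%st\n\t"  /* st0 = f = x - n      */
             "f2xm1"                    /* st0 = 2^f - 1        */
             : "=t" (frac_pow), "=u" (whole)
             : "0" (x));
    frac_pow += 1.0;                    /* now 2^f */

    /* fscale multiplies st0 by 2^trunc(st1). */
    __asm__ ("fscale" : "=t" (r) : "0" (frac_pow), "u" (whole));
    return r;
}

#else

/* Assumption: on non-x86 targets just defer to the compiler builtins. */
double sketch_fpu_cos(double x)  { return __builtin_cos(x); }
double sketch_fpu_exp2(double x) { return __builtin_exp2(x); }

#endif
```

Neither function references a libm symbol on x86, so nothing here ends up in the import table. Splitting `exp2` into two asm statements costs a few bytes compared to a single hand-tuned block, but keeps the operand constraints simple.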

kusma commented 4 years ago

I meant the source code size, yeah. More source code means a higher maintenance burden.

Usually, it's hard to beat an import in terms of size, at least for 64k targets, where you'll likely do a bunch of CRT imports anyway. But I guess the IFUNC stuff throws a wrench in that plan. But perhaps using the __builtin_-variants instead would avoid that?

PoroCYon commented 4 years ago

__builtin_ functions are unfortunately a mixed bag. These emit intrinsics on some architectures, but not on others. Some ISAs have a pow instruction, but x86_64 doesn't, so my GCC seems to compile it into a call to libm's pow function. Things like __builtin_clz and __builtin_popcount get compiled to their respective instructions on x86_64, but they're emitted as calls to libgcc functions for, say, AVR.
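As a small illustration of that difference (the wrapper names below are made up for this sketch), the integer builtins can usually be expanded inline or served by libgcc, whereas a builtin like `__builtin_pow` with non-constant arguments typically turns back into an ordinary call to libm. What exactly gets emitted depends on the compiler version and target flags, so treat the comments as expectations rather than guarantees.

```c
/* Hypothetical wrappers for illustration only. */

/* Number of set bits. On x86_64 GCC can emit a popcnt instruction when the
   target flags allow it (e.g. -mpopcnt); otherwise it may call a libgcc
   helper, which is linked by default. */
int sketch_popcount(unsigned int v)
{
    return __builtin_popcount(v);
}

/* Number of leading zero bits. The result is undefined for v == 0,
   so callers have to rule that case out themselves. */
int sketch_clz(unsigned int v)
{
    return __builtin_clz(v);
}

/* By contrast, something like
 *
 *     double y = __builtin_pow(x, 2.5);
 *
 * has no single x86_64 instruction behind it, so with a non-constant x
 * it is usually compiled to a plain call to pow and needs libm at link
 * time. It is left commented out so this file links without -lm.
 */
```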

kusma commented 4 years ago

I'm not sure I understand the problem. All we need is exp2 and cos. Both of these seem to behave just fine on both x86 and x64 as far as I can tell (they simply seem to import the libm-functions, and call them normally). What happens for other math functions and for other platforms here is quite irrelevant, no?

PoroCYon commented 4 years ago

Define "behave just fine"? The libm versions are both IFUNCs, and the __builtin_ variants get turned into the libm calls:

test.c:

```c
#include <stdio.h>

int main(int argc, char* argv[]) {
    printf("cos(%d) = %f\n", argc, __builtin_cos/*or exp2*//*with or without f*/(argc));
    return 0;
}
```

```
$ gcc -Os -o test test.c && ./test a b c d e f
/usr/bin/ld: /tmp/ccMb5edj.o: in function `main':
test.c:(.text.startup+0x8): undefined reference to `exp2'
$ gcc -Os -o test test.c && ./test a b c d e f
/usr/bin/ld: /tmp/ccZuKK2S.o: in function `main':
test.c:(.text.startup+0x8): undefined reference to `cos'
$ # same for exp2f/cosf
```

```
$ readelf -Wa /usr/lib/libm.so.6 | grep exp2
   135: 0000000000050020    46 IFUNC   WEAK   DEFAULT   16 exp2f32@@GLIBC_2.27
   437: 000000000006eb50   197 FUNC    WEAK   DEFAULT   16 exp2f128@@GLIBC_2.26
   889: 0000000000042930   562 FUNC    GLOBAL DEFAULT   16 exp2@@GLIBC_2.29
  1081: 0000000000050020    46 IFUNC   GLOBAL DEFAULT   16 exp2f@@GLIBC_2.27
  1090: 0000000000010450   145 FUNC    WEAK   DEFAULT   16 exp2l@@GLIBC_2.2.5
[plus some noise cut out]
$ readelf -Wa /usr/lib/libm.so.6 | grep cos
    70: 0000000000047cb0    46 IFUNC   WEAK   DEFAULT   16 cosf32@@GLIBC_2.27
   167: 0000000000034c90    77 IFUNC   WEAK   DEFAULT   16 cosf64@@GLIBC_2.27
   594: 0000000000034c90    77 IFUNC   WEAK   DEFAULT   16 cos@@GLIBC_2.2.5
   872: 0000000000047cb0    46 IFUNC   WEAK   DEFAULT   16 cosf@@GLIBC_2.2.5
   876: 000000000001ab70   348 FUNC    WEAK   DEFAULT   16 cosl@@GLIBC_2.2.5
[plus a lot of noise cut out (cosh, acos, etc)]
```

I suppose exp2 and cosl are usable on the version I have installed (2.31 from Arch), but exp2f (and cosf) seem to be more annoying.

kusma commented 4 years ago

> Define "behave just fine"?

Results in plain function calls to "normal" imported functions. Yes, they are IFUNCs, but I don't see any resolve logic injected into our binary for that. Maybe I'm missing something here?

I guess with this stuff we can avoid having libm in the import-table altogether, which might be a win in some cases. I'm still not convinced it ends up as a net win in a real-world 64k intro, though; those will probably benefit from an import of libm as they are more likely to actually do some more CPU work. But I guess that depends on the intro.

PoroCYon commented 4 years ago

In normal, non-minified binaries, you indeed don't have to include any extra resolving logic, as ld.so does that for you. However, executable packers for Linux such as smol and dnload both have to deal with IFUNCs, and that code does end up in your final executable, because they circumvent a number of ld.so things in order to avoid having to include a symbol table and symbol names (they're hash-based) in the first place.
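To show roughly what that extra logic amounts to, here is a hedged sketch of the one step a custom symbol resolver has to add for IFUNCs. It is not taken from smol or dnload; the function and type names are hypothetical, the hash-based name lookup is left out entirely, and it assumes a Linux system with `<elf.h>` and an x86-style resolver that takes no arguments (some other architectures pass hardware-capability arguments).

```c
#include <elf.h>
#include <stdint.h>

/* Assumption: x86-style IFUNC resolver, no arguments, returns the
   address of the implementation that should actually be called. */
typedef void *(*sketch_resolver_fn)(void);

/* Turn a symbol table entry into a callable address. For a regular symbol
   this is just base + st_value. For an STT_GNU_IFUNC symbol, st_value
   points at a resolver that must be called first. */
void *sketch_symbol_address(uintptr_t load_base, const Elf64_Sym *sym)
{
    uintptr_t addr = load_base + (uintptr_t)sym->st_value;

    if (ELF64_ST_TYPE(sym->st_info) == STT_GNU_IFUNC) {
        return ((sketch_resolver_fn)addr)();
    }
    return (void *)addr;
}

/* Tiny self-check with fake symbols, so the sketch can be exercised
   without parsing a real shared object. */
static int demo_impl(void) { return 42; }
static void *demo_resolver(void) { return (void *)demo_impl; }

int sketch_selfcheck(void)
{
    Elf64_Sym plain = {0};
    Elf64_Sym ifunc = {0};

    plain.st_value = (Elf64_Addr)(uintptr_t)demo_impl;
    plain.st_info  = ELF64_ST_INFO(STB_GLOBAL, STT_FUNC);

    ifunc.st_value = (Elf64_Addr)(uintptr_t)demo_resolver;
    ifunc.st_info  = ELF64_ST_INFO(STB_GLOBAL, STT_GNU_IFUNC);

    /* Both entries should end up pointing at demo_impl. */
    return sketch_symbol_address(0, &plain) == (void *)demo_impl
        && sketch_symbol_address(0, &ifunc) == (void *)demo_impl;
}
```

The type check and the indirect call are small, but in a packer stub measured in bytes they are a cost that only pays for itself if some imported symbol really is an IFUNC.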

kusma commented 4 years ago

OK, that makes a bit of sense. But it kinda seems like a slippery slope, really, because this can be the case for pretty much any CRT function you're using, so you might have to replace a lot of functions in the end. To me, it seems better to fix this in the packers...

kusma commented 4 years ago

Also, just want to make it clear; I'm not trying to throw doubt at this solution, just having a discussion here. It's probably not a bad idea for WaveSabre to not depend on libm.so at all, which I believe this MR solves.

PoroCYon commented 4 years ago

That's true, which is why smol has an option to enable/disable IFUNC support. If it can be left out, that's a few bytes gained.

And indeed, right now the only external symbols I can see that are being used (in Core) are memcpy and memset.

kusma commented 4 years ago

Just want to mention an alternative for cos: I did another attempt at dropping the fpuCos-helper in the past, which you can see here.

In the end it didn't work out, because the resulting code actually got slightly larger. Not sure if that's the case on other compilers, though.

logicomacorp / WaveSabre

gcc's version of inline assembly for the maths helpers #51