Open PoroCYon opened 4 years ago
Mind explaining what exactly this helps with? Size? Performance? I'm not against or anything, just curious what motivated the increased amount of code...
Size of what: the compiled binary, or the source code? The former shouldn't change on Windows, and the extra source code eliminates usage of the libm functions while having a code size very comparable to the Windows version, so it helps with the binary size on Linux a bit.
(glibc libm really likes using IFUNCs, which require extra logic for resolving those, as compared to regular symbols, so not relying on these will hopefully pay off.)
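For readers unfamiliar with IFUNCs, here is a minimal sketch of how one is declared, assuming GCC or Clang targeting an ELF platform with IFUNC support (for example glibc on Linux). The names sketch_scale, scale_generic and resolve_scale are hypothetical and are not taken from glibc or WaveSabre. The point is that the symbol is bound to whatever the resolver returns, so something has to call the resolver before the first use.

```c
/* One concrete implementation. A real library would typically have
   several, e.g. a baseline version and an AVX2 version. */
static float scale_generic(float x)
{
    return x * 2.0f;
}

/* The resolver runs once while relocations are processed and returns
   the implementation to bind. Real resolvers usually inspect CPU
   features here; this hypothetical one always picks the generic path. */
static float (*resolve_scale(void))(float)
{
    return scale_generic;
}

/* sketch_scale is emitted as an IFUNC symbol instead of a plain FUNC:
   its address is only known after resolve_scale() has been called. */
float sketch_scale(float x) __attribute__((ifunc("resolve_scale")));
```

In an ordinary dynamically linked program, the dynamic loader calls the resolver. A loader replacement has to do that step itself.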
I meant the source code size, yeah. More source code means a higher maintenance burden.
Usually, it's hard to beat an import in terms of size, at least for 64k targets, where you'll likely do a bunch of CRT imports anyway. But I guess the IFUNC stuff throws a wrench in that plan. But perhaps using the __builtin_-variants instead would avoid that?
__builtin_ functions are unfortunately a mixed bag. These emit intrinsics on some architectures, but not on others. Some ISAs have a pow instruction, but x86_64 doesn't, so my GCC seems to compile it into a call to libm's pow function. Things like __builtin_clz and __builtin_popcount get compiled to their respective instructions on x86_64, but they're emitted as calls to libgcc functions for, say, AVR.
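To check this on a given toolchain, a small file like the following sketch can be compiled with `gcc -O2 -S` and the resulting assembly inspected. It assumes GCC or Clang. The sketch_ names are hypothetical wrappers, and what the compiler actually emits (an instruction or a library call) depends on the target and flags in use.

```c
/* Thin wrappers around the builtins, so the emitted code for each
   one is easy to find in the assembly listing. */
unsigned sketch_popcount(unsigned x)
{
    return (unsigned)__builtin_popcount(x);
}

unsigned sketch_clz(unsigned x)
{
    /* __builtin_clz(0) is undefined, so handle zero explicitly. */
    if (x == 0)
        return (unsigned)(sizeof(unsigned) * 8);
    return (unsigned)__builtin_clz(x);
}

/* Portable fallback that never needs a helper library:
   clears the lowest set bit once per iteration. */
unsigned sketch_popcount_portable(unsigned x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}
```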
I'm not sure I understand the problem. All we need is exp2 and cos. Both of these seem to behave just fine on both x86 and x64 as far as I can tell (they simply seem to import the libm-functions, and call them normally). What happens for other math functions and for other platforms here is quite irrelevant, no?
Define "behave just fine"? The libm versions are both IFUNCs, and the __builtin_ variants get turned into the libm calls:
test.c:
#include <stdio.h>
int main(int argc, char* argv[]) {
printf("cos(%d) = %f\n", argc, __builtin_cos/*or exp2*//*with or without f*/(argc));
return 0;
}
$ gcc -Os -o test test.c && ./test a b c d e f
/usr/bin/ld: /tmp/ccMb5edj.o: in function `main':
test.c:(.text.startup+0x8): undefined reference to `exp2'
$ gcc -Os -o test test.c && ./test a b c d e f
/usr/bin/ld: /tmp/ccZuKK2S.o: in function `main':
test.c:(.text.startup+0x8): undefined reference to `cos'
$ # same for exp2f/cosf
$ readelf -Wa /usr/lib/libm.so.6 | grep exp2
135: 0000000000050020 46 IFUNC WEAK DEFAULT 16 exp2f32@@GLIBC_2.27
437: 000000000006eb50 197 FUNC WEAK DEFAULT 16 exp2f128@@GLIBC_2.26
889: 0000000000042930 562 FUNC GLOBAL DEFAULT 16 exp2@@GLIBC_2.29
1081: 0000000000050020 46 IFUNC GLOBAL DEFAULT 16 exp2f@@GLIBC_2.27
1090: 0000000000010450 145 FUNC WEAK DEFAULT 16 exp2l@@GLIBC_2.2.5
[plus some noise cut out]
$ readelf -Wa /usr/lib/libm.so.6 | grep cos
70: 0000000000047cb0 46 IFUNC WEAK DEFAULT 16 cosf32@@GLIBC_2.27
167: 0000000000034c90 77 IFUNC WEAK DEFAULT 16 cosf64@@GLIBC_2.27
594: 0000000000034c90 77 IFUNC WEAK DEFAULT 16 cos@@GLIBC_2.2.5
872: 0000000000047cb0 46 IFUNC WEAK DEFAULT 16 cosf@@GLIBC_2.2.5
876: 000000000001ab70 348 FUNC WEAK DEFAULT 16 cosl@@GLIBC_2.2.5
[plus a lot of noise cut out (cosh, acos, etc)]
I suppose exp2 and cosl are usable on the version I have installed (2.31 from Arch), but exp2f (and cosf) seem to be more annoying.
Define "behave just fine"?
Results in plain function calls to "normal" imported functions. Yes, they are IFUNCs, but I don't see any resolve logic injected into our binary for that. Maybe I'm missing something here?
I guess with this stuff we can avoid having libm in the import-table altogether, which might be a win in some cases. I'm still not convinced it ends up as a net win in a real-world 64k intro, though; those will probably benefit from an import of libm as they are more likely to actually do some more CPU work. But I guess that depends on the intro.
In normal, non-minified binaries, you indeed don't have to include any extra resolving logic, as ld.so does that for you. However, executable packers for Linux such as smol and dnload both have to deal with IFUNCs, and that code does end up in your final executable, because they circumvent a number of ld.so things in order to avoid having to include a symbol table and symbol names (they're hash-based) in the first place.
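As a rough illustration of the hash-based approach, the sketch below finds a symbol by comparing name hashes, so the importing side only has to store a 32-bit value per symbol instead of the name. The djb2-style hash and all sketch_ names are assumptions made for illustration. This is not the actual hash function or table layout used by smol or dnload.

```c
#include <stddef.h>
#include <stdint.h>

/* djb2-style string hash (an assumption; real packers may differ). */
uint32_t sketch_hash(const char *s)
{
    uint32_t h = 5381;
    while (*s)
        h = h * 33u + (unsigned char)*s++;
    return h;
}

/* Stand-in for one entry of a library's symbol table. */
struct sketch_symbol {
    const char *name;
    uintptr_t   address;
};

/* Walk the library's symbols and return the first one whose name
   hashes to `wanted`, or NULL if none matches. */
const struct sketch_symbol *sketch_lookup(const struct sketch_symbol *table,
                                          size_t count, uint32_t wanted)
{
    for (size_t i = 0; i < count; i++) {
        if (sketch_hash(table[i].name) == wanted)
            return &table[i];
    }
    return NULL;
}
```

If the matched symbol turns out to be an IFUNC, the stored address is that of the resolver rather than of the function itself, which is where the extra resolving logic comes from.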
OK, that makes a bit of sense. But, it kinda seems like a slippery slope, really. Because this can be the case for pretty much any CRT function you're using, so you might have to replace a lot of functions in the end. It seems like it's better to fix this in the packers to me...
Also, just want to make it clear; I'm not trying to throw doubt at this solution, just having a discussion here. It's probably not a bad idea for WaveSabre to not depend on libm.so at all, which I believe this MR solves.
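To give an idea of what libm-free replacements can look like, here is a minimal sketch of cos and exp2 built from plain arithmetic. It is not the code from this MR or from WaveSabre. The sketch_ names are hypothetical, the accuracy is modest (roughly 1e-4 worst case for the cosine), and sketch_exp2 assumes a moderately sized input (say |x| below a few hundred).

```c
#define SKETCH_PI 3.14159265358979323846

/* cos(x) via range reduction to [-pi, pi] plus an even Taylor
   polynomial up to x^12, evaluated in Horner form. */
double sketch_cos(double x)
{
    const double two_pi = 2.0 * SKETCH_PI;
    double k = x / two_pi;
    /* The cast truncates toward zero, so bias by 0.5 to round. */
    long n = (long)(k >= 0.0 ? k + 0.5 : k - 0.5);
    x -= (double)n * two_pi;

    double x2 = x * x;
    return 1.0 - x2 / 2.0 * (1.0 - x2 / 12.0 * (1.0 - x2 / 30.0 *
           (1.0 - x2 / 56.0 * (1.0 - x2 / 90.0 * (1.0 - x2 / 132.0)))));
}

/* 2^x, split as 2^n * 2^f with n = floor(x) and f in [0, 1). */
double sketch_exp2(double x)
{
    /* floor(x) without libm: truncate, then fix up negatives. */
    int n = (int)x;
    if ((double)n > x)
        n--;
    double f = x - (double)n;

    /* 2^f = e^(f * ln 2), summed as a short Taylor series. */
    const double t = f * 0.69314718055994530942;
    double term = 1.0, sum = 1.0;
    for (int i = 1; i <= 10; i++) {
        term *= t / (double)i;
        sum += term;
    }

    /* Apply 2^n by repeated doubling (fine for small |n|). */
    double scale = 1.0;
    int steps = n < 0 ? -n : n;
    for (int i = 0; i < steps; i++)
        scale *= 2.0;
    return n < 0 ? sum / scale : sum * scale;
}
```

Whether something like this ends up smaller than an import depends on the compiler and the packer, which is exactly the trade-off being discussed here.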
That's true, which is why smol has an option to enable/disable IFUNC support. If it can be left out, that's a few bytes gained.
And indeed, right now the only external symbols I can see that are being used (in Core) are memcpy and memset.
Just want to mention an alternative for cos: I did another attempt at dropping the fpuCos-helper in the past, which you can see here.
In the end it didn't work out, because the resulting code actually got slightly larger. Not sure if that's the case on other compilers, though.
This one should be usable as-is, but maybe the pow functions should be reworked a bit to work with LTO (cf. the cos impl.)
Also, do you want #37 to be included here? Rebased on top of that one.