jart / cosmopolitan

build-once run-anywhere c library
ISC License

Add build flag to enable LTO? #243

Open Keithcat1 opened 3 years ago

Keithcat1 commented 3 years ago

Would it be possible to do link-time optimization (LTO) on Cosmopolitan, Redbean, and Python? I would think that being able to inline C library functions would increase performance quite a bit, for instance.

jart commented 3 years ago

I haven't measured it, but I doubt the Cosmopolitan codebase would benefit from LTO. Isn't that mostly useful for things like large C++ codebases that have lots of getter and setter functions with a single asm opcode? It slows down builds so much. In many cases inlining can be harmful to things like branch prediction. If someone has numbers that show it helps I'll consider it.

One little-known optimization I know for certain is super impactful is profile-guided optimization. Have you considered trying that? It's something I've been meaning to incorporate into our build. It can do things like relocate code and organize branches specifically for your environment. Right now the Cosmopolitan codebase is written in such a way that assumes the critical path is most likely to be Linux. But if you use something like NetBSD then PGO would most likely have an impact edging out a couple extra cycles of performance on things like system calls, for example.
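As a rough sketch of the branch-organization point, the snippet below hard-codes by hand the kind of "Linux is the likely path" assumption that PGO would instead derive from a measured profile. It is not Cosmopolitan's actual code: `host_os`, `LIKELY`, and `write_syscall_number` are hypothetical names made up for illustration, and `__builtin_expect` is the GCC/Clang builtin for static branch hints.

```c
#include <assert.h>

/* Hypothetical OS tags for illustration; not Cosmopolitan identifiers. */
enum host_os { HOST_LINUX, HOST_NETBSD };

/* Static hint: tell the compiler the condition is usually true. */
#define LIKELY(x) __builtin_expect(!!(x), 1)

/* Hypothetical dispatcher returning the x86-64 syscall number for write().
 * The hint lets the compiler lay out the Linux case as the fall-through
 * path; a PGO build would pick the hot path from profile data instead. */
int write_syscall_number(enum host_os os) {
  assert(os == HOST_LINUX || os == HOST_NETBSD);
  if (LIKELY(os == HOST_LINUX)) {
    return 1; /* write on Linux x86-64 */
  }
  return 4; /* write on NetBSD */
}
```

With a profile collected on NetBSD, the compiler would be free to make the second branch the hot one without any source changes, which is the appeal over hand-written hints.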

jart commented 3 years ago

One further note. LTO might not be right for Cosmopolitan, but if it's the right choice for your codebase, then Cosmopolitan shouldn't stand in the way. So please let us know if there's something happening on our end that's preventing you from using LTO for your own code when you're using Cosmopolitan Libc. In that case I'll reopen, since a Libc should be as flag agnostic as possible!

Keithcat1 commented 3 years ago

No, I was mostly asking about LTO because I wanted to try using it on Python to see if it made it smaller or faster. I don't know too much about how the C compiler optimizes internally though. I don't think Cosmopolitan currently supports profile-guided optimization, at least when using the provided cross-gcc on Windows.

Compiling and linking with -fprofile-generate:

C:/py/cosmo/cross-gcc/bin/../lib/gcc/x86_64-pc-linux-gnu/9.2.0/../../../../x86_64-pc-linux-gnu/bin/ld.exe: cannot find -lgcov
collect2.exe: error: ld returned 1 exit status

Compiling but not linking with -fprofile-generate:

C:\py\cosmo>cosmo hello.o
C:/py/cosmo/cross-gcc/bin/../lib/gcc/x86_64-pc-linux-gnu/9.2.0/../../../../x86_64-pc-linux-gnu/bin/ld.exe: hello.o:(.data+0x40): undefined reference to `__gcov_merge_add'
C:/py/cosmo/cross-gcc/bin/../lib/gcc/x86_64-pc-linux-gnu/9.2.0/../../../../x86_64-pc-linux-gnu/bin/ld.exe: hello.o:(.data+0x78): undefined reference to `__gcov_merge_time_profile'
C:/py/cosmo/cross-gcc/bin/../lib/gcc/x86_64-pc-linux-gnu/9.2.0/../../../../x86_64-pc-linux-gnu/bin/ld.exe: hello.o: in function `main':
C:\py\cosmo/hello.c:2: undefined reference to `__gcov_indirect_call'
C:/py/cosmo/cross-gcc/bin/../lib/gcc/x86_64-pc-linux-gnu/9.2.0/../../../../x86_64-pc-linux-gnu/bin/ld.exe: C:\py\cosmo/hello.c:2: undefined reference to `__gcov_indirect_call_profiler_v3'
C:/py/cosmo/cross-gcc/bin/../lib/gcc/x86_64-pc-linux-gnu/9.2.0/../../../../x86_64-pc-linux-gnu/bin/ld.exe: C:\py\cosmo/hello.c:2: undefined reference to `__gcov_time_profiler_counter'
C:/py/cosmo/cross-gcc/bin/../lib/gcc/x86_64-pc-linux-gnu/9.2.0/../../../../x86_64-pc-linux-gnu/bin/ld.exe: C:\py\cosmo/hello.c:2: undefined reference to `__gcov_time_profiler_counter'
C:/py/cosmo/cross-gcc/bin/../lib/gcc/x86_64-pc-linux-gnu/9.2.0/../../../../x86_64-pc-linux-gnu/bin/ld.exe: hello.o: in function `_GLOBAL__sub_I_00100_0_main':
hello.c:(.text.startup+0x66): undefined reference to `__gcov_init'
C:/py/cosmo/cross-gcc/bin/../lib/gcc/x86_64-pc-linux-gnu/9.2.0/../../../../x86_64-pc-linux-gnu/bin/ld.exe: hello.o: in function `_GLOBAL__sub_D_00100_1_main':
hello.c:(.text.exit+0x1): undefined reference to `__gcov_exit'
c

jart commented 3 years ago

Hmm good point. Python actually might be the sort of thing that could benefit from LTO.

I'll reopen then and give it a try if I have time. The main obstacle is that we don't have the lto executable checked in to the third_party/gcc/ folder. So if you want to do exploratory work, you might open build/config.mk and do something similar to what we did with MODE=llvm, tuning it to use whatever your local toolchain is. That would work until I get a chance to experiment with adding the LTO binary to this repo.

I don't think Cosmopolitan currently supports profile-guided optimization

It does not and we should totally fix that. Having code coverage reports would be a nice added bonus of doing so.

Keithcat1 commented 3 years ago

I tried something like:

export CFLAGS=-flto
export CXXFLAGS=-flto
make -j4 MODE=LLVM -O o/third_party/python

But I think it failed to compile an assembly file. I don't have the battery life to check it right now without paying for it later. What's to stop me from using LLVM and -fprofile-generate? LLVM should be able to link its PGO library just fine (assuming it isn't C++).

On 9/4/21, Justine Tunney @.***> wrote:

Reopened #243.


Keithcat1 commented 3 years ago

So I tried adding a new mode and I got errors like this:

gcc-10: error: unrecognized command-line option '--noexecstack'
gcc-10: error: unrecognized command-line option '--nocompress-debug-sections'; did you mean '--compare-debug-second'?

Looks like the version of GCC that comes bundled with Cosmopolitan is built differently than normal Linux GCC.

gizlu commented 2 years ago

If someone has numbers that show it helps I'll consider it.

SQLite claims that the amalgamation (which, from the optimizer's viewpoint, should be equivalent to LTO) gives them a performance improvement of between 5 and 10 percent: https://sqlite.org/amalgamation.html
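To make the amalgamation point concrete, here is a minimal sketch (not taken from SQLite or Cosmopolitan; `clamp_len` and `copy_bytes` are hypothetical names). In a multi-file build the helper would sit in its own .c file and the caller would see only a declaration, so without LTO the compiler has to emit a real call. Once both live in one translation unit, the optimizer can see the helper's body and is free to inline it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper. In a split build this would live in another .c
 * file; in an amalgamation it is visible here and can be inlined. */
static size_t clamp_len(size_t n, size_t max) {
  return n < max ? n : max;
}

/* Hypothetical caller: copies at most max bytes from src to dst and
 * returns the number of bytes copied. */
size_t copy_bytes(char *dst, const char *src, size_t n, size_t max) {
  assert(dst != NULL && src != NULL);
  size_t len = clamp_len(n, max);
  for (size_t i = 0; i < len; i++) {
    dst[i] = src[i];
  }
  return len;
}
```

Whether that visibility translates into a measurable speedup depends on the workload, which is why numbers like SQLite's are the useful evidence here.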