halo-project / llvm

Halo's downstream version of LLVM

improperly linking with `-no-as-needed` hurts performance #1

Open kavon opened 5 years ago

kavon commented 5 years ago

As of commit 33f62e521e099c86a73b63de18da59b439662682

One would expect some sort of overhead when halo actually starts up the profiling.

However, the experiments below show a significant overhead (about 0.6 s, or roughly 8-9% of user time) associated with just linking halomon.so into the executable, even when halo's static constructor that starts halomon is not run! Note that the -bare executable is the same program, except that we skip linking in halomon:

➤ time ./test/linear_hot-bare; echo -e "\n-----\n"; time ./test/linear_hot
7.31user 0.00system 0:07.31elapsed 100%CPU (0avgtext+0avgdata 1320maxresident)k
0inputs+0outputs (0major+57minor)pagefaults 0swaps

-----

Halo Running!
7.94user 0.00system 0:07.96elapsed 99%CPU (0avgtext+0avgdata 5696maxresident)k
0inputs+0outputs (0major+382minor)pagefaults 0swaps

In the above, halo was launched and received perf events. However, if we comment out the SystemMonitor declaration in monitor.cpp, then halomon is still linked in but never starts, so it should have no effect at run time. Yet here are the results:

➤ time ./test/linear_hot-bare; echo -e "\n-----\n"; time ./test/linear_hot
7.34user 0.00system 0:07.34elapsed 99%CPU (0avgtext+0avgdata 1248maxresident)k
0inputs+0outputs (0major+57minor)pagefaults 0swaps

-----

7.96user 0.00system 0:07.96elapsed 100%CPU (0avgtext+0avgdata 5128maxresident)k
0inputs+0outputs (0major+355minor)pagefaults 0swaps

Interestingly, this large overhead doesn't appear when running the noop.cpp test, which is just a main function that immediately returns (in order to test the start-up and shutdown), even when Halo launches!

0.00user 0.00system 0:00.00elapsed 100%CPU (0avgtext+0avgdata 1244maxresident)k
0inputs+0outputs (0major+56minor)pagefaults 0swaps

-----

Halo Running!
0.00user 0.00system 0:00.00elapsed 100%CPU (0avgtext+0avgdata 5732maxresident)k
0inputs+0outputs (0major+372minor)pagefaults 0swaps

kavon commented 5 years ago

I've narrowed down the problem to a weird linking issue. Linking even a totally empty shared library with -no-as-needed adds overhead to the linear_hot test. I'm wondering if we need to turn -as-needed back on after linking halomon, or go a different route to ensure halomon is included.

kavon@zeus:~/p/h/test|master⚡*
➤ echo "" > empty.cpp
kavon@zeus:~/p/h/test|master⚡*?
➤ g++ -std=c++14 -fPIC -m64 -shared empty.cpp -o libempty.so
kavon@zeus:~/p/h/test|master⚡*?
➤ g++ -std=c++14 -O3 linear_hot.cpp -Wl,-rpath,. -L. -Wl,-no-as-needed -lempty -o hot-linkall
kavon@zeus:~/p/h/test|master⚡*?
➤ g++ -std=c++14 -O3 linear_hot.cpp -Wl,-rpath,. -L. -lempty -o hot-linkasneeded
kavon@zeus:~/p/h/test|master⚡*?
➤ time ./hot-linkall 
7.77user 0.00system 0:07.77elapsed 100%CPU (0avgtext+0avgdata 2636maxresident)k
0inputs+0outputs (0major+109minor)pagefaults 0swaps
kavon@zeus:~/p/h/test|master⚡*?
➤ time ./hot-linkasneeded
7.31user 0.00system 0:07.32elapsed 99%CPU (0avgtext+0avgdata 1224maxresident)k
0inputs+0outputs (0major+56minor)pagefaults 0swaps

kavon commented 5 years ago

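Yes, the problem is that -no-as-needed needs to be switched back off (i.e. -as-needed restored) right after -lhalomon, so that the libraries that follow it on the link line don't all get linked in unconditionally. Compare the dependencies of the two executables: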

kavon@zeus:~/p/h/test|master⚡*?
➤ ldd hot-linkasneeded 
    linux-vdso.so.1 (0x00007ffd8a592000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0f12713000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f0f12d06000)
kavon@zeus:~/p/h/test|master⚡*?
➤ ldd hot-linkall
    linux-vdso.so.1 (0x00007ffe8a5bb000)
    libempty.so => ./libempty.so (0x00007f794fe62000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f794fad9000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f794f73b000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f794f523000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f794f132000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f7950266000)
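
A minimal sketch of that fix, reusing the empty-library experiment from the previous comment: only -lempty is bracketed by the flag pair, so the libraries the g++ driver appends afterwards (libstdc++, libm) are recorded only if they are actually used. The file name hot.cpp, the output name hot-scoped and the tiny loop are stand-ins invented for this example (the real linear_hot.cpp is not shown in this thread), and the script assumes GNU binutils (readelf) and g++ are installed. GNU ld accepts these long options with either one or two leading dashes; the sketch uses the two-dash spelling.

```shell
#!/bin/sh
# Sketch: force-link one library, then restore --as-needed for the rest.
set -eu
command -v g++ >/dev/null 2>&1 || { echo "g++ not found, skipping"; exit 0; }

dir=$(mktemp -d)
cd "$dir"

# Same empty shared library as in the experiment above.
echo "" > empty.cpp

# Hypothetical stand-in for linear_hot.cpp: uses nothing from libstdc++.
cat > hot.cpp <<'EOF'
int main() {
    volatile long sum = 0;
    for (long i = 0; i < 1000; ++i) sum += i;
    return 0;
}
EOF

g++ -std=c++14 -fPIC -shared empty.cpp -o libempty.so

# --no-as-needed applies only to -lempty; --as-needed is restored before
# the driver appends -lstdc++ -lm and friends.
g++ -std=c++14 -O3 hot.cpp -Wl,-rpath,"$dir" -L. \
    -Wl,--no-as-needed -lempty -Wl,--as-needed -o hot-scoped

# Direct dependencies recorded in the executable.
needed=$(readelf -d hot-scoped | grep NEEDED)
echo "$needed"
```

If this behaves as expected, the NEEDED list contains libempty.so but not libstdc++.so.6, i.e. the forced library is kept without dragging in the rest. Whether that also removes the slowdown in linear_hot still has to be measured.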

kavon commented 5 years ago

I have no idea what's going on anymore. I added -no-as-needed to the bare version in d0e138e81f620220787ffdba609ae88d25bbdf3f and somehow it got faster.

kavon@zeus:~/p/h/build|master⚡*
➤ ldd ./test/linear_hot-bare
    linux-vdso.so.1 (0x00007fff921f6000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fbd52270000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fbd51ed2000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fbd51cba000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fbd518c9000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fbd527fb000)
kavon@zeus:~/p/h/build|master⚡*
➤ time ./test/linear_hot-bare
7.08user 0.00system 0:07.08elapsed 99%CPU (0avgtext+0avgdata 2644maxresident)k
0inputs+0outputs (0major+106minor)pagefaults 0swaps
kavon@zeus:~/p/h/build|master⚡*
➤ time ./test/linear_hot
Halo Running!
7.91user 0.01system 0:07.97elapsed 99%CPU (0avgtext+0avgdata 5668maxresident)k
0inputs+0outputs (0major+382minor)pagefaults 0swaps
kavon@zeus:~/p/h/build|master⚡*
➤ ldd ./test/linear_hot
    linux-vdso.so.1 (0x00007ffce3034000)
    libhalomon.so => /home/kavon/phd/halo/build/monitor/libhalomon.so (0x00007fa5987a3000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa59841a000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa59807c000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa597e64000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa597a73000)
    libpfm.so.4 => /usr/lib/x86_64-linux-gnu/libpfm.so.4 (0x00007fa59768f000)
    libboost_system.so.1.65.1 => /usr/lib/x86_64-linux-gnu/libboost_system.so.1.65.1 (0x00007fa59748a000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fa598bbb000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa59726b000)

kavon commented 5 years ago

If we make halomon.so a file containing the following linker script:

GROUP ( /path/to/halomon_internal.so )

and compile the actual library to halomon_internal.so, it will solve all of our issues here.
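
A rough sketch of the mechanics of that setup, using the empty library from the earlier experiment instead of halomon. The names libempty_internal.so, hot.cpp and hot-script are made up for the example, the -soname option is an addition of this sketch, and keeping the --no-as-needed / --as-needed bracketing around -lempty is an assumption, since the comment above does not say how the script is combined with those flags. It assumes g++ with a linker that understands GROUP scripts (GNU ld does) and readelf.

```shell
#!/bin/sh
# Sketch: libempty.so is a text linker script that points at the real library.
set -eu
command -v g++ >/dev/null 2>&1 || { echo "g++ not found, skipping"; exit 0; }

dir=$(mktemp -d)
cd "$dir"

echo "" > empty.cpp

# Hypothetical stand-in for the test program.
cat > hot.cpp <<'EOF'
int main() {
    volatile long sum = 0;
    for (long i = 0; i < 1000; ++i) sum += i;
    return 0;
}
EOF

# The real shared object lives under the "internal" name.
g++ -std=c++14 -fPIC -shared empty.cpp \
    -Wl,-soname,libempty_internal.so -o libempty_internal.so

# The file the linker finds for -lempty is just a one-line script.
printf 'GROUP ( %s/libempty_internal.so )\n' "$dir" > libempty.so

g++ -std=c++14 -O3 hot.cpp -Wl,-rpath,"$dir" -L. \
    -Wl,--no-as-needed -lempty -Wl,--as-needed -o hot-script

# The executable depends on the internal library, not on the script file.
needed=$(readelf -d hot-script | grep NEEDED)
echo "$needed"
./hot-script
```

This only shows that the indirection links and runs. It does not demonstrate that the overhead goes away, which would need the same timing comparison as above.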