Open kavon opened 5 years ago
I've narrowed down the problem to a weird linking issue. Linking even a totally empty shared library with -no-as-needed adds overhead to the linear_hot test. I'm wondering if we need to turn -as-needed back on after linking halomon, or go a different route to ensure halomon is included.
kavon@zeus:~/p/h/test|master⚡*
➤ echo "" > empty.cpp
kavon@zeus:~/p/h/test|master⚡*?
➤ g++ -std=c++14 -fPIC -m64 -shared empty.cpp -o libempty.so
kavon@zeus:~/p/h/test|master⚡*?
➤ g++ -std=c++14 -O3 linear_hot.cpp -Wl,-rpath,. -L. -Wl,-no-as-needed -lempty -o hot-linkall
kavon@zeus:~/p/h/test|master⚡*?
➤ g++ -std=c++14 -O3 linear_hot.cpp -Wl,-rpath,. -L. -lempty -o hot-linkasneeded
kavon@zeus:~/p/h/test|master⚡*?
➤ time ./hot-linkall
7.77user 0.00system 0:07.77elapsed 100%CPU (0avgtext+0avgdata 2636maxresident)k
0inputs+0outputs (0major+109minor)pagefaults 0swaps
kavon@zeus:~/p/h/test|master⚡*?
➤ time ./hot-linkasneeded
7.31user 0.00system 0:07.32elapsed 99%CPU (0avgtext+0avgdata 1224maxresident)k
0inputs+0outputs (0major+56minor)pagefaults 0swaps
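Yes, the problem is that -no-as-needed needs to be switched back off (that is, -as-needed restored) after -lhalomon, so that the libraries that follow it on the link line (libstdc++, libm, libgcc_s) don't get linked in when nothing uses them: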
kavon@zeus:~/p/h/test|master⚡*?
➤ ldd hot-linkasneeded
linux-vdso.so.1 (0x00007ffd8a592000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0f12713000)
/lib64/ld-linux-x86-64.so.2 (0x00007f0f12d06000)
kavon@zeus:~/p/h/test|master⚡*?
➤ ldd hot-linkall
linux-vdso.so.1 (0x00007ffe8a5bb000)
libempty.so => ./libempty.so (0x00007f794fe62000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f794fad9000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f794f73b000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f794f523000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f794f132000)
/lib64/ld-linux-x86-64.so.2 (0x00007f7950266000)
I have no idea what's going on anymore. I added -no-as-needed to the bare version in d0e138e81f620220787ffdba609ae88d25bbdf3f and somehow it got faster.
kavon@zeus:~/p/h/build|master⚡*
➤ ldd ./test/linear_hot-bare
linux-vdso.so.1 (0x00007fff921f6000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fbd52270000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fbd51ed2000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fbd51cba000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fbd518c9000)
/lib64/ld-linux-x86-64.so.2 (0x00007fbd527fb000)
kavon@zeus:~/p/h/build|master⚡*
➤ time ./test/linear_hot-bare
7.08user 0.00system 0:07.08elapsed 99%CPU (0avgtext+0avgdata 2644maxresident)k
0inputs+0outputs (0major+106minor)pagefaults 0swaps
kavon@zeus:~/p/h/build|master⚡*
➤ time ./test/linear_hot
Halo Running!
7.91user 0.01system 0:07.97elapsed 99%CPU (0avgtext+0avgdata 5668maxresident)k
0inputs+0outputs (0major+382minor)pagefaults 0swaps
kavon@zeus:~/p/h/build|master⚡*
➤ ldd ./test/linear_hot
linux-vdso.so.1 (0x00007ffce3034000)
libhalomon.so => /home/kavon/phd/halo/build/monitor/libhalomon.so (0x00007fa5987a3000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa59841a000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa59807c000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa597e64000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa597a73000)
libpfm.so.4 => /usr/lib/x86_64-linux-gnu/libpfm.so.4 (0x00007fa59768f000)
libboost_system.so.1.65.1 => /usr/lib/x86_64-linux-gnu/libboost_system.so.1.65.1 (0x00007fa59748a000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa598bbb000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa59726b000)
If we make halomon.so a text file containing the following linker script:
GROUP ( /path/to/halomon_internal.so )
and compile the actual library to halomon_internal.so, it should solve all of our issues here.
As of commit 33f62e521e099c86a73b63de18da59b439662682
One would expect some sort of overhead when halo actually starts up the profiling. However, the experiments below show that there is some sort of significant overhead associated with just linking halomon.so into the executable, even when halo's static constructor to start halomon is not run! Note that the -bare executable is the same thing, but we skip linking in halomon:
In the above, halo was launched and received perf events. However, if we comment out the SystemMonitor declaration in monitor.cpp, then linking in halomon does nothing. Yet here are the results:
Interestingly, this large overhead doesn't appear when running the noop.cpp test, which is just a main function that immediately returns (in order to test the start-up and shutdown), even when Halo launches!