Open jiridanek opened 1 year ago
Here's a fairer symbol count before and after -fvisibility=hidden
.
Before:
% nm --dynamic --defined-only router/skrouterd | wc -l
1741
After:
% nm --dynamic --defined-only router/skrouterd | wc -l
70
Regarding --gc-sections
, using with dynamically linked Proton ended up reducing skrouterd size from 4964
kbytes in size to 4956
as reported from du
;P
With static Proton,
% du router/skrouterd 6836 router/skrouterd
% du router/skrouterd
6824 router/skrouterd
And when I put -ffunction-sections -fdata-sections
to Proton compilation options
% du router/skrouterd 6836 router/skrouterd
% du router/skrouterd 6824 router/skrouterd
% du router/skrouterd 6404 router/skrouterd
Phew, finally some decent reduction in size. I put in -fvisibility=hidden
, no mucking with --gc. Looks like hiding symbols makes the binary a bit smaller. We are already doing LTO...
tl;dr: Do we have a serious
-rdynamic
problem?No, because our binaries are not built with
-fPIC
(which would make all calls indirect, going through PLT). The binaries are built with-fPIE
, which does not have this effect.There is a small inefficiency in that all symbols that can be exported are currently being exported due to use of
-rdynamic
, which probably makes the startup and looking up symbols from Python the first time tiny bit slower.Therefore, I propose not to do
-fno-semantic-interposition
, because it has no effect on the generated code. However, I propose doing-fvisibility=hidden
, because we are already marking the symbols that do need to be exported with the explicit visibility annotation. See theQD_EXPORT
macro that does just that.The risk to the above is that we forget to mark something as visible, and we only learn about the mistake at runtime, when Python tries to lookup the symbol and it fails. Hopefully this will be caught by CI, but in the worst case, it can result in a crash in production.
The router tries to do most of its symbol lookups during initialization, but not everything is looked up in that single place
https://github.com/skupperproject/skupper-router/blob/395f98e0875e3a584c30b47dc10df77d442b03b1/python/skupper_router_internal/dispatch.py#L95-L112
Investigation
There are multiple aspects to this. First,
-rdynamic
switch, then the-fno-semantic-interposition
, and-fvisibility=hidden
-rdynamic
Router binary is built with
-rdynamic
which is enabled by doinghttps://github.com/skupperproject/skupper-router/blob/7ef87c10e98ef57705ccbad6557d2ca11ab73f2e/CMakeLists.txt#L44
This is necessary so that Python part of skrouterd can lookup symbols for C functions to call.
There is about 2k symbols being exported like this. On Linux, the executables are ELF files, same as shared libraries, so that's how an executable can work like a bit of both.
And, by the way, some shared libraries are executable, such as
/usr/lib64/ld-linux-x86-64.so.2 --help
.The number of functions marked with
QD_EXPORT
macro is far smaller, CLion IDE says 55. We can limit what gets exported like this using-fvisibility=hidden
, as described in https://stackoverflow.com/questions/37142301/rdynamic-for-select-symbols-only/76671650-fno-semantic-interposition
The flag and what it does is well described in blog http://maskray.me/blog/2021-05-09-fno-semantic-interposition. Another source is this Fedora proposal (where I first learned about it) https://fedoraproject.org/wiki/Releases/32/ChangeSet#Build_Python_with_-fno-semantic-interposition_for_better_performance.
This is useful only for shared libraries, that is, when compiling with
-fPIC
. Router gets built as-FPIE
, which does not use the PLT (Procedure Linkage Table) to turn all calls into indirect ones.With
-fPIC
and with semantic interposition (which is the default on GCC),gets compiled (the call to
f
) aswithout semantic interpostion (with
-fno-semantic-interposition
), we getEven though this does not affect skrouterd directly, and it does not affect Proton (since that is compiled statically), it probably does affect libwebsockets and libunwind, which get compiled into a dll.
Possible further investigation
Therefore, it might make sense to use
-fno-semantic-interposition
for those two dependencies mentioned above.-fvisibility=hidden
To support that, what needs to be exported needs to be annotated explicitly with
__attribute__((visibility("default"))) __attribute__((used))
This is in the codebase for a long time, through the
QD_EXPORT
macro, added inSo far, the correctness (that I got them all) was never tested in production, only by my ad-hoc attempts to
The benefit to doing this is probably tangible, but most likely very small. While the risk is also tangible and small, but the consequences (if we forget to export something that should've been exported, and don't catch that in testing) will be bad. If I were you, I probably wouldn't do it.
Edit: interesting blog about visibility and interposing, https://www.airs.com/blog/archives/307
Additional option,
-Wl,--gc-sections
There is the idea to
-ffunction-sections
,-fdata-sections
,-Wl,--gc-sections
to tell the linker to identify unused functions and not link them.This might be useful to eliminate unused parts of Proton, LWS, and libunwind. Of course, there is the same risk as above, if some object is looked up dynamically, it will be eliminated and then later found missing at runtime, with bad consequences.