Investigate symbols exported from `skrouterd` and find improvement opportunities

jiridanek commented 1 year ago

tl;dr: Do we have a serious `-rdynamic` problem?

No, because our binaries are not built with -fPIC (which would make all calls indirect, going through PLT). The binaries are built with -fPIE, which does not have this effect.

There is a small inefficiency in that all symbols that can be exported are currently being exported due to use of -rdynamic, which probably makes the startup and looking up symbols from Python the first time tiny bit slower.

Therefore, I propose not to do -fno-semantic-interposition, because it has no effect on the generated code. However, I propose doing -fvisibility=hidden, because we are already marking the symbols that do need to be exported with the explicit visibility annotation. See the QD_EXPORT macro that does just that.

The risk to the above is that we forget to mark something as visible, and we only learn about the mistake at runtime, when Python tries to lookup the symbol and it fails. Hopefully this will be caught by CI, but in the worst case, it can result in a crash in production.

The router tries to do most of its symbol lookups during initialization, but not everything is looked up in that single place

https://github.com/skupperproject/skupper-router/blob/395f98e0875e3a584c30b47dc10df77d442b03b1/python/skupper_router_internal/dispatch.py#L95-L112

Investigation

There are multiple aspects to this. First, -rdynamic switch, then the -fno-semantic-interposition, and -fvisibility=hidden

`-rdynamic`

Router binary is built with -rdynamic which is enabled by doing

https://github.com/skupperproject/skupper-router/blob/7ef87c10e98ef57705ccbad6557d2ca11ab73f2e/CMakeLists.txt#L44

This is necessary so that Python part of skrouterd can lookup symbols for C functions to call.

% nm --dynamic router/skrouterd | wc -l
2340

There is about 2k symbols being exported like this. On Linux, the executables are ELF files, same as shared libraries, so that's how an executable can work like a bit of both.

And, by the way, some shared libraries are executable, such as /usr/lib64/ld-linux-x86-64.so.2 --help.

The number of functions marked with QD_EXPORT macro is far smaller, CLion IDE says 55. We can limit what gets exported like this using -fvisibility=hidden, as described in https://stackoverflow.com/questions/37142301/rdynamic-for-select-symbols-only/76671650

`-fno-semantic-interposition`

The flag and what it does is well described in blog http://maskray.me/blog/2021-05-09-fno-semantic-interposition. Another source is this Fedora proposal (where I first learned about it) https://fedoraproject.org/wiki/Releases/32/ChangeSet#Build_Python_with_-fno-semantic-interposition_for_better_performance.

This is useful only for shared libraries, that is, when compiling with -fPIC. Router gets built as -FPIE, which does not use the PLT (Procedure Linkage Table) to turn all calls into indirect ones.

With -fPIC and with semantic interposition (which is the default on GCC),

void f(void){}
void main(void) { f(); }

gets compiled (the call to f) as

call    f()@PLT

without semantic interpostion (with -fno-semantic-interposition), we get

call    f() [clone .localalias]

Even though this does not affect skrouterd directly, and it does not affect Proton (since that is compiled statically), it probably does affect libwebsockets and libunwind, which get compiled into a dll.

Possible further investigation

Therefore, it might make sense to use -fno-semantic-interposition for those two dependencies mentioned above.

`-fvisibility=hidden`

To support that, what needs to be exported needs to be annotated explicitly with __attribute__((visibility("default"))) __attribute__((used))

This is in the codebase for a long time, through the QD_EXPORT macro, added in

https://github.com/skupperproject/skupper-router/pull/82

So far, the correctness (that I got them all) was never tested in production, only by my ad-hoc attempts to

compile the codebase on Windows (where hidden visibility was the default; I guess we still had the shared dll back then?)
compile the codebase as c++, because unless declared extern C, the compiler mangles the symbol name according to c++ rules

The benefit to doing this is probably tangible, but most likely very small. While the risk is also tangible and small, but the consequences (if we forget to export something that should've been exported, and don't catch that in testing) will be bad. If I were you, I probably wouldn't do it.

Edit: interesting blog about visibility and interposing, https://www.airs.com/blog/archives/307

Additional option, `-Wl,--gc-sections`

There is the idea to

compile libwebsockets and libunwind statically, if possible
and then, use -ffunction-sections, -fdata-sections, -Wl,--gc-sections to tell the linker to identify unused functions and not link them.

This might be useful to eliminate unused parts of Proton, LWS, and libunwind. Of course, there is the same risk as above, if some object is looked up dynamically, it will be eliminated and then later found missing at runtime, with bad consequences.

jiridanek commented 1 year ago

Here's a fairer symbol count before and after -fvisibility=hidden.

Before:

% nm --dynamic --defined-only router/skrouterd | wc -l
1741

After:

% nm --dynamic --defined-only router/skrouterd | wc -l
70

jiridanek commented 1 year ago

Regarding --gc-sections, using with dynamically linked Proton ended up reducing skrouterd size from 4964 kbytes in size to 4956 as reported from du ;P

With static Proton,

% du router/skrouterd 6836 router/skrouterd

% du router/skrouterd
6824 router/skrouterd

And when I put -ffunction-sections -fdata-sections to Proton compilation options

% du router/skrouterd 6836 router/skrouterd

% du router/skrouterd 6824 router/skrouterd

one more attempt

% du router/skrouterd 6404 router/skrouterd

Phew, finally some decent reduction in size. I put in -fvisibility=hidden, no mucking with --gc. Looks like hiding symbols makes the binary a bit smaller. We are already doing LTO...

skupperproject / skupper-router