lpereira / lwan

Experimental, scalable, high performance HTTP server
https://lwan.ws
GNU General Public License v2.0

Linker error when using different C library #155

Open fischermario opened 8 years ago

fischermario commented 8 years ago

Embedded platforms usually use C libraries other than glibc to lower the memory footprint. For example, OpenWRT uses uClibc and LEDE uses musl, and there are others.

When I tried to build Lwan for OpenWRT and LEDE I ran into the following linker error:

    Linking C executable lwan
    (...)
    /tmp/cc6UKH83.ltrans1.ltrans.o: In function `coro_entry_point.5491':
    cc6UKH83.ltrans1.o:(.text+0xa04): undefined reference to `swapcontext'
    /tmp/cc6UKH83.ltrans15.ltrans.o: In function `lwan_write.part.0.5897':
    cc6UKH83.ltrans15.o:(.text+0xa78): undefined reference to `swapcontext'
    /tmp/cc6UKH83.ltrans0.ltrans.o: In function `lwan_process_request':
    cc6UKH83.ltrans0.o:(.text+0xe24): undefined reference to `swapcontext'
    cc6UKH83.ltrans0.o:(.text+0xe5c): undefined reference to `swapcontext'
    /tmp/cc6UKH83.ltrans2.ltrans.o: In function `coro_yield':
    cc6UKH83.ltrans2.o:(.text+0x18): undefined reference to `swapcontext'
    /tmp/cc6UKH83.ltrans2.ltrans.o:cc6UKH83.ltrans2.o:(.text+0xe4): more undefined references to `swapcontext' follow
    /tmp/cc6UKH83.ltrans2.ltrans.o: In function `coro_reset':
    cc6UKH83.ltrans2.o:(.text+0x10c8): undefined reference to `getcontext'
    cc6UKH83.ltrans2.o:(.text+0x1108): undefined reference to `makecontext'
    /tmp/cc6UKH83.ltrans7.ltrans.o: In function `serve_files_handle_cb.7749':
    cc6UKH83.ltrans7.o:(.text+0x9a8): undefined reference to `swapcontext'
    /tmp/cc6UKH83.ltrans7.ltrans.o: In function `cache_coro_get_and_ref_entry':
    cc6UKH83.ltrans7.o:(.text+0xb78): undefined reference to `swapcontext'
    /tmp/cc6UKH83.ltrans8.ltrans.o: In function `apply_until.9226':
    cc6UKH83.ltrans8.o:(.text+0x3d0): undefined reference to `swapcontext'
    cc6UKH83.ltrans8.o:(.text+0x4c0): undefined reference to `swapcontext'
    cc6UKH83.ltrans8.o:(.text+0x604): undefined reference to `swapcontext'
    /tmp/cc6UKH83.ltrans8.ltrans.o:cc6UKH83.ltrans8.o:(.text+0xf50): more undefined references to `swapcontext' follow
    collect2: error: ld returned 1 exit status

I can confirm this at least when linking against uClibc and musl. The problem is that neither seems to implement the functions declared in ucontext.h (Reference: musl Wiki).

These functions, if I understand correctly, are not (or no longer?) part of POSIX (Reference). This problem seems to affect others too (GitHub bug report).
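For readers who have not used them, the three functions the linker cannot find (getcontext, makecontext, swapcontext) are the usual building blocks of a ucontext-based coroutine. The following is a minimal sketch of that pattern, not Lwan's actual code; the names `counter`, `run_counter` and `STACK_SIZE` are made up for illustration. It should build against glibc and fail to link in the same way on a libc that lacks these functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024) /* arbitrary size chosen for this sketch */

static ucontext_t main_ctx, coro_ctx;
static int yielded_value;

/* Coroutine body: hands one value back to the caller per resume. */
static void counter(void)
{
    for (int i = 1; i <= 3; i++) {
        yielded_value = i;
        swapcontext(&coro_ctx, &main_ctx); /* yield to the caller */
    }
    /* Returning would continue at uc_link (main_ctx). */
}

/* Resumes the coroutine three times and returns the sum of the yielded
 * values, or -1 if the coroutine could not be set up. */
int run_counter(void)
{
    void *stack = malloc(STACK_SIZE);
    int sum = 0;

    if (!stack || getcontext(&coro_ctx) != 0) {
        free(stack);
        return -1;
    }

    /* Give the new context its own stack and an entry point. */
    coro_ctx.uc_stack.ss_sp = stack;
    coro_ctx.uc_stack.ss_size = STACK_SIZE;
    coro_ctx.uc_link = &main_ctx;
    makecontext(&coro_ctx, counter, 0);

    for (int i = 0; i < 3; i++) {
        swapcontext(&main_ctx, &coro_ctx); /* resume the coroutine */
        assert(yielded_value == i + 1);
        sum += yielded_value;
    }

    /* The coroutine is suspended and never resumed again, so its stack
     * can be released. */
    free(stack);
    return sum;
}
```

Calling `run_counter()` from a `main()` exercises all three functions named in the linker error above.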

One way to go could be to implement the necessary functions as part of Lwan for all major platforms (right now the inline definitions in lwan-coro.c only cover x86_64 and i386) without relying on an external library. A good starting point for that (as mentioned in the bug report linked above) seems to be this project: libtask

Another way could be to drop ucontext.h and replace it entirely with functionality from pthreads. There is a discussion on Stack Overflow here about the topic.

Unfortunately, this time I do not have a simple solution. I would rather discuss the problem first before writing a patch.

lpereira commented 8 years ago

Yes, I'm aware that ucontext.h is deprecated from POSIX; which is weird, because they still depend on that to have a machine context in case of, say, segmentation faults. I remember that it's optional in uClibc (at least for the i386 version, which I've used before to run Lwan on an Intel Galileo board; that's why there's an inline version for that architecture).

I'm not comfortable with ARM assembly, so I didn't do what I did for the inline assembly versions in Lwan. These were ripped from a Glibc disassembly, and I've removed stuff that wasn't required for the coroutines.

The pthreads version won't work, as that works with a number of threads rather than coroutines. Instead, a better fallback than writing inline assembly for various platforms might be to use _setjmp() and _longjmp() (these are preferable to sigsetjmp() and siglongjmp(), or to the versions without the leading underscore, because they avoid system calls when switching coroutine contexts).

If performance on these platforms ends up being a problem, and the coroutine context switch turns out to be the bottleneck, we can write the inline assembly version for that.

fischermario commented 8 years ago

Thanks for the insight. In the meantime I have found out that libtask is not a suitable library to solve the problem (incomplete and uses inline assembly only).

It took me the last couple of days to evaluate other solutions. The most suitable library (in my eyes) seems to be libcoro. It employs different approaches to implement coroutines. At compile time it can automatically select between get/set/swap/makecontext, setjmp/longjmp, inline assembly, or pthreads, depending on what is available. It is also possible to force a specific method by using compiler flags.
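To make the idea of compile-time backend selection concrete, here is a rough sketch of how such a choice could be expressed with the preprocessor. This is not libcoro's or Lwan's actual logic: the macro names `CORO_FORCE_SETJMP` and `CORO_BACKEND_NAME` and the function `coro_backend_name` are hypothetical. It assumes that uClibc also defines `__GLIBC__` (so `__UCLIBC__` has to be ruled out explicitly) and that musl defines no identifying macro, so it lands in the fallback branch.

```c
#include <assert.h>
#include <stdio.h>  /* any libc header; makes the libc's feature macros visible */
#include <string.h>

/* Hypothetical override, e.g. building with -DCORO_FORCE_SETJMP. */
#if defined(CORO_FORCE_SETJMP)
#  define CORO_BACKEND_NAME "setjmp"
/* Architectures for which hand-written context switching exists. */
#elif defined(__x86_64__) || defined(__i386__)
#  define CORO_BACKEND_NAME "inline-asm"
/* Real glibc: the ucontext.h functions are expected to be available. */
#elif defined(__GLIBC__) && !defined(__UCLIBC__)
#  define CORO_BACKEND_NAME "ucontext"
/* Everything else (musl, uClibc, unknown): portable fallback. */
#else
#  define CORO_BACKEND_NAME "setjmp"
#endif

/* Reports which backend this translation unit was compiled for. */
const char *coro_backend_name(void)
{
    return CORO_BACKEND_NAME;
}
```

Printing `coro_backend_name()` in a test build is a quick way to confirm which branch a given toolchain ends up in.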

There is a project called libwire that uses libcoro together with epoll in almost the same fashion as you do. I would now change lwan-coro.c to use libcoro in the same way that libwire does (although I guess this will only require a few changes because you already did the heavy lifting by isolating the coro interface from the system functions).

Now my proposal: if you find this acceptable, I will make the changes, test them and of course present the patch here. I would also like to do some benchmarking. Do you have some kind of test suite to measure coroutine performance in Lwan?

lpereira commented 8 years ago

If at all possible, I'd like to avoid having external dependencies in Lwan.

One possibility is to port makecontext() and swapcontext() from glibc. I just took a look at the ARM implementation of swapcontext() and it's easy to copy it over to Lwan like it was done with the x86 versions (you'll need to copy over __setcontext() and __getcontext() as well, but they're pretty straightforward). It might be easier to just disassemble glibc and copy the disassembled code, as the source code contains a few macros that expand to fields in a structure that represents the machine context. I'm not well versed in ARM assembly or the ARM ABI for Linux (not even sure if that's different for other OSes as well), so I'd have to read up on that to help you here.

libcoro can be used under the GPLv2 as well, so porting over their context switch routines (and giving appropriate credit, of course) is also an option. I worry, though, that the comment basically says it's untested on ARM.

There are also libmill and libdill, both by the same author, which use sigsetjmp() and siglongjmp() to switch coroutine contexts. They don't do the same kind of unportable trickery that libcoro does to set the stack and instruction pointers in the jmp_buf array, so they might also serve as inspiration for writing a fallback mechanism.

I used to have coroutine performance benchmarks but I've lost them a few years back after a hard drive crash. It's not hard to write a benchmark, though; creating a simple coroutine that basically yields a few times and hooking that up to Google Benchmark will work just fine.

lpereira commented 8 years ago

It might be worth looking into libco. It now has ARM support, the code is quite tiny, and it has a compatible license (the ARM code in particular is in the public domain).

lpereira commented 8 years ago

I'm at home now, and I have an RPi2. I'll see what I can do to make Lwan work on ARM without relying on ucontext.

fischermario commented 8 years ago

Thanks again for your reply. On one hand I can understand that you want to avoid external dependencies, but on the other I think that inline assembly is not the solution for this issue. For example, ARM has AArch32 (32-bit) and AArch64 (64-bit), which are different. Then there is the question of whether NEON (SIMD) is available or not. This gets messy very easily. From a larger perspective, there are many embedded platforms that one might want to run Lwan on. Today someone wants to run Lwan on an RPi2 (ARM, AArch32), but tomorrow someone else may have a MIPS-based device (popular on routers, STBs, ...) and the day after tomorrow there is someone with a PowerPC device. The list could go on.

What I want to point out is that I liked the approach of libcoro because it offers more than one fallback option. Even if no inline assembly is available for a given platform, there is an option to use the setjmp.h functions, and even if that fails it can rely on pthreads (although that would be very slow). The fact that this is completely transparent to the API makes this library a suitable choice from my point of view (although I get your point about some parts being untested).

I have looked into libmill and libdill. libdill is described as a follow-up project by the author. It relies on a setjmp/longjmp scheme. Right now I am trying to adapt Lwan to use libdill and to measure the performance (thanks for the tip about Google Benchmark, I will give it a try). Unfortunately libdill has no fallback if setjmp/longjmp is not available (e.g. for musl there is a warning in their wiki), and libdill cannot take advantage of inline assembly even where it would be possible to use it.

lpereira commented 8 years ago

If libcoro is used only for the fallback cases instead of the deprecated ucontext stuff, I wouldn't mind having it as an optional dependency.