lpereira / lwan

Experimental, scalable, high performance HTTP server
https://lwan.ws
GNU General Public License v2.0
5.92k stars 549 forks source link

Coroutines Memory Pool #299

Closed jojo0severo closed 3 years ago

jojo0severo commented 3 years ago

So, i'm currently trying to use your coroutines as a standalone, extracting the code was not the problem but the attempts to optmize it were.

I built a memory pool to pre allocate memory for new coroutines Replacing malloc and free on:

coro_new: struct coro *coro = malloc(sizeof(*coro) + CORO_STACK_MIN); by struct coro *coro = alloc_mem(pool);

coro_free: free(coro) by free_mem(pool, coro)

Pool is correctly implemented as the problems only happen when using it with the coroutines.

Since I'm not using the defer function, is there any other location where i should replace malloc and free?

Thanks for any reply.

Coroutines with memory pool: https://github.com/jojo0severo/lwan-coro-with-mempool

lpereira commented 3 years ago

Is the memory allocation for a coroutine struct+stack really worth optimizing? At least in Lwan, it didn't matter when I tried.

I took a look at your mempool implementation, and it does locking and whatnot -- might be a point of contention in multithreaded scenarios. You might want to try a different approach; for instance, using a thread-local ring buffer of pointers to memory (Lwan has an implementation you can use), where the algorithm would be the following pseudo-code:

pool = thread_local(mempool_new(sizeof(struct coro) + coro_stack_size)))

def pool_get(pool):
    if pool.ring.is_empty():
       return malloc(pool.size)
    return pool.ring.get()

def pool_put(pool, mem):
    if pool.ring.is_full():
       free(mem)
    else:
       pool.ring.put(mem)

No locking necessary, etc.

In any case, I don't know what problems you're referring to. Is the program crashing? If so, I'd try running under Valgrind and/or Address Sanitizer first.

jojo0severo commented 3 years ago

Rreally worth optimizing? Actually yes, i'm basing this on an article from Nexedi that wrapp the coroutines with Python and the biggest problem was the coroutine creation time, link

Thread-local ring buffer of pointers I had lots of trouble to optimize the locks so the threads would run as free as possible and on any of my researches found the ring buffer, even considered using CAS, but whatever, I will give it a try, thanks.

Is the program crashing? Sorry for not attaching the valgrind report, i actually did ran with it to check possible memory leaks and other issues from the memory pool.

My thoughts about the Valgrind report:

  1. Valgrind looks confused about the context swapping
  2. The program ends with Segmentation fault cause of invalid memory reading

The full report:

==1350== Memcheck, a memory error detector
==1350== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1350== Using Valgrind-3.14.0-353a3587bb-20181007X and LibVEX; rerun with -h for copyright info
==1350== Command: ./a
==1350== 
--1350-- Valgrind options:
--1350--    --leak-check=full
--1350--    --track-origins=yes
--1350--    -v
--1350-- Contents of /proc/version:
--1350--   Linux version 4.19.0-13-amd64 (debian-kernel@lists.debian.org) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP Debian 4.19.160-2 (2020-11-28)
--1350-- 
--1350-- Arch and hwcaps: AMD64, LittleEndian, amd64-rdtscp-sse3
--1350-- Page sizes: currently 4096, max supported 4096
--1350-- Valgrind library directory: /usr/lib/x86_64-linux-gnu/valgrind
--1350-- Reading syms from /media/sf_Server/lwan_cython/lwan/a
--1350--    object doesn't have a dynamic symbol table
--1350-- Reading syms from /usr/lib/x86_64-linux-gnu/valgrind/memcheck-amd64-linux
--1350--   Considering /usr/lib/debug/.build-id/32/2e77af97f403c3d34ff09edf60b089e72ec889.debug ..
--1350--   .. build-id is valid
--1350--    object doesn't have a dynamic symbol table
--1350-- Scheduler: using generic scheduler lock implementation.
--1350-- Reading suppressions file: /usr/lib/x86_64-linux-gnu/valgrind/default.supp
==1350== embedded gdbserver: reading from /tmp/vgdb-pipe-from-vgdb-to-1350-by-user-on-???
==1350== embedded gdbserver: writing to   /tmp/vgdb-pipe-to-vgdb-from-1350-by-user-on-???
==1350== embedded gdbserver: shared mem   /tmp/vgdb-pipe-shared-mem-vgdb-1350-by-user-on-???
==1350== 
==1350== TO CONTROL THIS PROCESS USING vgdb (which you probably
==1350== don't want to do, unless you know exactly what you're doing,
==1350== or are doing some strange experiment):
==1350==   /usr/lib/x86_64-linux-gnu/valgrind/../../bin/vgdb --pid=1350 ...command...
==1350== 
==1350== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==1350==   /path/to/gdb ./a
==1350== and then give GDB the following command
==1350==   target remote | /usr/lib/x86_64-linux-gnu/valgrind/../../bin/vgdb --pid=1350
==1350== --pid is optional if only one valgrind process is running
==1350== 
==1350== Conditional jump or move depends on uninitialised value(s)
==1350==    at 0x418314: malloc (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x47028A: _dl_get_origin (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x442B2E: _dl_non_dynamic_init (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x4441E0: __libc_init_first (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x40534D: (below main) (in /media/sf_Server/lwan_cython/lwan/a)
==1350==  Uninitialised value was created
==1350==    at 0x466B67: brk (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x440BC8: sbrk (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x40596D: __libc_setup_tls (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x4052E2: (below main) (in /media/sf_Server/lwan_cython/lwan/a)
==1350== 
==1350== Conditional jump or move depends on uninitialised value(s)
==1350==    at 0x4183B9: malloc (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x47028A: _dl_get_origin (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x442B2E: _dl_non_dynamic_init (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x4441E0: __libc_init_first (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x40534D: (below main) (in /media/sf_Server/lwan_cython/lwan/a)
==1350==  Uninitialised value was created
==1350==    at 0x466B67: brk (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x440BC8: sbrk (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x40596D: __libc_setup_tls (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x4052E2: (below main) (in /media/sf_Server/lwan_cython/lwan/a)
==1350== 
==1350== Conditional jump or move depends on uninitialised value(s)
==1350==    at 0x4166CC: _int_malloc (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x417564: tcache_init.part.6 (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x4183C3: malloc (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x47028A: _dl_get_origin (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x442B2E: _dl_non_dynamic_init (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x4441E0: __libc_init_first (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x40534D: (below main) (in /media/sf_Server/lwan_cython/lwan/a)
==1350==  Uninitialised value was created
==1350==    at 0x466B67: brk (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x440BC8: sbrk (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x40596D: __libc_setup_tls (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x4052E2: (below main) (in /media/sf_Server/lwan_cython/lwan/a)
==1350== 
==1350== Conditional jump or move depends on uninitialised value(s)
==1350==    at 0x4150F4: _int_free (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x469369: fillin_rpath (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x46995B: _dl_init_paths (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x443006: _dl_non_dynamic_init (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x4441E0: __libc_init_first (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x40534D: (below main) (in /media/sf_Server/lwan_cython/lwan/a)
==1350==  Uninitialised value was created
==1350==    at 0x466B67: brk (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x440BC8: sbrk (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x419D08: __default_morecore (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x4160F2: sysmalloc (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x4171D0: _int_malloc (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x417564: tcache_init.part.6 (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x4183C3: malloc (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x47028A: _dl_get_origin (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x442B2E: _dl_non_dynamic_init (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x4441E0: __libc_init_first (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x40534D: (below main) (in /media/sf_Server/lwan_cython/lwan/a)
==1350== 
coro_new coro1 and coro2
==1350== Warning: client switching stacks?  SP change: 0x1ffefffee8 --> 0x483c118
==1350==          to suppress, use: --max-stackframe=137346432464 or greater
called coro_func_foo
yield in coro_func_foo
==1350== Warning: client switching stacks?  SP change: 0x483c0c8 --> 0x1ffefffef0
==1350==          to suppress, use: --max-stackframe=137346432552 or greater
==1350== Warning: client switching stacks?  SP change: 0x1ffefffee8 --> 0x4836108
==1350==          to suppress, use: --max-stackframe=137346457056 or greater
==1350==          further instances of this message will not be shown.
called coro_func_bar
yield in coro_func_bar
==1350== Use of uninitialised value of size 8
==1350==    at 0x4024B0: ??? (lwan-coro.c:30)
==1350==    by 0x401CCE: main (valg_test.c:37)
==1350==  Uninitialised value was created by a stack allocation
==1350==    at 0x4024B0: ??? (lwan-coro.c:30)
==1350== 
==1350== 
==1350== Process terminating with default action of signal 11 (SIGSEGV)
==1350==  Bad permissions for mapped region at address 0x402A32
==1350==    at 0x40251F: ??? (lwan-coro.c:30)
==1350== 
==1350== HEAP SUMMARY:
==1350==     in use at exit: 0 bytes in 0 blocks
==1350==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==1350== 
==1350== All heap blocks were freed -- no leaks are possible
==1350== 
==1350== ERROR SUMMARY: 5 errors from 5 contexts (suppressed: 0 from 0)
==1350== 
==1350== 1 errors in context 1 of 5:
==1350== Use of uninitialised value of size 8
==1350==    at 0x4024B0: ??? (lwan-coro.c:30)
==1350==    by 0x401CCE: main (valg_test.c:37)
==1350==  Uninitialised value was created by a stack allocation
==1350==    at 0x4024B0: ??? (lwan-coro.c:30)
==1350== 
==1350== 
==1350== 1 errors in context 2 of 5:
==1350== Conditional jump or move depends on uninitialised value(s)
==1350==    at 0x4150F4: _int_free (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x469369: fillin_rpath (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x46995B: _dl_init_paths (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x443006: _dl_non_dynamic_init (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x4441E0: __libc_init_first (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x40534D: (below main) (in /media/sf_Server/lwan_cython/lwan/a)
==1350==  Uninitialised value was created
==1350==    at 0x466B67: brk (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x440BC8: sbrk (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x419D08: __default_morecore (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x4160F2: sysmalloc (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x4171D0: _int_malloc (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x417564: tcache_init.part.6 (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x4183C3: malloc (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x47028A: _dl_get_origin (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x442B2E: _dl_non_dynamic_init (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x4441E0: __libc_init_first (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x40534D: (below main) (in /media/sf_Server/lwan_cython/lwan/a)
==1350== 
==1350== 
==1350== 1 errors in context 3 of 5:
==1350== Conditional jump or move depends on uninitialised value(s)
==1350==    at 0x4166CC: _int_malloc (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x417564: tcache_init.part.6 (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x4183C3: malloc (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x47028A: _dl_get_origin (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x442B2E: _dl_non_dynamic_init (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x4441E0: __libc_init_first (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x40534D: (below main) (in /media/sf_Server/lwan_cython/lwan/a)
==1350==  Uninitialised value was created
==1350==    at 0x466B67: brk (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x440BC8: sbrk (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x40596D: __libc_setup_tls (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x4052E2: (below main) (in /media/sf_Server/lwan_cython/lwan/a)
==1350== 
==1350== 
==1350== 1 errors in context 4 of 5:
==1350== Conditional jump or move depends on uninitialised value(s)
==1350==    at 0x4183B9: malloc (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x47028A: _dl_get_origin (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x442B2E: _dl_non_dynamic_init (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x4441E0: __libc_init_first (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x40534D: (below main) (in /media/sf_Server/lwan_cython/lwan/a)
==1350==  Uninitialised value was created
==1350==    at 0x466B67: brk (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x440BC8: sbrk (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x40596D: __libc_setup_tls (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x4052E2: (below main) (in /media/sf_Server/lwan_cython/lwan/a)
==1350== 
==1350== 
==1350== 1 errors in context 5 of 5:
==1350== Conditional jump or move depends on uninitialised value(s)
==1350==    at 0x418314: malloc (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x47028A: _dl_get_origin (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x442B2E: _dl_non_dynamic_init (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x4441E0: __libc_init_first (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x40534D: (below main) (in /media/sf_Server/lwan_cython/lwan/a)
==1350==  Uninitialised value was created
==1350==    at 0x466B67: brk (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x440BC8: sbrk (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x40596D: __libc_setup_tls (in /media/sf_Server/lwan_cython/lwan/a)
==1350==    by 0x4052E2: (below main) (in /media/sf_Server/lwan_cython/lwan/a)
==1350== 
==1350== ERROR SUMMARY: 5 errors from 5 contexts (suppressed: 0 from 0)
Segmentation fault
lpereira commented 3 years ago

Yes, Valgrind will get confused about the stack pointer being swapped in and out. You need to build with Valgrind support (need to find the headers and whatnot during build if using the Lwan build system, and build a debug version). This will add in some hints to Valgrind telling that it's OK for the stack pointer to be within a coroutine's stack.

Debugging issues with coroutines isn't easy and problems like this are very likely to happen... sometimes you can use big guns such as rr, sometimes you need to step instruction by instruction, etc. It's tricky because it's really a hack, as you're pulling the carpet from underneath the running program and hoping everything keeps working after you put it under another parts of your program.

As far as using CAS and other lock-free constructs, it's usually better to start by avoiding them altogether and using thread-local storage -- so you can have a pool per thread, if threading is something that your usage model expects to exist.

lpereira commented 3 years ago

BTW, in your project, all source code copied from Lwan do not carry my copyright and licensing information. Please put back the comment with the license information in each file; they're there for a reason.

jojo0severo commented 3 years ago

Yes, Valgrind will get confused about the stack pointer being swapped in and out. You need to build with Valgrind support (need to find the headers and whatnot during build if using the Lwan build system, and build a debug version). This will add in some hints to Valgrind telling that it's OK for the stack pointer to be within a coroutine's stack.

Debugging issues with coroutines isn't easy and problems like this are very likely to happen... sometimes you can use big guns such as rr, sometimes you need to step instruction by instruction, etc. It's tricky because it's really a hack, as you're pulling the carpet from underneath the running program and hoping everything keeps working after you put it under another parts of your program.

As far as using CAS and other lock-free constructs, it's usually better to start by avoiding them altogether and using thread-local storage -- so you can have a pool per thread, if threading is something that your usage model expects to exist.

I'll switch to a thread-local storage and start debugging, thanks (put the copyright on the files, completely forgot about that)

jojo0severo commented 3 years ago

The error was really silly, i was allocating the size of a pointer to coro_t instead of size of coro_t.... Thank you for your help