mjansson / rpmalloc

Public domain cross platform lock free thread caching 16-byte aligned memory allocator implemented in C

Too high memory consumption #334

Open Drwalin opened 4 months ago

Drwalin commented 4 months ago

Hi,

I have written a program that works in a producer-consumer allocation scheme (there are other threads as well, but most of the work and allocation happens in the main producer-consumer pair of threads). Initially I used glibc malloc, but it was very slow; then I found rpmalloc, which in my case sped up my program about 5 times. There is one problem: memory consumption increases constantly. This shocked me, because my program works in iterations, and after each iteration most of the allocated memory is freed.

While the program ran, resident and virtual memory both increased substantially each minute. After some time, the program crashed with over 1.5 TiB of virtual memory allocated, 15 GiB in RAM and 40 GiB in swap. The data stored in swap was never loaded back into RAM (disk usage showed only writes). The total number of paired rpmalloc/rpfree calls was around 140 billion, with most allocations of size 256, 512 and 4096.

In contrast, the glibc malloc version peaked at 9 GiB of RAM and a custom object pool peaked at 6 GiB of RAM, having done the same amount of work as the rpmalloc version.

My system:
- OS: Arch Linux x86_64
- Kernel: 6.8.9-arch1-2
- RAM: 32 GiB
- Swap: 44 GiB

rpmalloc version: `develop` branch, commit 955f44b

The mjansson/rewrite branch (commit 2dd697f) seems to exhibit the same behavior.

I am not sure whether this is a problem with my application or with rpmalloc, but other allocators do not show any faulty behavior. Valgrind and GCC's AddressSanitizer show no memory leaks, buffer overflows, or hidden segmentation faults in any of the versions: rpmalloc, malloc, or the custom object pool.

Edit: I've found out that malloc's overly high resident memory usage is due to Linux settings, and that I can call malloc_trim(0) to decommit/free resident memory (as far as I can tell, other systems neither have nor require malloc_trim).

I should also note that I compile rpmalloc with ENABLE_OVERRIDE=0, because I had strange errors while debugging otherwise.

mjansson commented 4 months ago

Interesting - so if I want to try and reproduce this, I can basically create a program that runs a number of threads in pairs where each pair has one thread allocating memory and the other then deallocating it? What are the memory sizes being allocated?

Drwalin commented 4 months ago

That's the core workload of my application, though other small allocations (from other threads, or of other sizes) may amplify this behavior as far as I know. The most frequent allocation sizes (99.9% of all allocations) are 256, 512 and 4096 bytes; there may be a few in the 4 KiB-60 KiB range, but those are sparse.

P.S.: I am now testing my application with main branch and memory consumption does not seem to increase over expected amounts.

Drwalin commented 4 months ago

Now I think I have tested the main branch enough. After doing more work than with the develop branch, virtual and resident memory consumption are well below the acceptable/predicted maximum range.

Is it normal for rpmalloc to reserve/cache a lot more memory than needed? The application uses 50 MiB when idle with the custom pool allocator, but the version with rpmalloc/main does not drop below 500 MiB. Is this 450 MiB of additionally reserved resident memory just a thread-local or global cache for future fast allocations? Is this normal/acceptable behavior of rpmalloc for an application with 3 threads? It seems to be a good enough trade-off between performance and memory consumption; I am just wondering what the norm is.
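If that extra resident memory is indeed cache, rpmalloc exposes compile-time knobs for it. A hedged configuration fragment (these defines appear in the rpmalloc sources; exact defaults vary between versions, so check the README of the version you build):

```c
/* Build-time configuration, passed via -D flags or edited in rpmalloc.c.
   These control span caching; 1 = enabled. Defaults vary by version. */
#define ENABLE_THREAD_CACHE 1    /* per-thread span cache */
#define ENABLE_GLOBAL_CACHE 1    /* global span cache shared across threads */
#define ENABLE_UNLIMITED_CACHE 0 /* when 0, cache growth is capped */
```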

`main` branch is faster for me than `develop` branch by 10%-15% (ignoring the memory leak on the `develop` branch).