Open romange opened 1 year ago
Thanks @romange for the detailed feedback (and apologies for the late reply).
12.5% increase of committed over used is excellent (!) :-) For a long running process there will always be fragmentation and thus more committed than actually used.
It should be the case that for each thread we never allocate a fresh (mimalloc) page when there is still a page that has blocks available of the right size (see `page.c:mi_page_queue_find_free_ex`). However, we slowly extend the "top" of the page up (`page->capacity` vs `page->reserved` items), so we touch the memory in a page in roughly 4 KiB increments -- a partially used page will only increase the RSS by the touched part.
We do, however, commit the memory per mimalloc page at once, so that may affect the measurements. We could also commit inside a page on demand; I tried this once and it helps, but it is also relatively expensive ...
So, for your measurements it is best to distinguish between the unused but untouched parts (the difference between the capacity and total reserved space), and the unused but touched part (the blocks in the free lists).
I suspect the graph with the "unused but untouched" should have much less unused memory? (let me know :-) )
Hi @daanx, thanks for following up on my questions. Meanwhile, I've been studying the domain myself.
Unfortunately, as time passes, the difference between committed and used memory in the process goes well beyond 40-50%, so I need to solve the issue. Below is another graph that shows the distribution of (committed - used) sizes for different block sizes.
As you said yourself in #645, any improvements on the allocator side will be incremental (i.e., reduce the probability or the severity of the problem), because there is nothing we can do with an under-utilized but non-empty page: the allocator cannot move its contents somewhere else. Therefore, it seems that the only solution (if someone really needs to solve the external fragmentation issue) would be app-aware defragmentation. And this is what I described in https://github.com/dragonflydb/dragonfly/issues/448
Redis, by the way, implemented something similar with jemalloc; they call it "active-defrag". They introduced a function that reports how well a page is utilized: https://github.com/redis/redis/commit/e8099cabd19c4e3a46c94c39e69e13191d43f5eb and use it to move memory around.
We are taking a similar approach with mimalloc. This is the function that I believe will help us determine whether we need to realloc a pointer that belongs to an underutilized page: https://github.com/romange/helio/pull/27/files#diff-a9d9d3912eb3a1eeced1a0726da5a12f80f764b999f683c40d96a45c8c31884b
What do you think about it?
Ah I see -- I agree that an allocator cannot generally "solve" the fragmentation problem, as we cannot move objects. I understand now that you would like to give the application itself the opportunity to reallocate objects that are in a fragmented page.
- Best is to avoid the fragmentation in the first place: consider using an extra `mi_heap` and allocating most temporary objects in that heap so they won't fragment the long-lived ones.
- Be careful with the `xheap` pointer, since if it belongs to another thread, that thread may be terminating and deallocating its heap structure. You don't need to get the page queue, right? Just checking `page->prev != NULL` is enough to know it is not the first page in a queue.
- (Perhaps mimalloc could support this in its API -- a `mi_compact` call or something.)

Very interesting to work on this!
@daanx We ended up patching mimalloc with the following patch: https://github.com/romange/helio/blob/master/patches/mimalloc-v2.0.5.patch and it seems to be doing its job: we reduced the committed/used ratio from 1.25 to 1.08.
> Best is to avoid the fragmentation you see in the first place: do you know the root cause of the application allocating many objects of some size but only keeping few live after a while? If this is somewhat isolated, you could consider using an extra mi_heap and allocate most temporary objects in that heap so they won't fragment the long lived ones.
@daanx I am hitting the same issue as the original poster, and I tried to use an extra `mi_heap` as you noted. However, the next issue I hit was https://github.com/microsoft/mimalloc/issues/720: I can't actually use it because I need it in a multi-threaded environment.
Hello Daan,
The project I am working on uses mimalloc underneath.
A user reported the following problem: Dragonfly reports 80 GiB used memory (tracked by aggregating `mi_usable_size`) but mi_malloc stats show 90 GiB committed. He also says that the backend adds roughly 0.5% committed memory each day while `used` does not change. Assuming that my "used" accounting is correct, it means the backend does not have a leak (otherwise "used" would grow as well). Dragonfly has 16 threads.

mimalloc stats output:
I collected all the areas (via `mi_heap_visit_blocks(.., false, ...)`) from a single heap - from a single thread out of 16. That heap has 5.9 GiB of committed memory and 5 GiB used. When looking at the unused memory I see lots of areas that have unused blocks. See the chart below. Why does it happen? Specifically, can mi_malloc create a new area of block size 80 if there is another one with unused memory? "Unused" in this context is `committed - used`.

I am also attaching the detailed table of areas from which the chart was created. The first column in the table is the number of occurrences of the triple `<reserved, committed, used>`.

sorted-malloc.txt