Open Zeblote opened 2 years ago
Hi @Zeblote; thanks for your report -- I would like to see if we can improve this; It would be great if you can supply me with a test that I can run myself but otherwise we'll try here :-).
first of all, not all options in the source always work together (this is mainly due to maintaining the two versions and binary compatibility). In particular, you cannot use reset_decommits
with v2.x
and segment_reset
is a no-op in v2.x. (Sorry for this, I should have anticipated this and I will add comments in the source code or perhaps fully remove them to avoid confusion.)
If you can use the latest dev-slice
that would be great (I think the behavior will be the same though)
The page_reset
on Windows will do MEM_RESET which will make the memory available to other processes but it will not show up as decommitted. (Note that on windows a higher commit charge does not mean it uses that memory -- it is just backed by the OS and most of it may be virtual (due to MEM_RESET for example). The working set should be as low as possible though)
It looks that perhaps during a save and then reload there are many terminated threads that "abandon" their heaps. This memory is usually slowly recycled on demand and that is I think (maybe?) why you see the high commit charge after a clear. (and usually this is good -- we do not want to decommit memory that we are going to use again soon after). Enabling abandoned_page_reset
helps here so keep that on.
Can you also try:
MIMALLOC_SEGMENT_DECOMMIT_DELAY=0
; this will aggressively decommit for abandoned heaps.mi_collect(true)
on threads that are about to be terminated (or right after a clear). I understand this may be hard depending on the source code. Let me know.Best, Daan
Ps. using secure mode is more expensive; it is amazing if you see similar runtime's in that case compared to the other allocator. Have you tried if you see the same commit charge in the plain version (without MI_SECURE) ? I think it should be the same but it may be good to verify this.
Thanks for the tips, I will try them all in just a bit! I didn't know about dev-slice branch, do you recommend only using it for testing or also for production use?
You mention calling mi_collect() on specific threads, does that mean it doesn't have global effects? I had already hooked it up to the TrimMemory hook Unreal Engine calls roughly once a minute on the main thread, after its garbage collector has dumped unused objects and assets. Maybe that's not how it's intended to be used though.
Note that on windows a higher commit charge does not mean it uses that memory
Fun story there - it can! Particularly, if your binaries do not yet have great reputation with Windows Defender (i.e. are not digitally signed). I was recently investigating a highly mysterious issue where all of the committed but unused memory was suddenly getting moved to the working set for literally no reason. So I'd load the save, it'd use 6GB ram, I'd leave it idle for a while, and suddenly, out of nowhere, it will jump up to using 11GB ram. It doesn't happen in the previous release build that's been launched by many thousands of users, but it happens to every new build I create.
After many, many hours of puzzling I was finally able to track it down using Windows Performance Recorder/Analyzer to be caused by MsMpEng.exe using ReadProcessMemory to scan all committed memory of my process, causing it to get backed by physical pages so it can read it. Still stumped on this one, and I really hope it won't happen for users once the build is deployed. Two users I gave the build to for testing were also able to reproduce it, but that's probably not enough to affect Windows Defender reputation. Since you work on this memory related stuff, is there a chance you could have the right connections to get this looked at..?
I can provide screenshots, .etl traces capturing the problem, etc whatever is necessary to demonstrate that Windows Defender is misbehaving.
Currently, I have added the dev workspace to Windows Defender exclusions to be able to do sensible tests with these allocators without it interfering like this. It is 100% reproducible on my machine, the dev build works fine until Windows Defender touches it and doubles the memory usage for no reason.
Thanks. Yikes -- Windows Defender touching any committed memory would not be good for performance for large services and applications. I have no connections to that team but will ask around a bit. (but best if you are able to file a report through other channels). It may be by design though..
The dev
branch is the latest 1.7.x version, and dev-slice
is the latest 2.0.x version -- these are not for production use since those are the development branches. They are usually quite stable though -- I actually hope to bump to v2.0.4 this week. The 2.x versions have another internal way of allocating mimalloc "pages" and was made to be better at decommitting and reusing memory for long running services or large memory footprints. (That is why I really hope to address your issues as that is unusual. )
I checked again, and my earlier suggestion:
If this does not help, if possible, call mi_collect(true) on threads that are about to be terminated (or right after a clear). I understand this may be hard depending on the source code. Let me know.
will not help; the collect is already done automatically on thread termination. Hope you didn't already spend time on this..
mi_collect
is indeed not global but per thread. This is actually ok I think, but we may want to extend it to also decommit in the abandoned heaps from terminated threads as it is my hunch that this is what we are seeing in your situation. Not sure. Let's first see what happens when segment_decommit_delay
is set to 0.
@daanx Thanks for the response! I did not get to experiment more with it yet, but will soon.
I don't think there are many terminated threads, by the way. Unreal Engine creates like 50 threads at the start and then runs a task graph on them, so they are always reused until the game is closed.
(but best if you are able to file a report through other channels)
Unfortunately, not being part of a company with some enterprise support contract, we don't have any such channel beyond the standard Microsoft support, who wasn't able to understand the problem and giving me unrelated copy-paste answers.
It may be by design though..
I can't imagine that. I don't think I've ever seen it happen in the past, and surely Windows Defender could have some way to determine whether memory regions are actually allocated physical pages, or... not do that? It is kind of a disaster. Especially since there is nothing we can do about this virtual memory usage, it is done by the d3d11 library as part of allocating GPU buffers and then abandoned. And once this has marked it as used, it just sits around in the working set, since the program is not aware of this happening and can't tell windows that it doesn't actually need this memory...
If it really is by design, does some way exist to have release builds start out with positive reputation? Would signing the binaries with an EV cert help? Since I noticed that the issue does not happen to any other signed software or to our old build that's been around for a while.
but will ask around a bit
That would be awesome! Here are some images from one of the traces I captured that demonstrate the problem.
General overview showing that at some point, the working set magically rises to cover almost all of the commit while the game is idle:
Zooming in on the problematic section and filtering down a bit, demonstrating Windows Defender causing millions of page accesses during that exact time:
That count multiplied by 4KB pages equals roughly the amount of memory that was unnecessarily added to the working set.
Of course, if this really is intentional behavior, VirtualAlloc documentation must be updated.
But I really, really, doubt it is. It seems more like a bug in Windows Defender?
By the way, the pages mimalloc resets with this process
The page_reset on Windows will do MEM_RESET which will make the memory available to other processes but it will not show up as decommitted. (Note that on windows a higher commit charge does not mean it uses that memory -- it is just backed by the OS and most of it may be virtual (due to MEM_RESET for example). The working set should be as low as possible though)
are also brought back to life by the Windows Defender problem.
Hi @Zeblote; thanks for the detailed report -- if I can find the relevant people I will forward this information.
@daanx I finally got to try the dev-slice branch and that new option. Here are the results:
FMallocBinned2
(start, load, clear, load)
mimalloc MI_SECURE=4, page_reset=1, abandoned_page_decommit=1, segment_decommit_delay=0
(start, load, clear, load)
It doesn't seem like much changed?
I think these are acceptable results though. Not particularly concerned with the higher working set after clearing the save since it seems to be correctly reused on loading the next one.
The deallocations during clearing the save with this config seem to be significantly slower than FMallocBinned2 or mimalloc with page_reset disabled. But disabling that makes it never free any memory at all, which is unusable.
Perhaps interesting, also setting decommit_delay=0 makes clearing the save even slower, but it reduces the commit overall and brings us closer to the results of FMallocBinned2 in that column. Are delayed decommits working correctly? Does mimalloc need to be "ticked" in some way?
mimalloc MI_SECURE=4, page_reset=1, abandoned_page_decommit=1, decommit_delay=0, segment_decommit_delay=0
(start, load, clear, load)
I think FMallocBinned2 might be a good benchmark to compare against in general, at least in Unreal Engine packaged builds it seems to beat every other allocator implementation in both allocation/deallocation performance and total memory usage at the same time. But of course it has no security features at all, hence this experiment.
Thanks for your detailed reports @Zeblote! Great. I hope we can make it work better out of the box without needing these aggressive settings.
I made changes to mi_collect(true)
and I think it might help calling that when clearing the save (from just the main thread). Wait a bit until I have done more experiments .. but then it would be great if you can test it again :-)
I pushed a new release today that has more aggressive settings for mi_collect
--would be great if you could try again with v2.0.5
and see if things improved. Thanks!
Hmm. Here are a few tests with 2.0.5, mi_collect(true)
has been called on the main thread at each step before capturing the memory usage. @daanx
FMallocBinned2 (goal, every value is lower than the lowest achieved one with mimalloc)
MI_SECURE=4
MI_SECURE=4 segment_decommit_delay=0
(seems about same as previous?)
MI_SECURE=4 page_reset=1
(slower to clear save, much lower working set, no effect on commit)
MI_SECURE=4 page_reset=1 segment_decommit_delay=0
(seems about same as previous?)
MI_SECURE=4 decommit_delay=0
(much slower to clear save, much lower working set and commit)
MI_SECURE=4 page_reset=1 decommit_delay=0
(very much slower to clear save, seems kinda same as previous?)
I'm still confused why decommit_delay=0
has such a massive effect. Shouldn't its only change be that things are cleared faster? Or maybe it's badly named for what it does?
Hi! I'm evaluating using mimalloc-secure 2.0.3 for our game instead of Unreal Engine's default FMallocBinned2. The runtime performance seems good, I have not measured any significant regression from this switch. However, it seems to not decommit memory nearly as much if the memory usage of the game goes down.
My configuration: MI_SECURE = 4
The target for comparison, using FMallocBinned2:
The initial results using mimalloc:
That's... not great in comparison.
I enabled the options page_reset, abandoned_page_reset, gives significant improvements for the working set, but no improvement on unnecessary memory being decommitted again.
The option segment_reset seems to not be hooked up to anything. The option reset_decommits causes the game to crash instantly during static initialization, if page_reset is also enabled (otherwise it does nothing):
![image](https://user-images.githubusercontent.com/6764788/151113692-bbc841b8-82ff-4996-afd9-bf8936cdd327.png)