Closed Zeblote closed 3 weeks ago
It is enabled with
SNMALLOC_CHECK_CLIENT
I will update the documentation. That does not enable the memcpy protection.
On Windows the performance of the memcpy protection is not great due to the way virtual memory works. We can't map a large accessible zero range without actually committing the memory. We could add a Vectored Exception handler, but then it is going to harm the debugging experience.
Thanks! I made a quick test comparing this new version against FMallocBinned2, though whether this can be considered a proper benchmark is questionable. Same integration code as my previous issue https://github.com/microsoft/snmalloc/issues/478.
FMallocBinned2, memory usage: Start game
Open map
Load large save
Clear large save
Load large save again
snmalloc 0.6 with SNMALLOC_CHECK_CLIENT, memory usage: Start game
Open map
Load large save
Clear large save
Load large save again
It seems like it doesn't free unused memory nearly as much as the default unreal allocator. Overall usage is also a bit higher. This seems to be a clear win for FMallocBinned2.
FMallocBinned2, performance:
snmalloc 0.6 with SNMALLOC_CHECK_CLIENT, performance:
Only slightly slower on allocations, but much faster on deallocations. This seems to be a win for snmalloc overall.
The game also seems to crash on exit when snmalloc is used but I haven't yet investigated why.
Thanks for running this benchmark. The CHECK_CLIENT version will use more memory. It needs to keep enough spare space for randomisation to have enough entropy. There are a few cases where it decides to allocate more memory just to increase randomness in allocation patterns.
The game also seems to crash on exit when snmalloc is used but I haven't yet investigated why.
So the CHECK_CLIENT does a bunch of checking very lazily. We check consistency at the end to force any lazy checks that have not occurred. My guess is that it has found corruption during teardown. Would be super interested if that is the case.
On Windows the performance of the memcpy protection is not great due to the way virtual memory works. We can't map a large accessible zero range without actually committing the memory. We could add a Vectored Exception handler, but then it is going to harm the debugging experience.
Can you explain why this is required? Is it to support allocations from outside snmalloc?
For example, in Unreal Engine, there are over 1000 calls to FMemory::Memcpy in runtime modules, the absolute vast majority only ever operating on memory obtained from FMemory. It would probably be a good start to guard all of those.
Debugging experience is not really important for distribution builds. Though if it interfered with Unreal's crash reporting, that would be a problem.
Yeah, it is to support allocations from outside snmalloc. The key issue is "absolute vast majority", if one isn't and we crash because of it, then customers will get very annoyed. For instance,
char buffer[256];
memcpy(src, buffer, std::min(256, src_size));
This needs the stack allocation of buffer
to not create an access violation when we look up the meta-data. On Linux for instance, we mmap
a 256GiB region of memory, and then any access is allowed. On Windows, we MEM_RESERVE
, a 256GiB region, and then MEM_COMMIT
the bits that correspond to snmalloc allocations. This means that accessing the meta-data on Windows for anything that doesn't correspond to an allocation will segfault. There are three approaches we could take
MEM_COMMIT
. This is the current implementation and is very slow.I prototyped the third option a couple of years ago, and I think given the new refactoring in 0.6.0 it would fit quite nicely. I don't currently have time to implement it though.
Oh, you're right. I completely forgot about stack allocations. Those are very common, so it would be unusable.
Hope you'll find time to implement a good performing solution at some point!
I am not seeing how to use the new hardening features when including snmalloc as a header-only library on Windows. There doesn't seem to be any #define or similar mentioned in the documentation? Is it just enabled by default?