microsoft / snmalloc

Message passing based allocator
MIT License
1.57k stars 108 forks source link

How to use the new hardening features #523

Closed Zeblote closed 3 weeks ago

Zeblote commented 2 years ago

I am not seeing how to use the new hardening features when including snmalloc as a header-only library on Windows. There doesn't seem to be any #define or similar mentioned in the documentation? Is it just enabled by default?

mjp41 commented 2 years ago

It is enabled with

SNMALLOC_CHECK_CLIENT

I will update the documentation. That does not enable the memcpy protection.

On Windows the performance of the memcpy protection is not great due to the way virtual memory works. We can't map a large accessible zero range without actually committing the memory. We could add a Vectored Exception handler, but then it is going to harm the debugging experience.

Zeblote commented 2 years ago

Thanks! I made a quick test comparing this new version against FMallocBinned2, though whether this can be considered a proper benchmark is questionable. Same integration code as my previous issue https://github.com/microsoft/snmalloc/issues/478.

FMallocBinned2, memory usage: Start game perfmon_2022-05-10_17-02-33

Open map perfmon_2022-05-10_17-02-51

Load large save perfmon_2022-05-10_17-03-42

Clear large save perfmon_2022-05-10_17-04-27

Load large save again perfmon_2022-05-10_17-05-24

snmalloc 0.6 with SNMALLOC_CHECK_CLIENT, memory usage: Start game perfmon_2022-05-10_17-15-10

Open map perfmon_2022-05-10_17-15-24

Load large save perfmon_2022-05-10_17-16-08

Clear large save perfmon_2022-05-10_17-17-22

Load large save again perfmon_2022-05-10_17-18-11

It seems like it doesn't free unused memory nearly as much as the default unreal allocator. Overall usage is also a bit higher. This seems to be a clear win for FMallocBinned2.

FMallocBinned2, performance: Brickadia-Win64-Shipping_2022-05-10_17-05-45

snmalloc 0.6 with SNMALLOC_CHECK_CLIENT, performance: Brickadia-Win64-Shipping_2022-05-10_17-18-24

Only slightly slower on allocations, but much faster on deallocations. This seems to be a win for snmalloc overall.

The game also seems to crash on exit when snmalloc is used but I haven't yet investigated why.

mjp41 commented 2 years ago

Thanks for running this benchmark. The CHECK_CLIENT version will use more memory. It needs to keep enough spare space for randomisation to have enough entropy. There are a few cases where it decides to allocate more memory just to increase randomness in allocation patterns.

The game also seems to crash on exit when snmalloc is used but I haven't yet investigated why.

So the CHECK_CLIENT does a bunch of checking very lazily. We check consistency at the end to force any lazy checks that have not occurred. My guess is that it has found corruption during teardown. Would be super interested if that is the case.

Zeblote commented 2 years ago

On Windows the performance of the memcpy protection is not great due to the way virtual memory works. We can't map a large accessible zero range without actually committing the memory. We could add a Vectored Exception handler, but then it is going to harm the debugging experience.

Can you explain why this is required? Is it to support allocations from outside snmalloc?

For example, in Unreal Engine, there are over 1000 calls to FMemory::Memcpy in runtime modules, the absolute vast majority only ever operating on memory obtained from FMemory. It would probably be a good start to guard all of those.

Debugging experience is not really important for distribution builds. Though if it interfered with Unreal's crash reporting, that would be a problem.

mjp41 commented 2 years ago

Yeah, it is to support allocations from outside snmalloc. The key issue is "absolute vast majority", if one isn't and we crash because of it, then customers will get very annoyed. For instance,

   char buffer[256];
   memcpy(src, buffer, std::min(256, src_size));

This needs the stack allocation of buffer to not create an access violation when we look up the meta-data. On Linux for instance, we mmap a 256GiB region of memory, and then any access is allowed. On Windows, we MEM_RESERVE, a 256GiB region, and then MEM_COMMIT the bits that correspond to snmalloc allocations. This means that accessing the meta-data on Windows for anything that doesn't correspond to an allocation will segfault. There are three approaches we could take

I prototyped the third option a couple of years ago, and I think given the new refactoring in 0.6.0 it would fit quite nicely. I don't currently have time to implement it though.

Zeblote commented 2 years ago

Oh, you're right. I completely forgot about stack allocations. Those are very common, so it would be unusable.

Hope you'll find time to implement a good performing solution at some point!