microsoft / mimalloc

mimalloc is a compact general purpose allocator with excellent performance.
MIT License
9.74k stars 793 forks source link

crash on Intel CPUs with FSRM #807

Open KonradMagnusson opened 10 months ago

KonradMagnusson commented 10 months ago

Hello!

We (Paradox Interactive, Victoria 3 team) have been receiving player reports of a crash that we've now narrowed down the source of to memory allocations using mimalloc 2.1.1 on 10th gen and newer Intel CPUs.

I am only sporadically able to reproduce the crash, and only in optimized release builds so I can sadly not provide much debug information. I'm using an Intel i9-13900k.

The top of the stack looks like this:

 mi_heap_malloc alloc.c:38
 mi_heap_alloc_new alloc.c:950
 mi_new alloc.c:956
 PdxNew(unsigned long) pdx_allocator.cpp:223
 operator new(unsigned long) pdx_allocator.cpp:288

The line numbers are likely misleading since things are inlined, and I'm not getting any values for the page pointer passed to mi_heap_malloc(...). However heap is null, and I'm not sure what's causing this.

Since our user reports and in-house testing hints at this only happening on newer Intel CPUs, I did some digging and found this CPU feature check (mi_detect_cpu_features) that sets a bool that is checked here (_mi_memcpy and _mi_memzero), resulting in hardware-dependent implementations of _mi_memcpy and _mi_memzero. If I disable the FSRM-based code (see diff below), we are no longer able to reproduce the crash.

The crash is not OS-dependent, and our QA has confirmed it to happen on an i7-10700k too.
Victoria 3 uses mimalloc in version 1.2 and onwards. We will be applying the patch below, disabling the FSRM implementations, as a workaround for the crash in 1.4.


fsrm-diff.patch
diff --git a/common/mimalloc/mimalloc-2.1.1/include/mimalloc/internal.h b/common/mimalloc/mimalloc-2.1.1/include/mimalloc/internal.h
index a4495c1613..6a96ded17a 100644
--- a/common/mimalloc/mimalloc-2.1.1/include/mimalloc/internal.h
+++ b/common/mimalloc/mimalloc-2.1.1/include/mimalloc/internal.h
@@ -885,27 +885,6 @@ static inline size_t mi_bsr(uintptr_t x) {
 // (AMD Zen3+ (~2020) or Intel Ice Lake+ (~2017). See also issue #201 and pr #253.
 // ---------------------------------------------------------------------------------

-#if !MI_TRACK_ENABLED && defined(_WIN32) && (defined(_M_IX86) || defined(_M_X64))
-#include <intrin.h>
-#include <string.h>
-extern bool _mi_cpu_has_fsrm;
-static inline void _mi_memcpy(void* dst, const void* src, size_t n) {
-  if (_mi_cpu_has_fsrm) {
-    __movsb((unsigned char*)dst, (const unsigned char*)src, n);
-  }
-  else {
-    memcpy(dst, src, n);
-  }
-}
-static inline void _mi_memzero(void* dst, size_t n) {
-  if (_mi_cpu_has_fsrm) {
-    __stosb((unsigned char*)dst, 0, n);
-  }
-  else {
-    memset(dst, 0, n);
-  }
-}
-#else
 #include <string.h>
 static inline void _mi_memcpy(void* dst, const void* src, size_t n) {
   memcpy(dst, src, n);
@@ -913,7 +892,6 @@ static inline void _mi_memcpy(void* dst, const void* src, size_t n) {
 static inline void _mi_memzero(void* dst, size_t n) {
   memset(dst, 0, n);
 }
-#endif

 // -------------------------------------------------------------------------------
diff --git a/common/mimalloc/mimalloc-2.1.1/src/init.c b/common/mimalloc/mimalloc-2.1.1/src/init.c
index 51d42acd9b..ccda7b41c3 100644
--- a/common/mimalloc/mimalloc-2.1.1/src/init.c
+++ b/common/mimalloc/mimalloc-2.1.1/src/init.c
@@ -528,22 +528,6 @@ static void mi_process_load(void) {
   _mi_random_reinit_if_weak(&_mi_heap_main.random);
 }

-#if defined(_WIN32) && (defined(_M_IX86) || defined(_M_X64))
-#include <intrin.h>
-mi_decl_cache_align bool _mi_cpu_has_fsrm = false;
-
-static void mi_detect_cpu_features(void) {
-  // FSRM for fast rep movsb support (AMD Zen3+ (~2020) or Intel Ice Lake+ (~2017))
-  int32_t cpu_info[4];
-  __cpuid(cpu_info, 7);
-  _mi_cpu_has_fsrm = ((cpu_info[3] & (1 << 4)) != 0); // bit 4 of EDX : see <https://en.wikipedia.org/wiki/CPUID#EAX=7,_ECX=0:_Extended_Features>
-}
-#else
-static void mi_detect_cpu_features(void) {
-  // nothing
-}
-#endif
-
 // Initialize the process; called by thread_init or the process loader
 void mi_process_init(void) mi_attr_noexcept {
   // ensure we are called once
@@ -553,7 +537,6 @@ void mi_process_init(void) mi_attr_noexcept {
   _mi_verbose_message("process init: 0x%zx\n", _mi_thread_id());
   mi_process_setup_auto_thread_done();

-  mi_detect_cpu_features();
   _mi_os_init();
   mi_heap_main_init();
   #if MI_DEBUG
KonradMagnusson commented 7 months ago

I've done some testing, and it seems mimalloc v2.0.0 does not suffer from the same issue in an otherwise identical setup. The same FSRM optimizations are present in v2.0.0.

malkia commented 7 months ago

Possibly related https://news.ycombinator.com/item?id=38266773 -> https://lock.cmpxchg8b.com/reptar.html

malkia commented 7 months ago

@KonradMagnusson - Do you have the binaries (.dll) of the mimalloc, or at least how they were compiled? (Internally we are also using it, and I want to ensure we are alright - we haven't seen the issue though - but probably by the sheer luck that I've compiled mimalloc with TRACKING support, and for some reason this codepath got disabled there.

KonradMagnusson commented 7 months ago

Hi @malkia, sorry about the late reply

We compile mimalloc as an (unmodified) external CMake library, that we then link to statically.

We set the following defines:

-DMI_BUILD_OBJECT=Off
-DMI_BUILD_SHARED=Off
-DMI_BUILD_STATIC=On
-DMI_OVERRIDE=Off
-DMI_BUILD_TESTS=Off
KonradMagnusson commented 7 months ago

Possibly related https://news.ycombinator.com/item?id=38266773 -> https://lock.cmpxchg8b.com/reptar.html

It at the very least does not sound far-fetched to my ears.

I updated my CPU microcode to 20231114, and have not been able to reproduce the crash yet :eyes:

edit: I updated the µcode and locally reverted the removal of FSRM the same day it rolled out - November 14th.