microsoft / mimalloc

mimalloc is a compact general purpose allocator with excellent performance.
MIT License
10.58k stars 865 forks source link

Heaps and C++ static destructors #508

Open res2k opened 2 years ago

res2k commented 2 years ago

Hi, the following scenario:

Now, I'm wrapping some mimalloc functions (including those for heap creation/destruction), and those wrapper functions are exported from the DLL and used by an executable. The wrappers are called from the constructor and destructor of a class, instanced statically in the executable.

Now, when exiting the executable I get a crash when a heap is destroyed by that static instance: the heap object is already destroyed (filled with 0xfd)!

AFAICS this happens because the "FLS cleanup" is happening quite early during exiting, with the destruction of the static instance not yet having happened. As the FLS cleanup destroys all heaps for the thread this also destroys the heap held in the static instance, and at it's destruction time, it's essentially a heap "double destroy".

I briefly tried the same, but with a dynamically linked mimalloc. This works fine, it appears the "thread detach" cleanup happens late enough here.

My workaround is (currently) to not destroy the heap in the destructor of the class that is statically instanced. However, this is probably not a general solution...

I guess one approach to remedy this issue could be to perform the "thread cleanup" for the main thread as late as possible (essentially, during process cleanup). Though I wonder whether that would have other implications I may not see...

Any thoughts on the matter?

daanx commented 2 years ago

Hi Frank, thanks for the report; Yes, dll initialization and destruction is tricky on Windows. The current recommended strategy on Windows is:

but it would be great to support your use case better (statically linking mimalloc in a single DLL). Currently, mimalloc use plain "atexit" to ensure mi_process_done is called which destroys the heaps. This works generally on Windows as the C++ destructors use the same mechanism.. that is, if we get in early enough. As such, I think what you see is not due to FLS cleanup (are you using multiple threads?) but just late initialization.

The relevant code is at the end of init.c. Can you try to compile mimalloc as C++ code? (use -DMI_USE_CXX=ON with cmake, or use the /TP option for cl) -- that will initialize using a C++ static constructor which may make the call to atexit early enough to work?

res2k commented 2 years ago

To practically illustrate the issue I created a simple test project: https://github.com/res2k/mimalloc-cleanup

It's set up along the same lines as the project where I noticed the issue initially: a DLL statically linking mimalloc, the way heaps are used, no multithreading.

When running the "exe" the following crash occurs:

lib.dll!mi_heap_visit_pages(mi_heap_s * heap, bool(*)(mi_heap_s *, mi_page_queue_s *, mi_page_s *, void *, void *) fn, void * arg1, void * arg2) Line 39 (d:\sources\mimalloc-dtor-order\mimalloc\src\heap.c:39)
lib.dll!_mi_heap_destroy_pages(mi_heap_s * heap) Line 310 (d:\sources\mimalloc-dtor-order\mimalloc\src\heap.c:310)
lib.dll!mi_heap_destroy(mi_heap_s * heap) Line 327 (d:\sources\mimalloc-dtor-order\mimalloc\src\heap.c:327)
lib.dll!HeapWrapper::~HeapWrapper() Line 28 (d:\sources\mimalloc-dtor-order\lib.cpp:28)
lib.dll!`dynamic atexit destructor for 'static_heap''() (Unknown Source:0)
ucrtbased.dll!_execute_onexit_table::__l2::<lambda>() Line 206 (c:\Users\res.foris\AppData\Local\Programs\Microsoft VS Code\minkernel\crts\ucrt\src\appcrt\startup\onexit.cpp:206)
ucrtbased.dll!__crt_seh_guarded_call<int>::operator()<void <lambda>(void),int <lambda>(void) &,void <lambda>(void)>(__acrt_lock_and_call::__l2::void <lambda>(void) && setup, _execute_onexit_table::__l2::int <lambda>(void) & action, __acrt_lock_and_call::__l2::void <lambda>(void) && cleanup) Line 204 (c:\Users\res.foris\AppData\Local\Programs\Microsoft VS Code\VCCRT\vcruntime\inc\internal_shared.h:204)
ucrtbased.dll!__acrt_lock_and_call<int <lambda>(void)>(const __acrt_lock_id lock_id, _execute_onexit_table::__l2::int <lambda>(void) && action) Line 974 (c:\Users\res.foris\AppData\Local\Programs\Microsoft VS Code\minkernel\crts\ucrt\inc\corecrt_internal.h:974)
ucrtbased.dll!_execute_onexit_table(_onexit_table_t * table) Line 231 (c:\Users\res.foris\AppData\Local\Programs\Microsoft VS Code\minkernel\crts\ucrt\src\appcrt\startup\onexit.cpp:231)
lib.dll!__scrt_dllmain_uninitialize_c() Line 399 (d:\a01\_work\6\s\src\vctools\crt\vcstartup\src\utility\utility.cpp:399)
lib.dll!dllmain_crt_process_detach(const bool is_terminating) Line 182 (d:\a01\_work\6\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:182)
lib.dll!dllmain_crt_dispatch(HINSTANCE__ * const instance, const unsigned long reason, void * const reserved) Line 220 (d:\a01\_work\6\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:220)
lib.dll!dllmain_dispatch(HINSTANCE__ * const instance, const unsigned long reason, void * const reserved) Line 293 (d:\a01\_work\6\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:293)
lib.dll!_DllMainCRTStartup(HINSTANCE__ * const instance, const unsigned long reason, void * const reserved) Line 335 (d:\a01\_work\6\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:335)
ntdll.dll!LdrpCallInitRoutine() (Unknown Source:0)
ntdll.dll!LdrShutdownProcess() (Unknown Source:0)
ntdll.dll!RtlExitUserProcess() (Unknown Source:0)
kernel32.dll!ExitProcessImplementation() (Unknown Source:0)
ucrtbased.dll!exit_or_terminate_process(const unsigned int return_code) Line 144 (c:\Users\res.foris\AppData\Local\Programs\Microsoft VS Code\minkernel\crts\ucrt\src\appcrt\startup\exit.cpp:144)
ucrtbased.dll!common_exit(const int return_code, const _crt_exit_cleanup_mode cleanup_mode, const _crt_exit_return_mode return_mode) Line 280 (c:\Users\res.foris\AppData\Local\Programs\Microsoft VS Code\minkernel\crts\ucrt\src\appcrt\startup\exit.cpp:280)
ucrtbased.dll!exit(int return_code) Line 294 (c:\Users\res.foris\AppData\Local\Programs\Microsoft VS Code\minkernel\crts\ucrt\src\appcrt\startup\exit.cpp:294)
exe.exe!__scrt_common_main_seh() Line 297 (d:\a01\_work\6\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:297)
exe.exe!__scrt_common_main() Line 331 (d:\a01\_work\6\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:331)
exe.exe!mainCRTStartup(void * __formal) Line 17 (d:\a01\_work\6\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp:17)
kernel32.dll!BaseThreadInitThunk() (Unknown Source:0)
ntdll.dll!RtlUserThreadStart() (Unknown Source:0)

Inspecting the heap parameter reveals lots of 0xfds - it's already freed.

I also added some printfs, both my code and mimalloc:

HeapWrapper::HeapWrapper: created 'container' 0000020000001500
HeapWrapper::HeapWrapper: created heap 0000020000010000
_mi_heap_done: mi_thread_done for 00007FF9BBBD82C0
mi_heap_delete: about to delete heap 0000020000010000
HeapWrapper::~HeapWrapper: about to destroy heap 0000020000010000
mi_heap_destroy: about to destroy heap 0000020000010000

(The output contained a number of "mimalloc: warning: mi_usable_size: pointer might not point to a valid heap region", I removed those.)

Look closely: the heap 0000020000010000 we created in the static ctor HeapWrapper::HeapWrapper gets "delete"d before dtor HeapWrapper::~HeapWrapper got called!

Next step: break in _mi_heap_done. Callstack:

lib.dll!_mi_heap_done(mi_heap_s * heap) Line 212 (d:\sources\mimalloc-dtor-order\mimalloc\src\init.c:212)
lib.dll!_mi_thread_done(mi_heap_s * heap) Line 362 (d:\sources\mimalloc-dtor-order\mimalloc\src\init.c:362)
lib.dll!mi_fls_done(void * value) Line 293 (d:\sources\mimalloc-dtor-order\mimalloc\src\init.c:293)
ntdll.dll!RtlpFlsDataCleanup() (Unknown Source:0)
ntdll.dll!LdrShutdownProcess() (Unknown Source:0)
ntdll.dll!RtlExitUserProcess() (Unknown Source:0)
kernel32.dll!ExitProcessImplementation() (Unknown Source:0)
ucrtbased.dll!exit_or_terminate_process(const unsigned int return_code) Line 144 (c:\Users\res.foris\AppData\Local\Programs\Microsoft VS Code\minkernel\crts\ucrt\src\appcrt\startup\exit.cpp:144)
ucrtbased.dll!common_exit(const int return_code, const _crt_exit_cleanup_mode cleanup_mode, const _crt_exit_return_mode return_mode) Line 280 (c:\Users\res.foris\AppData\Local\Programs\Microsoft VS Code\minkernel\crts\ucrt\src\appcrt\startup\exit.cpp:280)
ucrtbased.dll!exit(int return_code) Line 294 (c:\Users\res.foris\AppData\Local\Programs\Microsoft VS Code\minkernel\crts\ucrt\src\appcrt\startup\exit.cpp:294)
exe.exe!__scrt_common_main_seh() Line 297 (d:\a01\_work\6\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:297)
exe.exe!__scrt_common_main() Line 331 (d:\a01\_work\6\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:331)
exe.exe!mainCRTStartup(void * __formal) Line 17 (d:\a01\_work\6\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp:17)
kernel32.dll!BaseThreadInitThunk() (Unknown Source:0)
ntdll.dll!RtlUserThreadStart() (Unknown Source:0)

...so it's called when the process is shutdown (as expected), just very early, before ucrtbase calls all the "onexit" stuff (which includes static destruction).

Tweaking a few knobs in the project (build "lib" statically, or link against shared mimalloc) avoid the issue. Setting MI_USE_CXX to ON did not make a difference (probably b/c it already defaults to that on MSVC).

res2k commented 2 years ago

It's getting trickier. I tried to hack in what I outlined above ("perform the "thread cleanup" for the main thread as late as possible") and ... it still doesn't work. mi_process_done is still called before the static dtors. Hmmmm...

Hacked changes:

diff --git a/src/init.c b/src/init.c
index e06fa6b..cdea7d9 100644
--- a/src/init.c
+++ b/src/init.c
@@ -493,6 +493,9 @@ void mi_process_init(void) mi_attr_noexcept {
   #endif
   _mi_verbose_message("secure level: %d\n", MI_SECURE);
   mi_thread_init();
+#if defined(_WIN32) && !defined(MI_SHARED_LIB)
+  FlsSetValue(mi_fls_key, NULL);
+#endif
   mi_stats_reset();  // only call stat reset *after* thread init (or the heap tld == NULL)

   if (mi_option_is_enabled(mi_option_reserve_huge_os_pages)) {
@@ -522,6 +525,7 @@ static void mi_process_done(void) {
   process_done = true;

   #if defined(_WIN32) && !defined(MI_SHARED_LIB)
+  _mi_thread_done(_mi_heap_default);
   FlsSetValue(mi_fls_key, NULL);  // don't call main-thread callback
   FlsFree(mi_fls_key);            // call thread-done on all threads to prevent dangling callback pointer if statically linked with a DLL; Issue #208
   #endif
daanx commented 2 years ago

Thanks for posting a test project -- when I have time later I will give it a try.

Here is one other thing to try; in init.c, https://github.com/microsoft/mimalloc/blob/dev/src/init.c#L560, replace

#elif defined(__cplusplus)

by

#elif defined(xx__cplusplus)

so the static initialization code is triggered (using magic linker sections). This may get in early enough.

If that does not work, try to compile with -DMI_SHARED_LIB which will define DllMain for initialization and exit. This of course only works if you do not define this yourself as well :-)

res2k commented 2 years ago

Here is one other thing to try; in init.c, https://github.com/microsoft/mimalloc/blob/dev/src/init.c#L560, replace

#elif defined(__cplusplus)

by

#elif defined(xx__cplusplus)

so the static initialization code is triggered (using magic linker sections). This may get in early enough.

Yes, that works, as far as the destruction order is concerned. (I just found that out myself, but you were faster to comment ;)

So performing the "thread cleanup" for the main thread as late as possible is not sufficient, but necessary. But using C++ static initialization to trigger is not sufficient to control the initialization order, at least not in my dastardly case: The HeapWrapper ctor calls into mimalloc, which triggers the process initialization, which registers using atexit(); however, the registered function is later called before ~HeapWrapper...

But we may or may not be at the bottom of the rabbit hole yet; with everything above, and a working init/cleanup order, I now get an assertion:

lib.dll!_mi_assert_fail(const char * assertion, const char * fname, unsigned int line, const char * func) Line 361 (d:\sources\mimalloc-dtor-order\mimalloc\src\options.c:361)
lib.dll!_mi_mem_free(void * p, unsigned __int64 size, unsigned __int64 id, bool full_commit, bool any_reset, mi_os_tld_s * tld) Line 414 (d:\sources\mimalloc-dtor-order\mimalloc\src\region.c:414)
lib.dll!mi_segment_os_free(mi_segment_s * segment, unsigned __int64 segment_size, mi_segments_tld_s * tld) Line 484 (d:\sources\mimalloc-dtor-order\mimalloc\src\segment.c:484)
lib.dll!mi_segment_free(mi_segment_s * segment, bool force, mi_segments_tld_s * tld) Line 721 (d:\sources\mimalloc-dtor-order\mimalloc\src\segment.c:721)
lib.dll!_mi_segment_page_free(mi_page_s * page, bool force, mi_segments_tld_s * tld) Line 837 (d:\sources\mimalloc-dtor-order\mimalloc\src\segment.c:837)
lib.dll!_mi_page_free(mi_page_s * page, mi_page_queue_s * pq, bool force) Line 378 (d:\sources\mimalloc-dtor-order\mimalloc\src\page.c:378)
lib.dll!_mi_heap_collect_retired(mi_heap_s * heap, bool force) Line 434 (d:\sources\mimalloc-dtor-order\mimalloc\src\page.c:434)
lib.dll!mi_heap_collect_ex(mi_heap_s * heap, mi_collect_e collect) Line 147 (d:\sources\mimalloc-dtor-order\mimalloc\src\heap.c:147)
lib.dll!mi_heap_collect(mi_heap_s * heap, bool force) Line 166 (d:\sources\mimalloc-dtor-order\mimalloc\src\heap.c:166)
lib.dll!mi_collect(bool force) Line 170 (d:\sources\mimalloc-dtor-order\mimalloc\src\heap.c:170)
lib.dll!mi_process_done() Line 540 (d:\sources\mimalloc-dtor-order\mimalloc\src\init.c:540)
ucrtbased.dll!_execute_onexit_table::__l2::<lambda>() Line 206 (c:\Users\res.foris\AppData\Local\Programs\Microsoft VS Code\minkernel\crts\ucrt\src\appcrt\startup\onexit.cpp:206)
ucrtbased.dll!__crt_seh_guarded_call<int>::operator()<void <lambda>(void),int <lambda>(void) &,void <lambda>(void)>(__acrt_lock_and_call::__l2::void <lambda>(void) && setup, _execute_onexit_table::__l2::int <lambda>(void) & action, __acrt_lock_and_call::__l2::void <lambda>(void) && cleanup) Line 204 (c:\Users\res.foris\AppData\Local\Programs\Microsoft VS Code\VCCRT\vcruntime\inc\internal_shared.h:204)
ucrtbased.dll!__acrt_lock_and_call<int <lambda>(void)>(const __acrt_lock_id lock_id, _execute_onexit_table::__l2::int <lambda>(void) && action) Line 974 (c:\Users\res.foris\AppData\Local\Programs\Microsoft VS Code\minkernel\crts\ucrt\inc\corecrt_internal.h:974)
ucrtbased.dll!_execute_onexit_table(_onexit_table_t * table) Line 231 (c:\Users\res.foris\AppData\Local\Programs\Microsoft VS Code\minkernel\crts\ucrt\src\appcrt\startup\onexit.cpp:231)
lib.dll!__scrt_dllmain_uninitialize_c() Line 399 (d:\a01\_work\6\s\src\vctools\crt\vcstartup\src\utility\utility.cpp:399)
lib.dll!dllmain_crt_process_detach(const bool is_terminating) Line 182 (d:\a01\_work\6\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:182)
lib.dll!dllmain_crt_dispatch(HINSTANCE__ * const instance, const unsigned long reason, void * const reserved) Line 220 (d:\a01\_work\6\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:220)
lib.dll!dllmain_dispatch(HINSTANCE__ * const instance, const unsigned long reason, void * const reserved) Line 293 (d:\a01\_work\6\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:293)
lib.dll!_DllMainCRTStartup(HINSTANCE__ * const instance, const unsigned long reason, void * const reserved) Line 335 (d:\a01\_work\6\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:335)
ntdll.dll!LdrpCallInitRoutine() (Unknown Source:0)
ntdll.dll!LdrShutdownProcess() (Unknown Source:0)
ntdll.dll!RtlExitUserProcess() (Unknown Source:0)
kernel32.dll!ExitProcessImplementation() (Unknown Source:0)
ucrtbased.dll!exit_or_terminate_process(const unsigned int return_code) Line 144 (c:\Users\res.foris\AppData\Local\Programs\Microsoft VS Code\minkernel\crts\ucrt\src\appcrt\startup\exit.cpp:144)
ucrtbased.dll!common_exit(const int return_code, const _crt_exit_cleanup_mode cleanup_mode, const _crt_exit_return_mode return_mode) Line 280 (c:\Users\res.foris\AppData\Local\Programs\Microsoft VS Code\minkernel\crts\ucrt\src\appcrt\startup\exit.cpp:280)
ucrtbased.dll!exit(int return_code) Line 294 (c:\Users\res.foris\AppData\Local\Programs\Microsoft VS Code\minkernel\crts\ucrt\src\appcrt\startup\exit.cpp:294)
exe.exe!__scrt_common_main_seh() Line 297 (d:\a01\_work\6\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:297)
exe.exe!__scrt_common_main() Line 331 (d:\a01\_work\6\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:331)
exe.exe!mainCRTStartup(void * __formal) Line 17 (d:\a01\_work\6\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp:17)
kernel32.dll!BaseThreadInitThunk() (Unknown Source:0)
ntdll.dll!RtlUserThreadStart() (Unknown Source:0)

It's not really obvious to me to what extent that is related to the init order stuff... well, more debugging, I guess.

res2k commented 2 years ago

It's not really obvious to me to what extent that is related to the init order stuff... well, more debugging, I guess.

Found the issue: the initialization for static mem_region_t regions[MI_REGION_MAX]; happens after the first allocation (via HeapWrapper) was completed. I guess b/c a mem_region_t contains a C++ class (std::atomic<>) MSVC is compelled to create an initialization function instead of just leaving everything zero-initialized. And/or due to this being a debug build the all zero initialization is optimized away...

res2k commented 2 years ago

Regions being initialized via a function can be forcibly prevented like this:

diff --git a/src/region.c b/src/region.c
index f864f73..f3287f1 100644
--- a/src/region.c
+++ b/src/region.c
@@ -98,7 +98,8 @@ typedef struct mem_region_s {
 } mem_region_t;

 // The region map
-static mem_region_t regions[MI_REGION_MAX];
+static void* regions_storage[sizeof(mem_region_t*) / sizeof(void*) * MI_REGION_MAX];
+static mem_region_t* regions = reinterpret_cast<mem_region_t*>(regions_storage);

 // Allocated regions
 static _Atomic(size_t) regions_count; // = 0;
res2k commented 2 years ago

Or, alternatively, "avoid C++ atomics":

diff --git a/include/mimalloc-atomic.h b/include/mimalloc-atomic.h
index e07df84..f78848a 100644
--- a/include/mimalloc-atomic.h
+++ b/include/mimalloc-atomic.h
@@ -17,7 +17,7 @@ terms of the MIT license. A copy of the license can be found in the file
 // instead of passing the memory order as a parameter.
 // -----------------------------------------------------------------------------------------------

-#if defined(__cplusplus)
+#if 0 // defined(__cplusplus)
 // Use C++ atomics
 #include <atomic>
 #define  _Atomic(tp)            std::atomic<tp>
@@ -71,7 +71,7 @@ static inline intptr_t mi_atomic_addi(_Atomic(intptr_t)*p, intptr_t add);
 static inline intptr_t mi_atomic_subi(_Atomic(intptr_t)*p, intptr_t sub);

-#if defined(__cplusplus) || !defined(_MSC_VER)
+#if 0/*defined(__cplusplus)*/ || !defined(_MSC_VER)

 // In C++/C11 atomics we have polymorphic atomics so can use the typed `ptr` variants (where `tp` is the type of atomic value)
 // We use these macros so we can provide a typed wrapper in MSVC in C compilation mode as well
daanx commented 2 years ago

ah, very interesting @res2k ; strange though as mi_page_empty in init.c also uses atomics. I guess the ATOMIC_VAR_INIT there prevents static initialization ?

Can you try with the dev-slice branch (v2.x) as that does not use regions and may be already working for this case.

I do not want to use the C atomics for msvc -- actually, I was hoping to get rid of that code in the future and always use C++ compilation on msvc so I hope we can find another workaround for this issue.

res2k commented 2 years ago

ah, very interesting @res2k ; strange though as mi_page_empty in init.c also uses atomics. I guess the ATOMIC_VAR_INIT there prevents static initialization ?

Huh interesting. Maybe ATOMIC_VAR_INIT does something different. Also quite possible: I simply just guessed wrong, and it's something else... Maybe, with some experimentation, I can find out what triggers the static initialization in one case but not the other. I'll look into dev-slice as well.

res2k commented 2 years ago

I did some experiments using compiler explorer: https://msvc.godbolt.org/z/fWT9hvGv9 Whether MSVC generates code for the static initialization (a "dynamic initializer for ...") seems to depend on whether an "uninitialized" member is present, but also the compiler version ... yay.

Curiously, what seems to make it work is to turn all members into atomics (including the "padding" one) - this appears to inhibit the "dynamic initializer" generation (see struct_with_atomic3)... Have to try that out with "real" mimalloc.

res2k commented 2 years ago

Some more experimentation results:

So... what do you think? Is there some PR material here? Or do think some more tests/checks/experiments are in order?

daanx commented 2 years ago

Hmm, scary changes :-) I think there are 3 parts (from looking at your late-cleanup branch):

  1. setting the fls key to null and explicitly cleaning up later (in a static build on msvc)
  2. using the magic msvc init section instead of c++ static constructor for initialization
  3. preventing the mem_region from using dynamic initialization due to atomics

I guess 2 is perhaps always good to ensure we initialize before anything else, but I am worried if we can already call at_exit at that time; I need to test more, also with a secure build.

Hope we can do 3 without modifying the mem_region too much?

I'll reply to 1 in your repo. Thanks

res2k commented 2 years ago

Hmm, scary changes :-) I think there are 3 parts (from looking at your late-cleanup branch):

1. setting the fls key to null and explicitly cleaning up later (in a static build on msvc)

Yes but not the FLS key, only the FLS value for the main thread. (Also see my reply to your reply in my repo.)

2. using the magic msvc init section instead of c++ static constructor for initialization

I guess 2 is perhaps always good to ensure we initialize before anything else, but I am worried if we can already call at_exit at that time; I need to test more, also with a secure build.

Yes, I think so too, initialization before anything else (certainly before static C++ objects) is a good idea.

One simple approach to test these changes would be to just create a PR and see what the build checks say. (I would need to remove the still failing "mimalloc static initialization order" tests though.)

If you happen to have some scenarios/cases in mind that worry you, let me know, I can try to cook up some unit tests for these.

3. preventing the mem_region from using dynamic initialization due to atomics

Hope we can do 3 without modifying the mem_region too much?

How much is too much? 🙃 The following change worked for me:

diff --git a/src/region.c b/src/region.c
index f864f73..348760d 100644
--- a/src/region.c
+++ b/src/region.c
@@ -94,7 +94,7 @@ typedef struct mem_region_s {
   mi_bitmap_field_t         commit;      // track if committed per block
   mi_bitmap_field_t         reset;       // track if reset per block
   _Atomic(size_t)           arena_memid; // if allocated from a (huge page) arena
-  size_t                    padding;     // round to 8 fields
+  _Atomic(size_t)           padding;     // round to 8 fields
 } mem_region_t;

 // The region map

I think it's not that bad - having an atomic "padding" member shouldn't have too much impact b/c it's not really used at runtime. Initialization cost is "plus one atomic" though, if that should matter. However, an atomic "padding" member is actually kind of weird, as it involves nonobvious voodoo to coerce the compiler into doing what we need. So this would need some commenting...

res2k commented 2 years ago

I am worried if we can already call at_exit at that time

Actually, I think we can - looking at the UCRT source code, the "atexit()" structures are set up very early (I think even before the "magic" MSVC init sections are handled), so it should be possible to call atexit() at that point.

I was initially wondering how that would look on other compilers/platforms, but then I remembered this should only affect MSVC 😄

I also created PR #515 to get the CI checks run, that looks good too (at least with the current set of tests).

res2k commented 2 years ago

The tests I made (https://github.com/res2k/mimalloc/commits/static-user-tests) pass now, so, looks good :) I'll try to check it with some "real" code later...

res2k commented 2 years ago

I'll try to check it with some "real" code later...

Did that - the crash is gone, hooray! So from my side I consider this issue fixed.

jasongibson commented 2 years ago

For what it's worth, the latest dev-slice also fixes a problem for non-Windows platforms (e.g. Linux). Using functions like mi_expand() on static data would crash in a recent prior version of the branch.

daanx commented 2 years ago

@jasongibson : Hi Jason and happy new year :-) Thanks to try this. Great to hear. Can you give the commit that you tried? Most things seemed Windows specific and I would like to understand how it could have impacted Linux as well.

jasongibson commented 2 years ago

Thanks! You too.

The commit that fixed the problem for me is 9f1b25e07d957f63622648dad66858d4d3170e2e.

I could create a test case for it if there's any value in it.

The prior commit 09c658dd401a9a4a7a2c807f6bd274d0420e109d fails like this:

(this may still be a valid very large allocation (over 64MiB))
mimalloc: error: mi_usable_size: pointer does not point to a valid heap space: 0x15be1e0
mimalloc: assertion failed: at "mem/mimalloc/mimalloc-internal.h":423, _mi_segment_page_idx_of
  assertion: "segment->page_kind <= MI_PAGE_MEDIUM || idx == 0"

at 0x5843E97: raise (raise.c:51)
by 0x5845800: abort (abort.c:79)
by 0xD6E51E: _mi_assert_fail(char const*, char const*, unsigned int, char const*) (options.c:345)
by 0xD6F18C: _mi_segment_page_idx_of(mi_segment_s const*, void const*) [clone .part.38] (mimalloc-internal.h:417)
by 0xD6F1C3: _mi_segment_page_idx_of(mi_segment_s const*, void const*) (mimalloc-internal.h:423)
by 0xD7ACCC: _mi_segment_page_of (mimalloc-internal.h:429)
by 0xD7ACCC: _mi_usable_size(void const*, char const*) (alloc.c:536)

Edit: Oh, I misread the issue title as 'static constructors', which is where the problem I described was happening, not destructors.