microsoft / mimalloc

mimalloc is a compact general purpose allocator with excellent performance.
MIT License

Unable to obtain aligned memory on RISC-V systems with an SV39 MMU #939

Open orlitzky opened 2 months ago

orlitzky commented 2 months ago

(moved from the comments on https://github.com/microsoft/mimalloc/issues/640)

RISC-V has several different memory layouts:

https://www.kernel.org/doc/html/latest/arch/riscv/vm-layout.html

With the SV39 layout, the user-addressable range ends at 256GiB, but when mimalloc tries to obtain an aligned chunk, it asks for one at 2TiB or above. As a result, mmap() can fail to return an aligned chunk, and usually will. When that happens, a warning is raised, and mimalloc falls back to over-allocation:

mimalloc: unable to directly request hinted aligned OS memory (error: 2 (0x02), size: 0x2000000 bytes, alignment: 0x2000000, hint address: 0x050ADE000000)
mimalloc: warning: unable to allocate aligned OS memory directly, fall back to over-allocation (size: 0x2000000 bytes, address: 0x003F81800000, alignment: 0x2000000, commit: 1)
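
The numbers in these two messages can be checked with a little arithmetic. The following is a minimal sketch, assuming (as stated above) that the SV39 user range ends at 2^38 bytes, i.e. 256GiB. The helper names are made up for illustration and are not part of mimalloc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Assumption: with SV39 the user-addressable range is [0, 2^38), so it ends at 256GiB.
#define SV39_USER_END (UINT64_C(1) << 38)

// True if addr is a multiple of align (align must be a power of two).
static bool is_aligned(uint64_t addr, uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (addr & (align - 1)) == 0;
}

// True if the block [addr, addr + size) lies entirely inside the SV39 user range.
static bool fits_sv39_user_range(uint64_t addr, uint64_t size) {
  return size <= SV39_USER_END && addr <= SV39_USER_END - size;
}
```

With the logged values, the hint address 0x050ADE000000 is aligned to 0x2000000 but does not fit in the user range, while the address 0x003F81800000 that mmap() actually returned fits in the user range but is not aligned to 0x2000000. That combination is what triggers the over-allocation fallback.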

I have verified this with a small program on a Milk-V Pioneer Box:

$ ./a.out 
mmaping 1GiB at 0x40000000 (1 GiB)... YES (0x40000000)
mmaping 1GiB at 0x80000000 (2 GiB)... YES (0x80000000)
...
mmaping 1GiB at 0x3f40000000 (253 GiB)... YES (0x3f40000000)
mmaping 1GiB at 0x3f80000000 (254 GiB)... NO (File exists)
mmaping 1GiB at 0x3fc0000000 (255 GiB)... NO (File exists)
mmaping 1GiB at 0x4000000000 (256 GiB)... NO (Out of memory)
mmaping 1GiB at 0x4040000000 (257 GiB)... NO (Out of memory)
mmaping 1GiB at 0x4080000000 (258 GiB)... NO (Out of memory)
mmaping 1GiB at 0x40c0000000 (259 GiB)... NO (Out of memory)
mmaping 1GiB at 0x4100000000 (260 GiB)... NO (Out of memory)
mmaping 1GiB at 0x4140000000 (261 GiB)... NO (Out of memory)
mmaping 1GiB at 0x4180000000 (262 GiB)... NO (Out of memory)
mmaping 1GiB at 0x41c0000000 (263 GiB)... NO (Out of memory)
mmaping 1GiB at 0x4200000000 (264 GiB)... NO (Out of memory)
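
For anyone who wants to repeat the experiment, a probe along these lines should give similar output. This is a sketch and not the exact program used above: the "File exists" results suggest MAP_FIXED_NOREPLACE, but that is an inference, and probe_fixed is a hypothetical helper name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // for MAP_ANONYMOUS, MAP_NORESERVE, MAP_FIXED_NOREPLACE
#endif
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

// Try to reserve len bytes at exactly addr without replacing an existing mapping.
// Returns 0 if the address was available (the mapping is released again),
// or an errno value such as EEXIST or ENOMEM if it was not.
static int probe_fixed(uintptr_t addr, size_t len) {
  int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE;
#ifdef MAP_FIXED_NOREPLACE
  flags |= MAP_FIXED_NOREPLACE;  // Linux 4.17+; older kernels treat addr as a plain hint
#endif
  void* p = mmap((void*)addr, len, PROT_NONE, flags, -1, 0);
  if (p == MAP_FAILED) return errno;
  munmap(p, len);
  // If the flag was unavailable or ignored, the kernel may have placed the mapping elsewhere.
  return (p == (void*)addr) ? 0 : EEXIST;
}
```

Calling probe_fixed(i << 30, (size_t)1 << 30) for i = 1, 2, 3, ... and printing strerror() of any nonzero result walks the address space in 1GiB steps, like the run shown above.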

The fallback still works, but we waste a lot of time trying to obtain aligned memory when we know it will fail with high probability. These machines are still rare, but probably not for long. Maybe there's a way to work around this? I've had some luck using the top 128GiB of my space for mmap, but I don't know how reliable that will be in general. If nothing else, I think it would be better to just overallocate on these machines?

And finally, is there a reliable way to detect the SV39 layout? A build flag would be an easy first step, but detecting it automatically would be nicer for end users. I have one of these and didn't know about the memory layout problem until now.

daanx commented 2 months ago

Ah, very interesting. Yes, that is definitely it. Hmm, the aligned hint is behind a #if MI_INTPTR_SIZE >= 8 -- you could add a "not RISC-V" condition there to disable hinting. Another option is a build flag that says how many bits the user-addressable space has, and only do hinting at >= 48 bits. We could also start the hints at the 128GiB level, which might work well in practice. However, in that case we need to adapt the secure code that randomizes the start address to use fewer bits (like 12) to stay in the 128 to 256 GiB region.
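
The "12 bits" figure works out as follows: with the 32MiB (0x2000000) alignment from the log, 2^12 slots of 32MiB each cover exactly 128GiB, so a 12-bit random slot index added to a 128GiB base stays below 256GiB. A rough sketch of that calculation, where all names and constants are hypothetical and not taken from mimalloc's source:

```c
#include <assert.h>
#include <stdint.h>

#define GiB              (UINT64_C(1) << 30)
#define HINT_BASE        (128 * GiB)          // hypothetical start of the hint area
#define HINT_END         (256 * GiB)          // end of the SV39 user range
#define HINT_ALIGN       (UINT64_C(1) << 25)  // 32MiB, the alignment seen in the log
#define HINT_RANDOM_BITS 12                   // 2^12 slots * 32MiB = 128GiB

// Returns an aligned hint in [128GiB, 256GiB) for a block of `size` bytes,
// or 0 if the block cannot fit in that region at all.
static uint64_t sv39_aligned_hint(uint64_t random, uint64_t size) {
  if (size == 0 || size > HINT_END - HINT_BASE) return 0;
  uint64_t slot = random & ((UINT64_C(1) << HINT_RANDOM_BITS) - 1);
  uint64_t hint = HINT_BASE + slot * HINT_ALIGN;
  // Pull the hint down if the block would otherwise cross the 256GiB boundary.
  if (hint > HINT_END - size) {
    hint = (HINT_END - size) & ~(HINT_ALIGN - 1);
  }
  assert(hint >= HINT_BASE && hint + size <= HINT_END);
  return hint;
}
```

Note that in the probe output above the 254GiB and 255GiB slots were already occupied, so a hint near the top of the region can still fail and the fallback path remains necessary.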

If you find a way to detect the address space bits (SV39) at build time (or run time) let me know :-)

orlitzky commented 2 months ago

It looks like the future on Linux is the RISC-V hardware probe interface that provides exactly what we need, RISCV_HWPROBE_KEY_HIGHEST_VIRT_ADDRESS. It's in the stable 6.6.x series, but not the older 6.1.x series that my vendor kernel is based on.

In the meantime,

$ grep '^mmu[[:space:]]*:[[:space:]]*sv39$' /proc/cpuinfo

should do the trick. That works in 6.1.x and up, unless they change the format of /proc/cpuinfo some day. The aligned hinting can then be skipped in src/os.c if grep succeeded.
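
The same check could also be done at run time from C instead of at build time. A minimal sketch, assuming the /proc/cpuinfo line keeps the "mmu : sv39" form matched by the grep pattern above; both function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

// True if line has the form "mmu<blanks>:<blanks>sv39", mirroring the grep pattern.
static bool line_is_sv39_mmu(const char* line) {
  assert(line != NULL);
  if (strncmp(line, "mmu", 3) != 0) return false;
  const char* p = line + 3;
  while (*p == ' ' || *p == '\t') p++;   // blanks before the colon
  if (*p != ':') return false;
  p++;
  while (*p == ' ' || *p == '\t') p++;   // blanks after the colon
  if (strncmp(p, "sv39", 4) != 0) return false;
  p += 4;
  return *p == '\n' || *p == '\0';       // nothing may follow "sv39"
}

// Scan a cpuinfo-style file line by line; returns false if it cannot be opened.
static bool file_reports_sv39(const char* path) {
  FILE* f = fopen(path, "r");
  if (f == NULL) return false;
  char line[256];
  bool found = false;
  while (!found && fgets(line, sizeof line, f) != NULL) {
    found = line_is_sv39_mmu(line);
  }
  fclose(f);
  return found;
}
```

Calling file_reports_sv39("/proc/cpuinfo") once at startup would then decide whether to skip the aligned hint. Unlike a build-time check, this reflects the machine the program runs on and not the machine it was built on.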

orlitzky commented 2 months ago

I don't write much CMake, but this looks like a passable detection mechanism:

diff --git a/CMakeLists.txt b/CMakeLists.txt
index bcfe91d8..20b22c09 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -343,6 +343,16 @@ if(MINGW)
   add_definitions(-D_WIN32_WINNT=0x600)
 endif()

+# Check /proc/cpuinfo for an SV39 MMU and define a constant if one is
+# found. We will want to skip the aligned hinting in that case.
+if (EXISTS /proc/cpuinfo)
+  file(STRINGS /proc/cpuinfo mi_sv39_mmu REGEX "^mmu[ \t]+:[ \t]+sv39$")
+  if (mi_sv39_mmu)
+    MESSAGE( STATUS "SV39 MMU detected" )
+    list(APPEND mi_defines MI_SV39_MMU=1)
+  endif()
+endif()
+
 # extra needed libraries

 # we prefer -l<lib> test over `find_library` as sometimes core libraries

But now I notice that an aligned allocation is still attempted (and then freed, usually) even when we don't have a hint.

If _mi_os_get_aligned_hint() is ifdef'd out in src/os.c, we still do...

static void* mi_os_prim_alloc_aligned(size_t size, size_t alignment, bool commit, bool allow_large, bool* is_large, bool* is_zero, void** base, mi_stats_t* stats) {
  ...
  // try first with a hint (this will be aligned directly on Win 10+ or BSD)         
  void* p = mi_os_prim_alloc(size, alignment, commit, allow_large, is_large, is_zero, stats);
  if (p == NULL) return NULL;

But the implementation of the UNIX prim alloc, at least, is to try the hint, which is not going to work because it's ifdef'd to NULL. So we still get a warning about falling back to over-allocation, and take the performance hit from having to free() the first allocation.
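
For comparison, going straight to over-allocation when no usable hint exists would look roughly like this. It is a generic sketch of the technique, not mimalloc's code. The function name is hypothetical, and it assumes size and alignment are multiples of the page size:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // for MAP_ANONYMOUS
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

// Obtain `size` bytes aligned to `alignment` (a power of two) by mapping
// size + alignment bytes and unmapping the unaligned head and the unused tail.
// Returns NULL on failure.
static void* alloc_aligned_overalloc(size_t size, size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  if (size == 0 || size > SIZE_MAX - alignment) return NULL;
  size_t over = size + alignment;
  void* p = mmap(NULL, over, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return NULL;
  // Round the start up to the next multiple of alignment.
  uintptr_t start = ((uintptr_t)p + alignment - 1) & ~((uintptr_t)alignment - 1);
  size_t pre  = (size_t)(start - (uintptr_t)p);  // unaligned head, may be 0
  size_t post = over - pre - size;               // unused tail
  if (pre > 0)  munmap(p, pre);
  if (post > 0) munmap((void*)(start + size), post);
  return (void*)start;
}
```

This costs one mmap() and up to two munmap() calls per aligned block, but it skips the first attempt that is expected to come back unaligned and has to be released.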

If I understand correctly, this should already be happening on 32-bit systems, since it's the non-64-bit branch that I'm trying to hijack for my own purposes here.