microsoft / mimalloc

mimalloc is a compact general purpose allocator with excellent performance.
MIT License
10.63k stars 867 forks source link

Skip aligned allocation on SV39 MMUs #949

Open orlitzky opened 1 month ago

orlitzky commented 1 month ago

An attempt at fixing https://github.com/microsoft/mimalloc/issues/939

The SV39 detection works well enough with my sample size of 1. But I also had to add some extra #ifs to mi_os_prim_alloc_aligned() to avoid the pointless alloc/free. I don't have any 32-bit hardware any more, but this should speed things up there, too.

orlitzky commented 1 month ago

@microsoft-github-policy-service agree

kepstin commented 1 month ago

I've tested this on my Milk-V Jupiter system (SpacemiT M1 SoC), and I'm no longer seeing the warnings, as expected.

I'm wondering, tho, if this should be a runtime check rather than a build-time check. Are there other RISC-V MMUs where the >2TiB allocation would work? If so, this could cause an issue for distro packages which may be built on a machine with a different MMU but get run on a machine with sv39.

Edit: This build-time check definitely won't work when cross-compiling.

orlitzky commented 1 month ago

Hey, thanks for taking a look.

I know basically nothing about cross-compiling with CMake, but with autotools, you're supposed to override the (computed) settings for the build system with the values for the host system. In this case that's still possible because the setting is passed to the compiler (or preprocessor) via -DFOO. For example, if your build machine has an SV39 MMU and the host machine does not, you could append -UMI_SV39_MMU to your CFLAGS. Conversely, if the build machine does not have an SV39 MMU but the host does, -DMI_SV39_MMU=1. There could be a simpler way to handle this in CMake, though.

The binary distro thing is more of an issue. As a highly enlightened user of a source-based distro, this had not occurred to me :P

There are indeed other RISC-V MMU types where the hinting should be enabled, and distros will not be making special binaries targeted at each MMU. Runtime detection would be better in this scenario, but it also makes the value of this proposition a lot fuzzier. We're only avoiding an alloc/free with this detection -- it's not a huge savings. If we stop to parse /proc/cpuinfo at runtime, have we killed the intended benefit when there are only a small number of mallocs being made? There is a __riscv constant we could check to avoid stating /proc/cpuinfo on other architectures at least.

daanx commented 4 weeks ago

For now I have integrated the cmake check directly (since the other os.c modifications were not quite right -- hope this is ok :-)). I added the MI_NO_ALIGNED_HINT macro to disable aligned hinting in general. I would prefer a runtime check to enable better binary distributions but only if it does not add too much complexity... not sure if parsing /proc/cpuinfo is a bit too much at runtime... it would be nice to have a virtual_address_bits field in the mi_os_mem_config.t though.

orlitzky commented 4 weeks ago

For now I have integrated the cmake check directly (since the other os.c modifications were not quite right -- hope this is ok :-)).

Yep, fine by me. How wrong are the changes to os.c? (Is it something I should try to fix?)