microsoft / mimalloc

mimalloc is a compact general purpose allocator with excellent performance.

Performance differences between Windows and Linux #921

Open · jmather-sesi opened this issue 3 months ago

jmather-sesi commented 3 months ago

I've been running our software through our benchmark suite with both mimalloc 1.8.7 and mimalloc 2.1.7 and have noticed some differences in its behaviour across OSes. On Windows, 2.1.7 pretty much always comes out on top in speed, vsize, and rss. On Linux, however, it's more complicated: in some tests I've seen 2.1.7 consume up to 75% more virtual memory than 1.8.7, while rss is generally the same between the two versions.

I'm wondering if this has been seen before, and if there's anything we can do about it. Ideally we'd like to use the same allocator on both OSes as it's less to maintain. We also need to be cautious about our virtual memory size: some users monitor the vsize as a way to detect swapping, and will kill the process if the vsize exceeds the amount of physical memory.
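Roughly speaking, that watchdog amounts to something like the following (an illustrative sketch, not our actual tooling; a real monitor would read /proc/<pid>/status for the target process rather than /proc/self):

```c
// Illustrative sketch (not our actual monitoring tool): read VmSize from
// /proc/self/status and compare it against total physical memory.
#include <stdio.h>
#include <unistd.h>

int vsize_exceeds_physical(void) {
    FILE* f = fopen("/proc/self/status", "r");
    if (!f) return 0;
    long vsize_kib = 0;
    char line[256];
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "VmSize: %ld kB", &vsize_kib) == 1) break;
    }
    fclose(f);
    long phys_kib = sysconf(_SC_PHYS_PAGES) * (sysconf(_SC_PAGE_SIZE) / 1024);
    return vsize_kib > phys_kib;  // the condition on which the process gets killed
}
```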

daanx commented 3 months ago

One difference between Linux and Windows is that on Linux MIMALLOC_ARENA_EAGER_COMMIT is enabled by default (Linux allows overcommit, so it is "fine" to use more virtual memory as it is essentially free). That means mimalloc will commit 1 GiB at a time for each arena (on Linux, "commit" means the memory has read/write access, PROT_READ|PROT_WRITE, instead of being merely reserved with no access, PROT_NONE). This may be the cause of the difference you see -- can you try running mimalloc with MIMALLOC_ARENA_EAGER_COMMIT=0 and see if that reduces the vsize?
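In mmap terms the distinction looks like this (a simplified sketch of the mechanism, not mimalloc's actual arena code):

```c
// Simplified sketch of reserve vs. commit on Linux (not mimalloc's actual code).
#include <sys/mman.h>

#define ARENA_SIZE  (1UL << 30)   // 1 GiB reservation, like a mimalloc arena
#define COMMIT_SIZE (1UL << 20)   // commit smaller slices when eager commit is off

int main(void) {
    // Reserve: address space only, no access rights yet (PROT_NONE).
    void* arena = mmap(NULL, ARENA_SIZE, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (arena == MAP_FAILED) return 1;

    // Commit: grant read/write access. With eager commit the whole 1 GiB
    // arena gets this treatment at once; otherwise only slices, on demand.
    if (mprotect(arena, COMMIT_SIZE, PROT_READ | PROT_WRITE) != 0) return 1;

    munmap(arena, ARENA_SIZE);
    return 0;
}
```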

Now, having said that, it seems you see the difference between v1.8.7 and v2.1.7 as well, and both of those use the same arena implementation (only the handling of the (thread-local) segments in segment.c differs). I am surprised by this and I wonder if there is something going on.

Thanks!

jmather-sesi commented 3 months ago

Hi Daan,

I just tried setting MIMALLOC_ARENA_EAGER_COMMIT=0 and ran the test that performs the worst (on Linux). Unfortunately it didn't seem to have any effect.

Here is the vsize and rss graph over time for multiple allocators. As you can see, 2.1.7 uses the most vsize by far:

[image: comparison (vsize and rss over time for each allocator)]

When exporting MIMALLOC_ARENA_EAGER_COMMIT=0 and re-running the test with just 2.1.7, the results are pretty much identical:

[image: eager_comparison]
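(For completeness, the same option can also be disabled programmatically via mimalloc's option API, which would rule out the environment variable not being picked up at startup; a minimal sketch, assuming the v2.x option name mi_option_arena_eager_commit:)

```c
// Minimal sketch: disable eager arena commit via the option API instead of
// the environment (assumes the v2.x option name mi_option_arena_eager_commit).
#include <mimalloc.h>

int main(void) {
    mi_option_disable(mi_option_arena_eager_commit);  // before the first allocation
    void* p = mi_malloc(1 << 20);
    mi_free(p);
    return 0;
}
```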

Regarding vsize and rss, the script that I'm using to generate these graphs (https://github.com/jeetsukumaran/Syrupy) simply samples the rss and vsz fields of ps output over time. On my machine, man ps states:

       vsz         VSZ       virtual memory size of the process in KiB
                             (1024-byte units).  Device mappings are currently
                             excluded; this is subject to change.  (alias
                             vsize).
       rss         RSS       resident set size, the non-swapped physical
                             memory that a task has used (in kilobytes).
                             (alias rssize, rsz).
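The same two figures can also be read directly from /proc/self/statm, which reports them in pages rather than KiB (a small sketch of what the sampling amounts to):

```c
// Sketch: read the vsz/rss figures straight from /proc. /proc/self/statm
// reports them in pages, so multiply by the page size to match ps's KiB.
#include <stdio.h>
#include <unistd.h>

int main(void) {
    long size_pages = 0, resident_pages = 0;
    FILE* f = fopen("/proc/self/statm", "r");
    if (!f) return 1;
    if (fscanf(f, "%ld %ld", &size_pages, &resident_pages) != 2) {
        fclose(f);
        return 1;
    }
    fclose(f);
    long kib_per_page = sysconf(_SC_PAGE_SIZE) / 1024;
    printf("vsz: %ld KiB, rss: %ld KiB\n",
           size_pages * kib_per_page, resident_pages * kib_per_page);
    return 0;
}
```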

Here are the stats for each run:

1.8.7
--------------------------------------------------------------------------------------------------
heap stats:     peak       total       freed     current        unit       count   
  reserved:     5.0 GiB     5.0 GiB     0           5.0 GiB                          
 committed:     1.8 GiB     5.0 GiB    50.4 GiB   -45.3 GiB                          ok
     reset:    13.4 MiB
    purged:    42.4 GiB
   touched:     0           0         141.3 GiB  -141.3 GiB                          ok
  segments:   460          28.6 Ki     28.2 Ki    438                                not all freed
-abandoned:     1           1           0           1                                not all freed
   -cached:     0           0           0           0                                ok
     pages:     0           0         471.3 Ki   -471.3 Ki                           ok
-abandoned:     3           3           0           3                                not all freed
 -extended:     0      
 -noretire:     0      
    arenas:     5      
-crossover:     0      
 -rollback:     0      
     mmaps:     0      
   commits:     0      
    resets:     6      
    purges:    98.1 Ki 
   threads:    28          28           2          26                                not all freed
  searches:     0.0 avg
numa nodes:     1
   elapsed:   464.020 s
   process: user: 1999.568 s, system: 117.681 s, faults: 65, rss: 4.9 GiB, commit: 1.8 GiB

2.1.7
--------------------------------------------------------------------------------------------------
heap stats:     peak       total       freed     current        unit       count   
  reserved:    10.1 GiB    12.8 GiB     2.8 GiB    10.0 GiB                          
 committed:     2.1 GiB    12.8 GiB   145.1 GiB  -132.2 GiB                          ok
     reset:     0      
    purged:    83.0 GiB
   touched:   192.7 KiB   129.9 MiB   185.9 GiB  -185.8 GiB                          ok
  segments:    70           2.0 Ki      1.9 Ki     65                                not all freed
-abandoned:     1           1           0           1                                not all freed
   -cached:     0           0           0           0                                ok
     pages:     0           0         573.6 Ki   -573.6 Ki                           ok
-abandoned:     3           3           0           3                                not all freed
 -extended:     0      
 -noretire:     0      
    arenas:     9      
-crossover:     0      
 -rollback:     0      
     mmaps:     0      
   commits:     0      
    resets:     0      
    purges:    33.7 Ki 
   threads:    26          26           2          24                                not all freed
  searches:     0.0 avg
numa nodes:     1
   elapsed:   467.501 s
   process: user: 2004.478 s, system: 120.140 s, faults: 20, rss: 4.4 GiB, commit: 2.1 GiB

2.1.7 - eager=0
--------------------------------------------------------------------------------------------------
heap stats:     peak       total       freed     current        unit       count   
  reserved:    10.1 GiB    12.8 GiB     2.8 GiB    10.0 GiB                          
 committed:   634.5 MiB     7.1 GiB   151.2 GiB  -144.1 GiB                          ok
     reset:     0      
    purged:    85.7 GiB
   touched:   192.7 KiB   132.4 MiB   186.2 GiB  -186.1 GiB                          ok
  segments:    69           2.0 Ki      2.0 Ki     65                                not all freed
-abandoned:     1           1           0           1                                not all freed
   -cached:     0           0           0           0                                ok
     pages:     0           0         573.9 Ki   -573.9 Ki                           ok
-abandoned:     3           3           0           3                                not all freed
 -extended:     0      
 -noretire:     0      
    arenas:     9      
-crossover:     0      
 -rollback:     0      
     mmaps:     0      
   commits:     5.8 Ki 
    resets:     0      
    purges:    33.7 Ki 
   threads:    28          28           2          26                                not all freed
  searches:     0.0 avg
numa nodes:     1
   elapsed:   465.997 s
   process: user: 1998.885 s, system: 120.455 s, faults: 35, rss: 4.4 GiB, commit: 634.5 MiB

Thanks for your assistance!

daanx commented 3 months ago

Very interesting, but that does look a bit unexpected -- I am not sure what is causing the big vsize difference between 1.8.7 and 2.1.7. Is there any way I can reproduce this? (This may be a mimalloc bug.)

jmather-sesi commented 3 months ago

Hi Daan, we can definitely set you up with a way to reproduce this. May I contact you by email with instructions? I can get your email address from the git logs.

daanx commented 3 months ago

Yes, I would like to investigate this -- thanks! (Either daan at microsoft.com or effp.org works; put "mimalloc" in the subject if you can.) It is best if I can build and test locally, but an Ubuntu binary (with debug info) would probably also work as I could preload mimalloc. Thanks.