jmather-sesi opened this issue 3 months ago

I've been running our software through our benchmark suite with both mimalloc 1.8.7 and mimalloc 2.1.7 and have noticed some differences in its behaviour across OSes. On Windows, 2.1.7 pretty much always comes out on top in speed, vsize, and rss. On Linux, however, it's more complicated: in some tests I've seen 2.1.7 consume up to 75% more virtual memory than 1.8.7, while rss is generally the same between both versions.

I'm wondering if this has been seen before, and if there's anything we can do about it. Ideally we'd like to use the same allocator on both OSes, as it's less to maintain. We also need to be cautious about our virtual memory size: some users monitor the vsize as a way to detect swapping, and will kill the process if vsize exceeds the amount of physical memory.
One difference between Linux and Windows is that on Linux `MIMALLOC_ARENA_EAGER_COMMIT` is enabled (as Linux allows overcommit and it is "fine" to use more virtual memory since it's free). That means mimalloc will commit 1 GiB at a time for each arena (on Linux, "commit" means the memory has read/write access, `PROT_READ|PROT_WRITE`, instead of being reserved with no access, `PROT_NONE`). This may be the cause of the difference you see -- can you try running mimalloc with `MIMALLOC_ARENA_EAGER_COMMIT=0` and see if that reduces the vsize?
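To make the reserve/commit distinction concrete, here is a minimal sketch of the mmap/mprotect pattern described above (the exact flags mimalloc uses may differ):

```c
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
  size_t size = 1UL << 30;  /* 1 GiB */

  /* Reserve: the range counts toward vsize, but is inaccessible
     and consumes no physical memory. */
  void* p = mmap(NULL, size, PROT_NONE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
  if (p == MAP_FAILED) { perror("mmap"); return 1; }

  /* Commit: grant read/write access. The pages still do not count
     toward rss until they are first touched (written). */
  if (mprotect(p, size, PROT_READ | PROT_WRITE) != 0) {
    perror("mprotect");
    return 1;
  }

  printf("reserved and committed 1 GiB at %p\n", p);
  munmap(p, size);
  return 0;
}
```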
Now, having said that, it seems you see the difference also between v1.8.7 and v2.1.7, and each of those uses the same arena implementation (only the handling of (thread-local) segments in `segment.c` differs). I am surprised by this and I wonder if there is something going on. Two questions:

- Can you run with `MIMALLOC_ARENA_EAGER_COMMIT=0` and see what happens?
- What does `vsize` measure exactly? Is it all reserved virtual memory, or only virtual memory that is committed (i.e. not `PROT_NONE`, but possibly still untouched)?

Thanks!
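As an aside, the same behaviour can also be toggled programmatically before the first allocation; a minimal sketch, assuming the `mi_option_arena_eager_commit` option name from mimalloc 2.1.x's `mimalloc.h`:

```c
#include <mimalloc.h>
#include <stdio.h>

int main(void) {
  /* Equivalent to exporting MIMALLOC_ARENA_EAGER_COMMIT=0; must run
     before the option is first read, i.e. before any arena is created. */
  mi_option_set(mi_option_arena_eager_commit, 0);

  void* p = mi_malloc(64);
  printf("allocated %p with arena eager commit disabled\n", p);
  mi_free(p);
  return 0;
}
```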
Hi Daan,
I just tried setting `MIMALLOC_ARENA_EAGER_COMMIT=0` and ran the test that performs the worst (on Linux). Unfortunately it didn't seem to have any effect.
Here are the vsize and rss graphs over time for multiple allocators. As you can see, 2.1.7 uses the most vsize by far:

[graph: vsize and rss over time for each allocator]

When exporting `MIMALLOC_ARENA_EAGER_COMMIT=0` and re-running the test with just 2.1.7, the results are pretty much identical:

[graph: vsize and rss over time for 2.1.7 with `MIMALLOC_ARENA_EAGER_COMMIT=0`]
Regarding vsize and rss: the script that I'm using to generate these graphs (https://github.com/jeetsukumaran/Syrupy) simply tracks the output of the `rss` and `vsz` fields from `ps` over time. On my machine, `man ps` states:
vsz VSZ virtual memory size of the process in KiB
(1024-byte units). Device mappings are currently
excluded; this is subject to change. (alias
vsize).
rss RSS resident set size, the non-swapped physical
memory that a task has used (in kilobytes).
(alias rssize, rsz).
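For a cross-check against `ps`, the same numbers can be read from inside the process via `/proc/self/status`, where `VmSize` corresponds to vsz and `VmRSS` to rss; a minimal Linux-specific sketch:

```c
/* Print VmSize/VmRSS from /proc/self/status -- the same numbers
   ps reports as vsz and rss (both in KiB). Linux-specific. */
#include <stdio.h>
#include <string.h>

int main(void) {
  FILE* f = fopen("/proc/self/status", "r");
  if (!f) { perror("fopen"); return 1; }
  char line[256];
  while (fgets(line, sizeof line, f)) {
    if (strncmp(line, "VmSize:", 7) == 0 || strncmp(line, "VmRSS:", 6) == 0)
      fputs(line, stdout);   /* e.g. "VmSize:  10485760 kB" */
  }
  fclose(f);
  return 0;
}
```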
Here are the stats for each run:
1.8.7
--------------------------------------------------------------------------------------------------
heap stats: peak total freed current unit count
reserved: 5.0 GiB 5.0 GiB 0 5.0 GiB
committed: 1.8 GiB 5.0 GiB 50.4 GiB -45.3 GiB ok
reset: 13.4 MiB
purged: 42.4 GiB
touched: 0 0 141.3 GiB -141.3 GiB ok
segments: 460 28.6 Ki 28.2 Ki 438 not all freed
-abandoned: 1 1 0 1 not all freed
-cached: 0 0 0 0 ok
pages: 0 0 471.3 Ki -471.3 Ki ok
-abandoned: 3 3 0 3 not all freed
-extended: 0
-noretire: 0
arenas: 5
-crossover: 0
-rollback: 0
mmaps: 0
commits: 0
resets: 6
purges: 98.1 Ki
threads: 28 28 2 26 not all freed
searches: 0.0 avg
numa nodes: 1
elapsed: 464.020 s
process: user: 1999.568 s, system: 117.681 s, faults: 65, rss: 4.9 GiB, commit: 1.8 GiB
2.1.7
--------------------------------------------------------------------------------------------------
heap stats: peak total freed current unit count
reserved: 10.1 GiB 12.8 GiB 2.8 GiB 10.0 GiB
committed: 2.1 GiB 12.8 GiB 145.1 GiB -132.2 GiB ok
reset: 0
purged: 83.0 GiB
touched: 192.7 KiB 129.9 MiB 185.9 GiB -185.8 GiB ok
segments: 70 2.0 Ki 1.9 Ki 65 not all freed
-abandoned: 1 1 0 1 not all freed
-cached: 0 0 0 0 ok
pages: 0 0 573.6 Ki -573.6 Ki ok
-abandoned: 3 3 0 3 not all freed
-extended: 0
-noretire: 0
arenas: 9
-crossover: 0
-rollback: 0
mmaps: 0
commits: 0
resets: 0
purges: 33.7 Ki
threads: 26 26 2 24 not all freed
searches: 0.0 avg
numa nodes: 1
elapsed: 467.501 s
process: user: 2004.478 s, system: 120.140 s, faults: 20, rss: 4.4 GiB, commit: 2.1 GiB
2.1.7 - eager=0
--------------------------------------------------------------------------------------------------
heap stats: peak total freed current unit count
reserved: 10.1 GiB 12.8 GiB 2.8 GiB 10.0 GiB
committed: 634.5 MiB 7.1 GiB 151.2 GiB -144.1 GiB ok
reset: 0
purged: 85.7 GiB
touched: 192.7 KiB 132.4 MiB 186.2 GiB -186.1 GiB ok
segments: 69 2.0 Ki 2.0 Ki 65 not all freed
-abandoned: 1 1 0 1 not all freed
-cached: 0 0 0 0 ok
pages: 0 0 573.9 Ki -573.9 Ki ok
-abandoned: 3 3 0 3 not all freed
-extended: 0
-noretire: 0
arenas: 9
-crossover: 0
-rollback: 0
mmaps: 0
commits: 5.8 Ki
resets: 0
purges: 33.7 Ki
threads: 28 28 2 26 not all freed
searches: 0.0 avg
numa nodes: 1
elapsed: 465.997 s
process: user: 1998.885 s, system: 120.455 s, faults: 35, rss: 4.4 GiB, commit: 634.5 MiB
Thanks for your assistance!
Very interesting... but that does look a bit unexpected -- I'm not sure what is causing the big vsize difference between 1.8.7 and 2.1.7. Is there any way I can reproduce this? (This may be a mimalloc bug.)
Hi Daan, we can definitely set you up with a way to reproduce this. May I contact you by email with instructions? I can get your email address from the git logs.
Yes, I would like to investigate this -- thanks! (Either daan at microsoft.com or effp.org works; put "mimalloc" in the subject if you can.) It is best if I can build and test locally, but an Ubuntu binary (with debug info) would probably also work, as I could preload mimalloc. Thanks.