Open matklad opened 5 years ago
mimalloc vs jemalloc in rustc: https://github.com/rust-lang/rust/pull/62073
rustc 1.36.0 (a53f9df32 2019-07-03) rust-analyzer 35f28c538a9b9f461bb4db1a78d02e9f02a3d296
Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz 8 GiB RAM Ubuntu 18.10 Server X86_64
## glibc 2.28

| test | run 1 | run 2 | run 3 |
|---|---|---|---|
| loading | 164.459389ms | 158.283251ms | 158.038681ms |
| from scratch | 5.337737528s | 5.320671609s | 5.319580964s |
| no change | 6.025861ms | 6.065039ms | 6.003961ms |
| trivial change | 68.171291ms | 68.601428ms | 68.453403ms |
| db loaded | 162.044899ms | 165.939177ms | 154.081518ms |
| analysis | 15.262529965s | 15.364532676s | 15.265079964s |

## jemalloc

| test | run 1 | run 2 | run 3 |
|---|---|---|---|
| loading | 166.110382ms | 134.700889ms | 153.745901ms |
| from scratch | 5.05255001s | 5.05072627s | 5.052360284s |
| no change | 5.499773ms | 5.546466ms | 5.518948ms |
| trivial change | 63.893892ms | 65.056923ms | 63.803271ms |
| db loaded | 154.996884ms | 140.215319ms | 162.413604ms |
| analysis | 14.672433632s | 14.672782703s | 14.61266783s |

## mimalloc

| test | run 1 | run 2 | run 3 |
|---|---|---|---|
| loading | 167.466927ms | 154.518565ms | 154.566493ms |
| from scratch | 5.050844948s | 5.047876063s | 5.078473906s |
| no change | 5.597231ms | 5.61053ms | 5.653994ms |
| trivial change | 64.158532ms | 64.247269ms | 64.714673ms |
| db loaded | 158.278461ms | 154.817976ms | 159.662227ms |
| analysis | 14.971792094s | 14.966517565s | 14.880917377s |
## glibc 2.28

    Command being timed: "target/release/ra_cli analysis-bench ../chalk/ --complete ../chalk/chalk-engine/src/logic.rs:94:0"
    User time (seconds): 5.59
    System time (seconds): 0.16
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.75
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 382140
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 102574
    Voluntary context switches: 1385
    Involuntary context switches: 20
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

    Command being timed: "target/release/ra_cli analysis-stats ../chalk"
    User time (seconds): 15.53
    System time (seconds): 0.32
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:15.87
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 763380
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 298423
    Voluntary context switches: 1386
    Involuntary context switches: 29
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

## jemalloc

    Command being timed: "target/release/ra_cli analysis-bench ../chalk/ --complete ../chalk/chalk-engine/src/logic.rs:94:0"
    User time (seconds): 5.25
    System time (seconds): 0.13
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.39
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 393884
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 103437
    Voluntary context switches: 1368
    Involuntary context switches: 40
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

    Command being timed: "target/release/ra_cli analysis-stats ../chalk"
    User time (seconds): 14.77
    System time (seconds): 0.24
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:15.02
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 893204
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 232477
    Voluntary context switches: 1365
    Involuntary context switches: 122
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

## mimalloc

    Command being timed: "target/release/ra_cli analysis-bench ../chalk/ --complete ../chalk/chalk-engine/src/logic.rs:94:0"
    User time (seconds): 5.14
    System time (seconds): 0.22
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.41
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 490116
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 62
    Minor (reclaiming a frame) page faults: 138332
    Voluntary context switches: 1471
    Involuntary context switches: 56
    Swaps: 0
    File system inputs: 19184
    File system outputs: 800
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

    Command being timed: "target/release/ra_cli analysis-stats ../chalk"
    User time (seconds): 14.67
    System time (seconds): 0.53
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:15.22
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 1187624
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 521357
    Voluntary context switches: 1367
    Involuntary context switches: 103
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
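Only two fields of each report above feed the comparison: user time and maximum resident set size. As a rough sketch (not something used in this thread), they could be pulled out of captured `/usr/bin/time -v` output like this. The `TimeReport` type and `parse_time_report` function are hypothetical names, and the parser assumes one `Label: value` field per line, as GNU time prints it.

```rust
/// The two fields of a `/usr/bin/time -v` report that this thread compares.
/// (Hypothetical helper type, introduced only for this sketch.)
#[derive(Debug, PartialEq)]
struct TimeReport {
    user_seconds: f64,
    max_rss_kb: u64,
}

/// Parses GNU `time -v` output, assuming one `Label: value` field per line.
/// Returns `None` if either field is missing or malformed.
fn parse_time_report(output: &str) -> Option<TimeReport> {
    let mut user_seconds = None;
    let mut max_rss_kb = None;
    for line in output.lines() {
        // GNU time indents every field after the first, so trim first.
        let line = line.trim();
        if let Some(value) = line.strip_prefix("User time (seconds):") {
            user_seconds = value.trim().parse::<f64>().ok();
        } else if let Some(value) = line.strip_prefix("Maximum resident set size (kbytes):") {
            max_rss_kb = value.trim().parse::<u64>().ok();
        }
    }
    Some(TimeReport {
        user_seconds: user_seconds?,
        max_rss_kb: max_rss_kb?,
    })
}

fn main() {
    // Excerpt of the glibc 2.28 analysis-bench report.
    let sample = "User time (seconds): 5.59\n\
                  System time (seconds): 0.16\n\
                  Maximum resident set size (kbytes): 382140\n";
    let report = parse_time_report(sample).expect("both fields present");
    println!("user: {} s, max RSS: {} kB", report.user_seconds, report.max_rss_kb);
}
```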
| time test | glibc 2.28 | jemalloc | mimalloc |
|---|---|---|---|
| analysis-bench (s) | 5.59 | 5.25 | 5.14 |
| analysis-bench maxrss (MB) | 382 | 394 | 490 |
| analysis-bench trivial change (ms) | 68.45 | 63.80 | 64.71 |
| analysis-stats (s) | 15.53 | 14.77 | 14.67 |
| analysis-stats maxrss (MB) | 763 | 893 | 1188 |
| analysis-stats analysis (s) | 15.3 | 14.7 | 15.0 |
Both allocators are significantly faster than glibc. jemalloc uses slightly more memory, while mimalloc uses significantly more memory than glibc. mimalloc has the fastest overall execution times but jemalloc has the fastest self-reported times, suggesting that mimalloc has less initialization overhead.
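To put rough numbers on "significantly" and "slightly", the relative change against glibc can be computed directly from the summary table. This is a minimal sketch: `relative_change_percent` is a name made up for illustration, and the inputs are the table values, copied by hand.

```rust
/// Relative change of `value` against `baseline`, in percent.
/// Negative means `value` is smaller (faster, or less memory) than the baseline.
fn relative_change_percent(baseline: f64, value: f64) -> f64 {
    (value - baseline) / baseline * 100.0
}

fn main() {
    // (label, glibc 2.28, jemalloc, mimalloc), copied from the summary table.
    let rows = [
        ("analysis-bench (s)", 5.59, 5.25, 5.14),
        ("analysis-bench maxrss (MB)", 382.0, 394.0, 490.0),
        ("analysis-stats (s)", 15.53, 14.77, 14.67),
        ("analysis-stats maxrss (MB)", 763.0, 893.0, 1188.0),
    ];
    for &(label, glibc, jemalloc, mimalloc) in rows.iter() {
        println!(
            "{}: jemalloc {:+.1}%, mimalloc {:+.1}%",
            label,
            relative_change_percent(glibc, jemalloc),
            relative_change_percent(glibc, mimalloc),
        );
    }
}
```

On these figures, jemalloc comes out roughly 5 to 6% faster than glibc and mimalloc roughly 5.5 to 8% faster, while peak RSS grows by about 3 to 17% for jemalloc and about 28 to 56% for mimalloc.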
Thanks for those benchmarks @mattico!
It indeed seems like `mimalloc` is probably not a good choice at this time, due to high memory usage.
For the system allocator/jemalloc we already have a feature flag. Performance-wise, it looks like jemalloc is a win. However, it is a C library, so building jemalloc is not suuuper easy, and it makes sense to keep the status quo where jemalloc is opt-in.
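For readers unfamiliar with how such a switch works: Rust selects the process-wide allocator through a single `#[global_allocator]` static, which is what a feature flag would toggle. The sketch below uses only the standard library. It wraps the system allocator in a hypothetical `Counting` type that tracks peak heap usage, and the commented-out jemalloc wiring (feature name and crate choice) is an assumption for illustration, not rust-analyzer's actual setup.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and tracks live and peak heap bytes.
/// (Hypothetical type, introduced only for this sketch.)
struct Counting;

static LIVE: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            // Record the new live total and raise the peak if it was exceeded.
            let live = LIVE.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            PEAK.fetch_max(live, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

// With an external allocator crate, the same attribute would typically sit
// behind a Cargo feature, e.g. (assumed feature name, shown for illustration):
//   #[cfg(feature = "jemalloc")]
//   #[global_allocator]
//   static GLOBAL: jemallocator::Jemalloc = jemallocator::Jemalloc;
#[global_allocator]
static GLOBAL: Counting = Counting;

/// Highest number of live heap bytes observed so far.
fn peak_bytes() -> usize {
    PEAK.load(Ordering::Relaxed)
}

fn main() {
    let buf = vec![1u8; 1 << 20]; // 1 MiB allocation routed through `Counting`
    println!("allocated {} bytes, peak heap {} bytes", buf.len(), peak_bytes());
}
```

Note that this counts bytes requested from the allocator, which is not the same as the max RSS reported by `/usr/bin/time`: RSS also reflects allocator overhead and fragmentation, which is exactly where the allocators differ.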
We might want to revisit this: jemalloc and mimalloc bring the analysis-stats self time from 75.72 s to 72.04 s and 71.02 s, respectively (my, we're a little slower these days). So still a 5%-ish improvement, but we can build it easily enough. And we can always revert if it causes problems.
As for the memory usage:

| | GLIBC | jemalloc | mimalloc |
|---|---|---|---|
| time max RSS analysis-stats self | 1801 MB | 1752 MB | 1868 MB |
So `jemalloc` is both faster and uses less RAM than glibc.
The recent paper about https://github.com/microsoft/mimalloc sounds too good to be true.
It might be a good idea to compare different allocators to see if there are some memory usage wins to be had. Better perf would also be helpful, but memory usage is the most important thing.
Here are a couple of benchmarks that should be representative (you can use any other large project instead of chalk, for example, rust-analyzer itself):
I think `/usr/bin/time` could be used to compare both time and memory (RSS)?

We need to compare at least: