rust-lang / rust-analyzer

A Rust compiler front-end for IDEs
https://rust-analyzer.github.io/
Apache License 2.0

Benchmark with different allocators #1441

Open matklad opened 5 years ago

matklad commented 5 years ago

The recent paper about https://github.com/microsoft/mimalloc sounds too good to be true.

It might be a good idea to compare different allocators to see if there are some memory usage wins to be had. Better perf would also be helpful, but memory usage is the most important thing.

Here are a couple of benchmarks that should be representative (you can use any other large project instead of chalk, for example, rust-analyzer itself):

cargo run --package ra_cli --release -- analysis-bench ../chalk/ --complete ../chalk/chalk-engine/src/logic.rs:94:0
cargo run --package ra_cli --release -- analysis-stats ../chalk

I think /usr/bin/time could be used to compare both time and memory (rss)?
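To make runs easy to compare, the report that `/usr/bin/time` prints can be reduced to the few fields that matter here. The following is a minimal sketch, assuming GNU time in verbose mode (`/usr/bin/time -v`) and its usual field labels; `TimeReport` and `parse_time_report` are hypothetical names, not part of rust-analyzer.

```rust
/// Fields of interest from one GNU `time -v` report.
#[derive(Debug, PartialEq)]
pub struct TimeReport {
    pub user_secs: f64,
    pub system_secs: f64,
    pub max_rss_kb: u64,
}

/// Parses the verbose report that GNU time writes to stderr.
/// Returns `None` if any of the three fields is missing or malformed.
pub fn parse_time_report(text: &str) -> Option<TimeReport> {
    let mut user = None;
    let mut system = None;
    let mut rss = None;
    for line in text.lines() {
        // GNU time indents each field, so trim before matching the label.
        let line = line.trim();
        if let Some(v) = line.strip_prefix("User time (seconds):") {
            user = v.trim().parse::<f64>().ok();
        } else if let Some(v) = line.strip_prefix("System time (seconds):") {
            system = v.trim().parse::<f64>().ok();
        } else if let Some(v) = line.strip_prefix("Maximum resident set size (kbytes):") {
            rss = v.trim().parse::<u64>().ok();
        }
    }
    Some(TimeReport {
        user_secs: user?,
        system_secs: system?,
        max_rss_kb: rss?,
    })
}
```

Feeding it the captured stderr of each run gives one `TimeReport` per allocator, which can then be compared field by field.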

We need to compare at least:

csmoe commented 5 years ago

mimalloc vs jemalloc in rustc: https://github.com/rust-lang/rust/pull/62073

mattico commented 5 years ago

rustc 1.36.0 (a53f9df32 2019-07-03)
rust-analyzer 35f28c538a9b9f461bb4db1a78d02e9f02a3d296

Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz
8 GiB RAM
Ubuntu 18.10 Server X86_64

Self-reported times

## glibc 2.28

| test           | run 1         | run 2         | run 3         |
|----------------|---------------|---------------|---------------|
| loading        | 164.459389ms  | 158.283251ms  | 158.038681ms  |
| from scratch   | 5.337737528s  | 5.320671609s  | 5.319580964s  |
| no change      | 6.025861ms    | 6.065039ms    | 6.003961ms    |
| trivial change | 68.171291ms   | 68.601428ms   | 68.453403ms   |
| db loaded      | 162.044899ms  | 165.939177ms  | 154.081518ms  |
| analysis       | 15.262529965s | 15.364532676s | 15.265079964s |

## jemalloc

| test           | run 1         | run 2         | run 3         |
|----------------|---------------|---------------|---------------|
| loading        | 166.110382ms  | 134.700889ms  | 153.745901ms  |
| from scratch   | 5.05255001s   | 5.05072627s   | 5.052360284s  |
| no change      | 5.499773ms    | 5.546466ms    | 5.518948ms    |
| trivial change | 63.893892ms   | 65.056923ms   | 63.803271ms   |
| db loaded      | 154.996884ms  | 140.215319ms  | 162.413604ms  |
| analysis       | 14.672433632s | 14.672782703s | 14.61266783s  |

## mimalloc

| test           | run 1         | run 2         | run 3         |
|----------------|---------------|---------------|---------------|
| loading        | 167.466927ms  | 154.518565ms  | 154.566493ms  |
| from scratch   | 5.050844948s  | 5.047876063s  | 5.078473906s  |
| no change      | 5.597231ms    | 5.61053ms     | 5.653994ms    |
| trivial change | 64.158532ms   | 64.247269ms   | 64.714673ms   |
| db loaded      | 158.278461ms  | 154.817976ms  | 159.662227ms  |
| analysis       | 14.971792094s | 14.966517565s | 14.880917377s |

`time` data

## glibc 2.28

    Command being timed: "target/release/ra_cli analysis-bench ../chalk/ --complete ../chalk/chalk-engine/src/logic.rs:94:0"
    User time (seconds): 5.59
    System time (seconds): 0.16
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.75
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 382140
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 102574
    Voluntary context switches: 1385
    Involuntary context switches: 20
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

    Command being timed: "target/release/ra_cli analysis-stats ../chalk"
    User time (seconds): 15.53
    System time (seconds): 0.32
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:15.87
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 763380
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 298423
    Voluntary context switches: 1386
    Involuntary context switches: 29
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

## jemalloc

    Command being timed: "target/release/ra_cli analysis-bench ../chalk/ --complete ../chalk/chalk-engine/src/logic.rs:94:0"
    User time (seconds): 5.25
    System time (seconds): 0.13
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.39
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 393884
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 103437
    Voluntary context switches: 1368
    Involuntary context switches: 40
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

    Command being timed: "target/release/ra_cli analysis-stats ../chalk"
    User time (seconds): 14.77
    System time (seconds): 0.24
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:15.02
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 893204
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 232477
    Voluntary context switches: 1365
    Involuntary context switches: 122
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

## mimalloc

    Command being timed: "target/release/ra_cli analysis-bench ../chalk/ --complete ../chalk/chalk-engine/src/logic.rs:94:0"
    User time (seconds): 5.14
    System time (seconds): 0.22
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.41
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 490116
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 62
    Minor (reclaiming a frame) page faults: 138332
    Voluntary context switches: 1471
    Involuntary context switches: 56
    Swaps: 0
    File system inputs: 19184
    File system outputs: 800
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

    Command being timed: "target/release/ra_cli analysis-stats ../chalk"
    User time (seconds): 14.67
    System time (seconds): 0.53
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:15.22
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 1187624
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 521357
    Voluntary context switches: 1367
    Involuntary context switches: 103
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

| test                               | glibc 2.28 | jemalloc | mimalloc |
|------------------------------------|------------|----------|----------|
| analysis-bench (s)                 | 5.59       | 5.25     | 5.14     |
| analysis-bench maxrss (MB)         | 382        | 394      | 490      |
| analysis-bench trivial change (ms) | 68.45      | 63.80    | 64.71    |
| analysis-stats (s)                 | 15.53      | 14.77    | 14.67    |
| analysis-stats maxrss (MB)         | 763        | 893      | 1188     |
| analysis-stats analysis (s)        | 15.3       | 14.7     | 15.0     |

Both allocators are faster than glibc: roughly 6-8% less user time on analysis-bench and about 5% less on analysis-stats. jemalloc uses somewhat more memory than glibc (about 3% and 17% higher max RSS on the two benchmarks), while mimalloc uses much more (about 28% and 56% higher). mimalloc has the lowest user times as measured by `time`, but jemalloc has the fastest self-reported times, suggesting that mimalloc has less initialization overhead.
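For anyone redoing the comparison, the relative differences against the glibc baseline are plain percentage changes. A small sketch, using values from the summary table above; `percent_change` is a hypothetical helper name introduced only for illustration.

```rust
/// Relative change of `value` against `baseline`, in percent.
/// A negative result means `value` is lower than the baseline.
pub fn percent_change(baseline: f64, value: f64) -> f64 {
    (value - baseline) / baseline * 100.0
}

/// Formats one comparison line, e.g. for printing a small report.
pub fn describe(label: &str, baseline: f64, value: f64) -> String {
    format!("{}: {:+.1}% vs glibc", label, percent_change(baseline, value))
}
```

For example, `describe("mimalloc analysis-stats maxrss", 763.0, 1188.0)` compares mimalloc's peak RSS on analysis-stats with the glibc figure from the same row.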

matklad commented 5 years ago

Thanks for those benchmarks @mattico!

It indeed seems like mimalloc is probably not a good choice at this time, due to high memory usage.

For system allocator/jemalloc we already have a feature flag. Performance-wise, it looks like jemalloc is a win. However, it is a C library, and building it is not super easy, so it makes sense to keep the status quo, where jemalloc is opt-in.
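An opt-in allocator of this kind is typically wired up with `#[global_allocator]` behind a Cargo feature. The following is a minimal sketch, not the project's actual code: it assumes a Cargo feature named `jemalloc` that enables an optional dependency on the `jemallocator` crate, and `allocator_name` is a hypothetical helper.

```rust
// Assumption: Cargo.toml declares an optional `jemallocator` dependency and a
// `jemalloc` feature that enables it. Both names are illustrative.
#[cfg(feature = "jemalloc")]
#[global_allocator]
static GLOBAL: jemallocator::Jemalloc = jemallocator::Jemalloc;

// Without the feature, no #[global_allocator] is declared, so the binary uses
// Rust's default system allocator (glibc malloc on a typical Linux build).

/// Reports which allocator this binary was compiled with.
pub fn allocator_name() -> &'static str {
    if cfg!(feature = "jemalloc") {
        "jemalloc"
    } else {
        "system"
    }
}
```

With this layout, a default build needs no C toolchain for jemalloc, and something like `cargo build --release --features jemalloc` opts in.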

lnicola commented 5 months ago

We might want to revisit this: jemalloc and mimalloc bring the analysis-stats self time from 75.72 s to 72.04 s and 71.02 s, respectively (my, we're a little slower these days). So still a 5%-ish improvement, but we can build it easily enough. And we can always revert if it causes problems.

lnicola commented 5 months ago

As for the memory usage:

|                                     | GLIBC   | jemalloc | mimalloc |
|-------------------------------------|---------|----------|----------|
| `time` max RSS, analysis-stats self | 1801 MB | 1752 MB  | 1868 MB  |

So jemalloc is both faster than glibc and uses less RAM than either glibc or mimalloc.