ychin closed this 8 months ago.
Just documenting some of the benchmarks I ran here, in case other people are curious. I saw that Neovim turned on -O3 and LTO, and there were questions about the concrete effects since it wasn't benchmarked (https://github.com/neovim/neovim/pull/23051). This is not a very scientific test, but I made some quick benchmarks that exercise different areas of Vim:
- open: open a small file and quit.
- mkv: open a large (400 MB) binary file, run `:%s/a/b` in it and quit, and observe how long it takes.
- vim9compile: `defcompile` is added at the end of the file to force a compile.
- benchmark: run `make benchmark`, which from what I can tell tests regex. I'm using the last number (the one that takes the longest).

Testing was done on an M1 Max MacBook Pro with Xcode 15. I tested -O2, -O3, and -Os, each with and without -flto (link-time optimization). "Perf %" is performance relative to the base build (-O2, no LTO), and higher is better (higher perf means the test took less time). Out of curiosity I also ran similar tests on Linux/GCC and on an Intel (x86) Mac, and they all show similar results, so I'm not going to duplicate them here:
name | size (bytes) | size % | open (s) | open perf % | mkv (s) | mkv perf % | markdown (s) | markdown perf % | vim9compile (s) | vim9compile perf % | benchmark (s) | benchmark perf % |
---|---|---|---|---|---|---|---|---|---|---|---|---|
vim_os | 3645056 | 92% | 11.07 | 94% | 8.31 | 97% | 4.88 | 91% | 8.63 | 94% | 6.41 | 83% |
vim_o2 | 3956864 | 100% | 10.36 | 100% | 8.07 | 100% | 4.43 | 100% | 8.12 | 100% | 5.34 | 100% |
vim_o3 | 4114752 | 104% | 10.22 | 101% | 7.64 | 106% | 4.38 | 101% | 7.87 | 103% | 5.02 | 107% |
vim_os_lto | 3416768 | 86% | 10.65 | 97% | 8.13 | 99% | 4.60 | 96% | 8.24 | 99% | 5.80 | 92% |
vim_o2_lto | 4360864 | 110% | 9.95 | 104% | 7.67 | 105% | 4.22 | 105% | 7.93 | 102% | 5.18 | 103% |
vim_o3_lto | 4491696 | 114% | 9.86 | 105% | 7.15 | 113% | 4.22 | 105% | 8.03 | 101% | 5.26 | 102% |
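From the column definitions, the percentages appear to be simple ratios against the -O2/no-LTO row: size % = build size / base size, and perf % = base time / build time (so a value above 100% means the build finished faster). The following is a minimal Python sketch, assuming those definitions, that reproduces the vim_o3_lto row from the raw numbers. The dictionaries and helper names are illustrative only and are not part of any build script:

```python
# Raw numbers copied from the table: sizes in bytes, timings in seconds.
BASE = {"size": 3956864, "open": 10.36, "mkv": 8.07, "markdown": 4.43,
        "vim9compile": 8.12, "benchmark": 5.34}   # vim_o2 (base build)
O3_LTO = {"size": 4491696, "open": 9.86, "mkv": 7.15, "markdown": 4.22,
          "vim9compile": 8.03, "benchmark": 5.26}  # vim_o3_lto

TESTS = ("open", "mkv", "markdown", "vim9compile", "benchmark")


def size_pct(base_bytes, build_bytes):
    """Binary size relative to the base build, in percent."""
    return 100.0 * build_bytes / base_bytes


def perf_pct(base_seconds, build_seconds):
    """Relative performance in percent; above 100 means faster than base."""
    return 100.0 * base_seconds / build_seconds


def relative_row(base, build):
    """Rounded size % and per-test perf % for one build."""
    row = {"size": round(size_pct(base["size"], build["size"]))}
    for test in TESTS:
        row[test] = round(perf_pct(base[test], build[test]))
    return row


if __name__ == "__main__":
    print(relative_row(BASE, O3_LTO))
```

Taking the ratio as base/build instead of build/base is what makes "higher is better" hold for the timing columns.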
Generally, -O3 and LTO both increase performance in an almost linear relationship with code size (probably due to better inlining). Surprisingly, in the "mkv" test case we see a 13% performance increase when using both, which is a non-trivial improvement. Opening a small file and quitting (a common use case for Vim) also shows a 5% speed improvement. The other benchmarks show more mixed results, but usually there is still a minor improvement. -Os (optimize for binary size) is clearly the worst, although, surprisingly, -Os + LTO results in both smaller code and better performance than -Os alone. Given that Vim is a small program, there's really no need to optimize for binary size. I did not benchmark memory use, but the results were good enough to convince me to turn on both -O3 and LTO.
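One rough way to sanity-check the observation that performance tracks code size is to correlate the size % column with one of the perf % columns. The following minimal sketch assumes the rounded table values are precise enough and uses the mkv column as the example. With only six builds, a Pearson coefficient only says how closely the points follow a straight line, not why they do (inlining remains a guess):

```python
import math

# Columns copied from the table, in row order:
# vim_os, vim_o2, vim_o3, vim_os_lto, vim_o2_lto, vim_o3_lto
SIZE_PCT = [92, 100, 104, 86, 110, 114]
MKV_PERF_PCT = [97, 100, 106, 99, 105, 113]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance and standard deviations share the 1/n factor, so it cancels.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    r = pearson(SIZE_PCT, MKV_PERF_PCT)
    print(f"r(size %, mkv perf %) = {r:.2f}")
```

Swapping in another column, such as vim9compile perf %, is a one-line change. That column is noticeably less regular, since vim_o3 beats vim_o3_lto there.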
Previously, #1314 tried to do this, but I didn't have time to do a full benchmark to gauge the benefits. I also ran into Makefile dependency issues that resulted in really long build times, which vim/vim#13344 fixed.
From testing and benchmarking, both appear to give a measurable performance improvement, with some benchmarks running over 10% faster (e.g. opening a large 400 MB binary file and searching and replacing within it). Use them when building a published build. Don't use them for legacy builds: I encountered test failures in the recursion-limit tests, which I suspect are due to stack size issues. Since legacy builds are mostly kept for compatibility reasons, there's no need to optimize them for now.