macvim-dev / macvim

Vim - the text editor - for macOS
https://macvim.org
Vim License

Use -O3 and link-time-optimization for published builds #1444

Closed ychin closed 8 months ago

ychin commented 8 months ago

From testing and benchmarking, both appear to give a measurable performance improvement, with some benchmarks running about 10% faster (opening a large 400 MB binary file and doing a search-and-replace in it). Use them when building a published build. Don't do it for legacy builds: I ran into test failures there when testing the recursion limit, and I suspect they are due to stack size issues. Since legacy builds are mostly kept for compatibility reasons, there's no need to optimize them for now.

ychin commented 8 months ago

Just to document some of the benchmarks I did, in case other people are curious. I saw Neovim turning on -O3 and LTO, and there were questions about the concrete effects since it wasn't benchmarked there (https://github.com/neovim/neovim/pull/23051). This is not a very scientific test, but I put together a few benchmarks to quickly see the results across different areas of Vim.

  1. open: Open Vim's README and quit. Do it 100 times.
  2. mkv: Open a ~400 MB binary video file, perform :%s/a/b in it and quit, and observe how long it takes.
  3. markdown: Install the vim-markdown plugin (which has a lot of Vimscript in it), then open a large (400 kB) Markdown file, quit, and see how long it takes.
  4. vim9compile: Source a large file (I took the Vim9 LSP plugin and duplicated the main file 670 times with renamed functions, resulting in ~23 MB of Vim9 script), with a defcompile added at the end of the file to force a compile.
  5. benchmark: The built-in make benchmark, which from what I can tell tests regex. I'm using the last number (the one that takes the longest).
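
For anyone who wants to reproduce timings like these, a minimal sketch of a wall-clock harness is below. The thread doesn't say how the runs were actually timed, so this is only one plausible approach. The helper name `time_command` and the Vim invocation in the comment are assumptions made for illustration.

```python
import subprocess
import sys
import time


def time_command(cmd, repeats=1):
    """Run `cmd` (a list of argv strings) `repeats` times.

    Returns the total wall-clock time in seconds for all runs combined.
    """
    start = time.perf_counter()
    for _ in range(repeats):
        # check=True raises if the command exits with a non-zero status,
        # so a crashing build is not silently counted as a fast run.
        subprocess.run(cmd, check=True)
    return time.perf_counter() - start


# Demo with a command that exists everywhere: start Python and exit.
print(f"{time_command([sys.executable, '-c', 'pass'], repeats=3):.2f} s")

# Hypothetical "open" benchmark (assumed invocation, not taken from the thread):
#   time_command(["vim", "-u", "NONE", "README.md", "-c", "q"], repeats=100)
```

Timing the whole process from the outside includes startup and shutdown cost, which suits the "open and quit" style of benchmark described in the list above.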

Testing was done on an M1 Max MacBook Pro / Xcode 15, and I tested -O2/-O3/-Os with and without -flto (link-time optimization). "Perf %" is performance relative to the base (-O2, no LTO), and higher is better (higher perf means it took less time to run). Just out of curiosity I ran similar tests on Linux/GCC and on an Intel Mac with x86, and they all show similar results, so I'm not going to duplicate them here:

| name | size (bytes) | size % | open (s) | open perf % | mkv (s) | mkv perf % | markdown (s) | markdown perf % | vim9compile (s) | vim9compile perf % | benchmark (s) | benchmark perf % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| vim_os | 3645056 | 92% | 11.07 | 94% | 8.31 | 97% | 4.88 | 91% | 8.63 | 94% | 6.41 | 83% |
| vim_o2 | 3956864 | 100% | 10.36 | 100% | 8.07 | 100% | 4.43 | 100% | 8.12 | 100% | 5.34 | 100% |
| vim_o3 | 4114752 | 104% | 10.22 | 101% | 7.64 | 106% | 4.38 | 101% | 7.87 | 103% | 5.02 | 107% |
| vim_os_lto | 3416768 | 86% | 10.65 | 97% | 8.13 | 99% | 4.60 | 96% | 8.24 | 99% | 5.80 | 92% |
| vim_o2_lto | 4360864 | 110% | 9.95 | 104% | 7.67 | 105% | 4.22 | 105% | 7.93 | 102% | 5.18 | 103% |
| vim_o3_lto | 4491696 | 114% | 9.86 | 105% | 7.15 | 113% | 4.22 | 105% | 8.03 | 101% | 5.26 | 102% |
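
The "perf %" columns appear to be the base time divided by the measured time, so a shorter run scores above 100%. A small sketch of that calculation, using the "mkv" timings from the table, is below. The function name `relative_perf` is made up for illustration, and the formula is inferred from the numbers rather than stated in the thread.

```python
def relative_perf(base_seconds, seconds):
    """Performance relative to the base build, in percent (higher is faster)."""
    return base_seconds / seconds * 100.0


# "mkv" timings in seconds, copied from the table above.
mkv_seconds = {
    "vim_os": 8.31,
    "vim_o2": 8.07,  # base: -O2, no LTO
    "vim_o3": 7.64,
    "vim_os_lto": 8.13,
    "vim_o2_lto": 7.67,
    "vim_o3_lto": 7.15,
}

base = mkv_seconds["vim_o2"]
for name, secs in mkv_seconds.items():
    print(f"{name:12s} {relative_perf(base, secs):6.1f}%")
```

Note that a perf of 113% corresponds to the run taking about 11% less time (7.15 s versus 8.07 s), since the ratio is base over measured and not the other way around.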

Generally, -O3 and LTO both increase performance in an almost linear relationship with code size (probably due to better inlining). It's surprising that the "mkv" test case shows a 10+% performance increase when using both, which is a non-trivial improvement. Opening a small file and quitting (a common use case for Vim) also shows a 5% speed improvement. The other benchmarks show more mixed results, but usually there is still a minor improvement. -Os (optimize for binary size) is clearly the worst (surprisingly, -Os + LTO results in both smaller code size and better performance than -Os alone). Given that Vim is a small program, there's really no need to optimize for binary size. I did not benchmark memory use, but I think the results were good enough to convince me to turn on both -O3 and LTO.

ychin commented 8 months ago

Previously #1314 tried to do this, but I didn't have time to do a full benchmark to gauge the benefits. I also ran into Makefile dependency issues that resulted in very long build times, which vim/vim#13344 fixed.