Use -O3 and link-time-optimization for published builds

ychin commented 8 months ago

From testing and benchmarking, both appear to give a measurable performance improvement, with some benchmarks running about 10% faster (for example, opening a large 400 MB binary file and doing a search-and-replace in it). Use them when building a published build. Don't enable them for legacy builds: there I ran into test failures in the recursion-limit tests, which I suspect are due to stack size issues. Since legacy builds are mostly kept for compatibility reasons, there's no need to optimize them for now.

ychin commented 8 months ago

Just to document some of the benchmarks I have done, in case other people are curious. I saw Neovim turn on -O3 and LTO, and there were questions about the concrete effects since the change wasn't benchmarked (https://github.com/neovim/neovim/pull/23051). This is not a very scientific test, but I put together a few benchmarks that exercise different areas of Vim so I could quickly compare results.

open: Open Vim's README and quit. Do it 100 times.
mkv: Open a ~400 MB binary video file, run :%s/a/b on it and quit, and measure how long it takes.
markdown: Install the vim-markdown plugin (which has a lot of vimscript in it), then open a large (400 kB) Markdown file and quit, and measure how long it takes.
vim9compile: Source a large file (I took the vim9 LSP plugin and duplicated the main file 670 times with renamed functions, resulting in ~23 MB of Vim9 script), with a defcompile added at the end of the file to force a compile.
benchmark: The built-in make benchmark, which from what I can tell tests regex. I'm using the last number (the one that takes the longest).
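For anyone who wants to reproduce the wall-clock benchmarks above (open, mkv, markdown), here is a minimal sketch of a timing harness. It is not the script used for the numbers in this PR; the helper name `time_command` and the example Vim invocation in the comment are assumptions for illustration only.

```python
import subprocess
import sys
import time


def time_command(cmd, runs=1):
    """Run `cmd` (a list of arguments) `runs` times; return total wall-clock seconds."""
    start = time.perf_counter()
    for _ in range(runs):
        # The child inherits the terminal, so an editor started this way
        # still sees a real TTY when the script is run interactively.
        subprocess.run(cmd, check=True)
    return time.perf_counter() - start


# Hypothetical "open" benchmark (assumed invocation, not taken from this PR):
#   time_command(["vim", "-c", "q", "README.md"], runs=100)

# Smoke check that needs no Vim binary: time three no-op Python processes.
elapsed = time_command([sys.executable, "-c", "pass"], runs=3)
print(f"3 runs took {elapsed:.2f} s")
```

Pointing `cmd` at each build (vim_o2, vim_o3_lto, and so on) in turn would give one comparable number per build for the same benchmark.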

Testing was done on an M1 Max MacBook Pro / Xcode 15, and I tested -O2/-O3/-Os with and without -flto (link-time optimization). "Perf %" is performance relative to the base build (-O2, no LTO), and higher is better (higher perf means the benchmark took less time). Just out of curiosity I ran similar tests on Linux/GCC and on an x86 Intel Mac; they all show similar results, so I'm not going to duplicate them here:

name	size (bytes)	size %	open (s)	open perf %	mkv (s)	mkv perf %	markdown (s)	markdown perf %	vim9compile (s)	vim9compile perf %	benchmark (s)	benchmark perf %
vim_os	3645056	92%	11.07	94%	8.31	97%	4.88	91%	8.63	94%	6.41	83%
vim_o2	3956864	100%	10.36	100%	8.07	100%	4.43	100%	8.12	100%	5.34	100%
vim_o3	4114752	104%	10.22	101%	7.64	106%	4.38	101%	7.87	103%	5.02	107%
vim_os_lto	3416768	86%	10.65	97%	8.13	99%	4.60	96%	8.24	99%	5.80	92%
vim_o2_lto	4360864	110%	9.95	104%	7.67	105%	4.22	105%	7.93	102%	5.18	103%
vim_o3_lto	4491696	114%	9.86	105%	7.15	113%	4.22	105%	8.03	101%	5.26	102%
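As a worked example of how the "perf %" columns can be read, the sketch below assumes perf % is the base build's time divided by the build's time, rounded to a whole percent. The exact formula used for the table isn't stated, so treat that as an assumption; it does reproduce the mkv column. The timings are copied from the table, and `relative_perf` is a hypothetical helper name.

```python
# mkv timings in seconds, copied from the table above.
mkv_seconds = {
    "vim_os": 8.31,
    "vim_o2": 8.07,
    "vim_o3": 7.64,
    "vim_os_lto": 8.13,
    "vim_o2_lto": 7.67,
    "vim_o3_lto": 7.15,
}


def relative_perf(seconds, baseline="vim_o2"):
    """Perf % per build, assuming perf % = baseline time / build time * 100."""
    base = seconds[baseline]
    return {name: round(100 * base / t) for name, t in seconds.items()}


mkv_perf = relative_perf(mkv_seconds)
for name, pct in mkv_perf.items():
    print(f"{name}: {pct}%")
```

For example, vim_o3_lto comes out as 8.07 / 7.15 ≈ 1.13, i.e. 113%, matching the table.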

Generally, -O3 and LTO both increase performance in an almost linear relationship with code size (probably due to better inlining). It's surprising that the "mkv" test case shows a 10+% performance increase when using both, which is a non-trivial improvement. Opening a small file and quitting (a common use case for Vim) also shows a 5% speed improvement. The other benchmarks show more mixed results, but there is usually still a minor improvement. -Os (optimize for binary size) is clearly the worst, though surprisingly -Os + LTO results in both smaller code size and better performance than plain -Os. Given that Vim is a small program, there's really no need to optimize for binary size. I did not benchmark memory use, but the results were good enough to convince me to turn on both -O3 and LTO.

ychin commented 8 months ago

Previously #1314 tried to do this, but I didn't have time to do a full benchmark to gauge the benefits. I also ran into Makefile dependency issues that resulted in really long build times, which vim/vim#13344 fixed.

macvim-dev / macvim

Use -O3 and link-time-optimization for published builds #1444