Open esator opened 6 months ago
LTO is still quite slow on Windows because malloc is just too slow. Replacing the malloc implementation speeds it up by an order of magnitude, but this doesn't apply to MinGW. On Linux, ThinLTO is even faster than non-LTO. The biggest problem is that most libraries always build a bunch of useless shared libraries and test executables at the same time, which is a huge waste of resources, and things like ffmpeg do a bunch of pointless build tests during configure, which is also extremely wasteful. If these factors were eliminated, ThinLTO wouldn't be that much slower.
Also x264 needs --enable-lto since it has linking errors because by default it forces -mstack-alignment=64, but for ffmpeg and other libs it's -mstack-alignment=16
LTO sounds great. For example, I've found that the Visual Studio svt-av1 binaries on the web are faster than the 'optimized' -march= binaries I've compiled with LLVM.
It would be helpful if you'd post a list of libs needing -mstack-alignment, or a patch for media-suite_compile.sh. Otherwise everyone has to find out by trial and error.
The biggest problem is that most libraries always build a bunch of useless shared libraries and test executables at the same time, which is a huge waste of resources, and things like ffmpeg do a bunch of pointless build tests during configure, which is also extremely wasteful.
Compiling a full mediasuite isn't exactly fast anyway, so it could be the user's decision whether to enable LTO. I don't know how much effect it would have with LLVM though.
I don't know how much effect it would have with llvm though.
Right, so it's probably good to limit lto to core encoder libs/binaries that would gain speed.
I just compiled x265 and svt-av1 with LTO by adding the -C and -D args to the .sh. It seems to have worked fine and didn't take ages. Luckily I'm not using a CPU with lots of cores, so the LLVM malloc issue probably doesn't affect me that much.
Btw, here's a speed comparison for aom: https://www.reddit.com/r/AV1/comments/jmwepw/how_to_build_libaomav1_to_be_as_fast_as_possible/
After recent clang changes, LTO is now possible for clang with -flto=thin via custom_build_options and --enable-lto=thin for ffmpeg. Also x264 needs --enable-lto since it has linking errors because by default it forces -mstack-alignment=64, but for ffmpeg and other libs it's -mstack-alignment=16.
It would be nice to have some option to enable LTO for clang. Nowadays LTO is quite common and compatible, and -flto=thin is just a bit slower than normal compilation. Some libs and tools may also require individual flags for LTO (like -DSVT_AV1_LTO=ON for svt-av1, -DENABLE_LTO for x265, etc.). It could be offered at least as an experimental and unsupported option, because it might require more maintenance and have less compatibility.
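To make the per-library part concrete, here is a rough sketch of how such an option might pick the right switch for each library. The function name lto_flags_for is hypothetical and is not part of media-suite_compile.sh. The flags are only the ones mentioned in this issue, and the =ON suffix on the x265 option is an assumption (CMake expects -DVAR=VALUE).

```shell
#!/bin/sh
# Hypothetical helper, NOT part of media-suite_compile.sh.
# Prints the LTO switch mentioned in this issue for a given library,
# or nothing for libraries that are not listed here.
lto_flags_for() {
    case "$1" in
        ffmpeg)  printf '%s\n' "--enable-lto=thin" ;;  # configure option
        x264)    printf '%s\n' "--enable-lto" ;;       # configure option
        svt-av1) printf '%s\n' "-DSVT_AV1_LTO=ON" ;;   # CMake option
        x265)    printf '%s\n' "-DENABLE_LTO=ON" ;;    # CMake option, =ON assumed
        *)       ;;                                    # unknown: no extra flag
    esac
}

# Example: show what would be appended for a few libraries.
for lib in ffmpeg x264 svt-av1 x265 zlib; do
    printf '%s: %s\n' "$lib" "$(lto_flags_for "$lib")"
done
```

Libraries that are not in the table get no extra flag, so anything untested would keep building as it does now.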