m-ab-s / media-autobuild_suite

This Windows Batchscript helps setup a Mingw-w64 compiler environment for building ffmpeg and other media tools under Windows.
GNU General Public License v3.0

Request: [clang64] LTO #2669

Open esator opened 6 months ago

esator commented 6 months ago

After the recent clang changes, LTO is now possible with clang: pass -flto=thin via custom_build_options and --enable-lto=thin to ffmpeg. x264 also needs --enable-lto, otherwise it hits linking errors, because by default it forces -mstack-alignment=64 while ffmpeg and other libs use -mstack-alignment=16.

It would be nice to have an option to enable LTO for clang. LTO is quite common and compatible nowadays, and -flto=thin is only a bit slower than a normal compilation. Some libs and tools may also require individual flags for LTO (like -DSVT_AV1_LTO=ON for svt-av1, -DENABLE_LTO for x265, etc).

It could at least be offered as an experimental and unsupported option, because it might require more maintenance and have less compatibility.
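For illustration, the per-library switches listed above could be collected in one lookup. This is only a sketch: `lto_flag_for` is a hypothetical helper and not part of media-autobuild_suite, and the `=ON` on the x265 option is an assumption (the comment above gives it as plain -DENABLE_LTO).

```shell
#!/usr/bin/env bash
# Sketch only: lto_flag_for is a hypothetical helper, not a function of the suite.
# It prints the LTO switch this thread mentions for a given library,
# or an empty line for libraries the thread does not cover.
lto_flag_for() {
    case "$1" in
        ffmpeg)  printf '%s\n' '--enable-lto=thin' ;;  # configure switch
        x264)    printf '%s\n' '--enable-lto' ;;       # configure switch
        svt-av1) printf '%s\n' '-DSVT_AV1_LTO=ON' ;;   # cmake option
        x265)    printf '%s\n' '-DENABLE_LTO=ON' ;;    # cmake option; "=ON" is assumed
        *)       printf '\n' ;;                        # unknown library: no extra flag
    esac
}

# Compiler flag named in the request; how it reaches each build is up to the suite.
LTO_CFLAGS='-flto=thin'
```

Calling `lto_flag_for x264` would then yield the switch to append to that library's configure line, while libraries not named in the thread get nothing extra.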

Andarwinux commented 6 months ago

LTO is still quite slow on Windows because malloc is just too slow. Replacing the malloc implementation speeds it up by an order of magnitude, but this doesn't apply to MinGW. On Linux, ThinLTO is even faster than non-LTO. The biggest problem is that most libraries always build a bunch of useless shared libraries and test executables at the same time, which is a huge waste of resources, and things like ffmpeg do a bunch of pointless build tests during configure, which is also extremely wasteful. If these factors were eliminated, ThinLTO wouldn't be that much slower.

gitoss commented 5 months ago

> Also x264 needs --enable-lto since it has linking errors because by default forces -mstack-alignment=64, but for ffmpeg and other libs it's -mstack-alignment=16

LTO sounds great. For example, I've found that Visual Studio svt-av1 binaries on the web are faster than the 'optimized' -march= binaries I've compiled with llvm.

It would be helpful if you'd post a list of libs needing -mstack-alignment, or a patch for media-suite_compile.sh. Otherwise everyone has to resort to trial and error.

> The biggest problem is that most libraries always build a bunch of useless shared libraries and tests executables at the same time, which is a huge waste of resources, and things like ffmpeg do a bunch of pointless build tests during configure, which is also extremely wasteful.

Compiling a full mediasuite isn't exactly fast anyway, so it could be the user's decision whether to enable LTO. I don't know how much effect it would have with llvm though.

Andarwinux commented 5 months ago

> I don't know how much effect it would have with llvm though.

see https://github.com/llvm/llvm-project/pull/91862

gitoss commented 5 months ago

> > I don't know how much effect it would have with llvm though.
>
> see llvm/llvm-project#91862

Right, so it's probably good to limit LTO to the core encoder libs/binaries that would gain speed.

I just compiled x265 and svt-av1 with LTO by adding the -C and -D args to the .sh. It seems to have worked fine and didn't take ages. Lucky me that I'm not using a multi-multicore CPU, so the llvm malloc issue probably doesn't affect me that much.
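Limiting LTO to a few core encoders could be sketched as an allow-list check. Everything here is hypothetical: `LTO_LIBS` and `maybe_lto` are illustrative names, not variables or functions of the suite, and the list contents only mirror the two libraries mentioned above.

```shell
#!/usr/bin/env bash
# Sketch only: LTO_LIBS and maybe_lto are hypothetical, not part of the suite.
# Space-separated allow-list of libraries that should be built with LTO.
LTO_LIBS="x265 svt-av1"

# usage: maybe_lto <lib> <flag>
# Prints <flag> if <lib> is on the allow-list, otherwise an empty line.
maybe_lto() {
    case " $LTO_LIBS " in
        *" $1 "*) printf '%s\n' "$2" ;;
        *)        printf '\n' ;;
    esac
}
```

A build step could then append `$(maybe_lto svt-av1 -DSVT_AV1_LTO=ON)` to its argument list, so libraries outside the list are built exactly as before.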

Btw, here's a speed comparison for aom: https://www.reddit.com/r/AV1/comments/jmwepw/how_to_build_libaomav1_to_be_as_fast_as_possible/