Open zamazan4ik opened 5 months ago
Oh wow, that's a much bigger improvement than I was expecting. Thanks for looking into it!
I will try to find time to learn cargo-pgo
Enable LTO. I expect in general performance boost "for free" and the binary size reduction.
Yep, will do in the next release.
Perform more PGO benchmarks with other datasets
The only dataset available in CI is the resvg test suite.
And I will probably add build instructions with a PGO section.
Hi!
As was proposed here, I decided to run various tests of optimizing `resvg` with more advanced compiler optimizations like LTO, PGO, and PLO. Recently I tested Profile-Guided Optimization (PGO) on different projects in different software domains - all the results are available at https://github.com/zamazan4ik/awesome-pgo . Here are my results for the project - I hope they will be helpful to someone.
Test environment
`resvg` version: the latest for now from the `master` branch on commit `4b4e8970de29407e6257aac3d2f501b60e88236a`
Benchmark
For benchmark purposes, I use a simple scenario of converting an SVG file to a PNG file with the `resvg input.svg output.png` command. For PGO optimization I use the cargo-pgo tool. Release build is done with `cargo build --release`, PGO instrumented - `cargo pgo build`, PGO-optimized - `cargo pgo optimize build`. `taskset -c 0` is used to reduce the OS scheduler's influence on the results during all measurements. All measurements are done on the same machine, with the same background "noise" (as much as I can guarantee).
As an input file for training purposes for the `resvg input.svg output.png` command, I use this file.
Additionally, I decided to re-enable LTO for the tool. You disabled this optimization nearly 5 years ago due to some compiler bugs. I guess that during the last 5 years the LTO implementation in the compiler has become much more stable, and we can consider enabling it once again. So, for `resvg`, I enabled it during the benchmarks with the following addition to the `Cargo.toml` file:
Post-Link Optimization is also done with `cargo-pgo`, with the same training workload as for the PGO step.
Results
Firstly, let's check the scenario when the training workload and the benchmark workload are the same. Such a benchmark is still useful for scenarios where you need to convert the same file many times (like a part of CI without caching):
where:
`resvg_release` - regular Release build
`resvg_release_lto` - Release + LTO
`resvg_lto_optimized` - Release + LTO + PGO optimized
`resvg_lto_bolt_optimized` - Release + LTO + PGO optimized + BOLT optimized
According to the results, LTO and PGO measurably improve performance. However, BOLT didn't improve the situation too much.
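As a rough sketch, the four builds listed above could be produced along the following lines. `cargo build --release`, `cargo pgo build` and `cargo pgo optimize build` are the commands named in this report. Everything else is an assumption made for illustration: the `run` helper, the `CARGO_PROFILE_RELEASE_LTO` environment variable as a stand-in for the `Cargo.toml` change, the `x86_64-unknown-linux-gnu` target path, the `cargo pgo bolt ... --with-pgo` steps, and the `-bolt-instrumented` / `-bolt-optimized` file names. The script only prints the steps unless `RUN=1` is set.

```shell
#!/bin/sh
# Dry run by default: each step is printed, not executed. Set RUN=1 to execute.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Assumed output location of cargo-pgo builds (the host triple is a guess).
BIN="target/x86_64-unknown-linux-gnu/release/resvg"

build_variants() {
    # 1. Regular Release build.
    run cargo build --release
    run cp target/release/resvg resvg_release

    # 2. Release + LTO. The env var is an assumed stand-in for the Cargo.toml change.
    run env CARGO_PROFILE_RELEASE_LTO=true cargo build --release
    run cp target/release/resvg resvg_release_lto

    # 3. PGO (LTO assumed to be enabled in Cargo.toml from here on):
    #    instrument, train on one file, rebuild using the collected profile.
    run cargo pgo build
    run "$BIN" input.svg output.png
    run cargo pgo optimize build
    run cp "$BIN" resvg_lto_optimized

    # 4. BOLT on top of PGO: instrument, train on the same file, optimize.
    run cargo pgo bolt build --with-pgo
    run "${BIN}-bolt-instrumented" input.svg output.png
    run cargo pgo bolt optimize --with-pgo
    run cp "${BIN}-bolt-optimized" resvg_lto_bolt_optimized
}

build_variants
```

Copying each binary aside under its own name keeps the later comparison independent of which build ran last.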
What if the training and benchmarking workloads are different files? For this, I use the same training file as above, but benchmark with another file. Here we go:
We got a performance boost once again for a different file. I suppose it's because these two files execute similar paths inside the tool, but I cannot say more since I am not an SVG expert at all :)
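A comparison like this one can be sketched as a loop over the four binaries, each run on the same benchmark file and pinned to one core with `taskset -c 0` as described earlier. The report does not say which timing tool was used, so `hyperfine` and its `--warmup 3` setting are assumptions here, as are the `run` helper and the `benchmark.svg` file name. The script only prints the commands unless `RUN=1` is set.

```shell
#!/bin/sh
# Dry run by default: each step is printed, not executed. Set RUN=1 to execute.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Time every build variant on the same SVG file.
bench_all() {
    svg="$1"
    for bin in resvg_release resvg_release_lto \
               resvg_lto_optimized resvg_lto_bolt_optimized; do
        # Pin to CPU 0 to reduce the OS scheduler's influence on the timings.
        run hyperfine --warmup 3 "taskset -c 0 ./$bin $svg output.png"
    done
}

# "benchmark.svg" is a hypothetical file name; pass the real one as $1.
bench_all "${1:-benchmark.svg}"
```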
However, there are cases that show that training on only one file is not sufficient - e.g. let's use this file for the benchmark (the training file remains the same as in the tests above):
Here we see some performance decrease from all optimizations (even from LTO, which is strange). It shows that the PGO training set should be wider.
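A wider training set could look roughly like this: instead of a single file, the instrumented binary is run over every SVG in a directory before the optimized build. This is only a sketch under assumptions: the `run` helper, the `tests` default directory, the target path, and the throwaway output file are all hypothetical, and a real training set would need to be chosen to reflect typical inputs. The script only prints the steps unless `RUN=1` is set.

```shell
#!/bin/sh
# Dry run by default: each step is printed, not executed. Set RUN=1 to execute.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Assumed output location of cargo-pgo builds (the host triple is a guess).
BIN="target/x86_64-unknown-linux-gnu/release/resvg"

# Collect PGO profiles from every *.svg in a directory, then rebuild.
train_on_dir() {
    dir="$1"
    run cargo pgo build
    for svg in "$dir"/*.svg; do
        [ -e "$svg" ] || continue   # skip when the glob matches nothing
        # The PNG output is discarded; only the profile data matters here.
        run "$BIN" "$svg" "${TMPDIR:-/tmp}/pgo-train.png"
    done
    run cargo pgo optimize build
}

# "tests" is a hypothetical directory name; pass the real one as $1.
train_on_dir "${1:-tests}"
```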
Just for reference, I also measured the tool slowdown during the PGO and PLO training phases:
where:
`resvg_lto_instrumented` - Release + LTO + PGO instrumentation
`resvg_lto_bolt_instrumented` - Release + LTO + PGO optimization + BOLT instrumentation
Also, I want to report the binary size changes (without `strip`-ing that can influence the binary size a lot):
Further steps
I can suggest the following action points:
`resvg`'s performance with PGO.
I would be happy to answer your questions about PGO.
P.S. Please do not treat the issue like a bug or something like that - it's just a benchmark report. Since the "Discussions" functionality is disabled in this repo, I created the Issue instead.