Open zamazan4ik opened 4 weeks ago
Thanks for the detailed issue @zamazan4ik! I greatly appreciate it.
I'd be happy to merge a PR that enables LTO for release builds, if you're interested in sending one.
PGO and PLO are promising for future squeezing, but this tool is still in a pretty early stage and there's a lot of just plain inefficient stuff in it already (for example, it doesn't currently bother to cache and reuse the filtered lists of steps that it repeatedly loops over). So I'd be happy to start with just LTO, and then consider PGO/PLO further down the line.
Merged LTO in release builds with #81, so I've updated the issue title to reflect the remaining items.
Yep, thanks a lot!
Just curious -- what kind of workloads allow you to see a difference from LTO/PGO? LTO can reduce binary size (curious by how much it did here too), not sure if PGO can do the same these days (but seems unlikely from principle).
Just curious -- what kind of workloads allow you to see a difference from LTO/PGO?
Actually, many of them benefits from PGO- see these. this list includes static analyzers. Some of the cases include LTO benchmarks too.
LTO can reduce binary size (curious by how much it did here too)
I have made quick tests (Fedora 40) by adding lto = true to the Release profile. The binary size reduction is from 7.6 Mib to 5.8 Mib.
... not sure if PGO can do the same these days (but seems unlikely from principle).
If you mean the binary size reduction, it actually "it depends" on the workload - for some cases the amount of inlining will be reduced -> the binary size will be smaller. Is it true in practice or not - it depends.
Sorry, I was unclear -- what kind of workloads for zizmor allow you to see this difference?
I'm aware that for large applications it makes a substantial difference, but I was wondering specifically about this project.
I didn't perform PGO benchmarks for Zizmor yet - in the original post it was an idea without actual benchmarks
Cool. Would be interested to know what kind of workflows it would be reading for LTO/PGO to actually make a noticeable performance difference.
Hi!
Just read your article about the tool - I highly appreciate such a tool since I am also a person who is interested in a static analysis in various domains. Nice done! I think I have several potentially interesting ideas about improving this tool. I share them here since the Discussions are disabled for the repo.
Link-Time Optimization (LTO)
I noticed that in the
Cargo.toml
file Link-Time Optimization (LTO) for the project is not enabled. I suggest switching it on since it will reduce the binary size (always a good thing to have) and will likely improve the application's performance a bit.I suggest enabling LTO only for the Release builds so as not to sacrifice the developers' experience while working on the project since LTO consumes an additional amount of time to finish the compilation routine. If you think that a regular Release build should not be affected by such a change as well, then I suggest adding an additional
dist
orrelease-lto
profile where additionally to regularrelease
optimizations LTO will also be added. Such a change simplifies life for maintainers and others interested in the project persons who want to build the most performant version of the application. Via enabling LTO in the Cargo.toml file we can deliver an LTO-optimized version of the tool to users withcargo install
. Using ThinLTO should also help to reduce the build-time overhead with LTO. E.g., checkcargo-outdated
Release profile.Basically, it can be enabled with the following lines:
I have made quick tests (Fedora 40) by adding
lto = true
to the Release profile. The binary size reduction is from 7.6 Mib to 5.8 Mib. Maybe you also will be interested in tweaking other compiler options likecodegen-units
.Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO)
According to my benchmarks, PGO measurably helps to optimize various applications (its CPU efficiency more precisely) in several domains, especially compilers, static analysis, linters, etc - see Clang, Rustc, clang-tidy, Rust Analyzer and many other examples. That's why I think applying PGO (and a similar PLO technique via LLVM BOLT) could help to optimize Zizmor further.
I see that the project is in its early stages so investing resources (time) into PGO and PLO things could not be worth it. However, when more functionality will be implemented and more "polishing" time will be available, PGO can definitely be useful.
Thank you.