Evaluate additional compiler optimizations like PGO

zamazan4ik commented 4 weeks ago

Hi!

Just read your article about the tool - I highly appreciate such a tool since I am also a person who is interested in a static analysis in various domains. Nice done! I think I have several potentially interesting ideas about improving this tool. I share them here since the Discussions are disabled for the repo.

Link-Time Optimization (LTO)

I noticed that in the Cargo.toml file Link-Time Optimization (LTO) for the project is not enabled. I suggest switching it on since it will reduce the binary size (always a good thing to have) and will likely improve the application's performance a bit.

I suggest enabling LTO only for the Release builds so as not to sacrifice the developers' experience while working on the project since LTO consumes an additional amount of time to finish the compilation routine. If you think that a regular Release build should not be affected by such a change as well, then I suggest adding an additional dist or release-lto profile where additionally to regular release optimizations LTO will also be added. Such a change simplifies life for maintainers and others interested in the project persons who want to build the most performant version of the application. Via enabling LTO in the Cargo.toml file we can deliver an LTO-optimized version of the tool to users with cargo install. Using ThinLTO should also help to reduce the build-time overhead with LTO. E.g., check cargo-outdated Release profile.

Basically, it can be enabled with the following lines:

[profile.release]
lto = true

I have made quick tests (Fedora 40) by adding lto = true to the Release profile. The binary size reduction is from 7.6 Mib to 5.8 Mib. Maybe you also will be interested in tweaking other compiler options like codegen-units.

Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO)

According to my benchmarks, PGO measurably helps to optimize various applications (its CPU efficiency more precisely) in several domains, especially compilers, static analysis, linters, etc - see Clang, Rustc, clang-tidy, Rust Analyzer and many other examples. That's why I think applying PGO (and a similar PLO technique via LLVM BOLT) could help to optimize Zizmor further.

I see that the project is in its early stages so investing resources (time) into PGO and PLO things could not be worth it. However, when more functionality will be implemented and more "polishing" time will be available, PGO can definitely be useful.

Thank you.

woodruffw commented 4 weeks ago

Thanks for the detailed issue @zamazan4ik! I greatly appreciate it.

I'd be happy to merge a PR that enables LTO for release builds, if you're interested in sending one.

PGO and PLO are promising for future squeezing, but this tool is still in a pretty early stage and there's a lot of just plain inefficient stuff in it already (for example, it doesn't currently bother to cache and reuse the filtered lists of steps that it repeatedly loops over). So I'd be happy to start with just LTO, and then consider PGO/PLO further down the line.

woodruffw commented 4 weeks ago

Merged LTO in release builds with #81, so I've updated the issue title to reflect the remaining items.

zamazan4ik commented 4 weeks ago

Yep, thanks a lot!

carlocab commented 3 weeks ago

Just curious -- what kind of workloads allow you to see a difference from LTO/PGO? LTO can reduce binary size (curious by how much it did here too), not sure if PGO can do the same these days (but seems unlikely from principle).

zamazan4ik commented 3 weeks ago

Just curious -- what kind of workloads allow you to see a difference from LTO/PGO?

Actually, many of them benefits from PGO- see these. this list includes static analyzers. Some of the cases include LTO benchmarks too.

LTO can reduce binary size (curious by how much it did here too)

I have made quick tests (Fedora 40) by adding lto = true to the Release profile. The binary size reduction is from 7.6 Mib to 5.8 Mib.

... not sure if PGO can do the same these days (but seems unlikely from principle).

If you mean the binary size reduction, it actually "it depends" on the workload - for some cases the amount of inlining will be reduced -> the binary size will be smaller. Is it true in practice or not - it depends.

carlocab commented 3 weeks ago

Sorry, I was unclear -- what kind of workloads for zizmor allow you to see this difference?

I'm aware that for large applications it makes a substantial difference, but I was wondering specifically about this project.

zamazan4ik commented 3 weeks ago

I didn't perform PGO benchmarks for Zizmor yet - in the original post it was an idea without actual benchmarks

carlocab commented 3 weeks ago

Cool. Would be interested to know what kind of workflows it would be reading for LTO/PGO to actually make a noticeable performance difference.

woodruffw / zizmor

Evaluate additional compiler optimizations like PGO #80

Link-Time Optimization (LTO)

Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO)