maciejhirsz / logos

Create ridiculously fast Lexers
https://logos.maciej.codes

Profile-Guided Optimization (PGO) benchmark results #374

Open · zamazan4ik opened this issue 4 months ago

zamazan4ik commented 4 months ago

Hi!

Yesterday I read a post about Logos (I didn't know about the library before). Since the post claims "ridiculously fast" performance, I decided to try to optimize the library further with PGO, as I have already done for many other applications (all the results are available here). I performed some tests and want to share the results.

Test environment

Benchmark

The built-in benchmarks are invoked with cargo bench --workspace --all-features. The PGO instrumentation phase on the benchmarks is done with cargo pgo bench -- --workspace --all-features, and the PGO optimization phase with cargo pgo optimize bench -- --workspace --all-features.

All PGO steps are performed with the cargo-pgo tool.
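For completeness, the full command sequence looks roughly like this (a sketch assuming cargo-pgo is installed, e.g. via cargo install cargo-pgo, together with rustup's llvm-tools-preview component):

```sh
# Baseline numbers without PGO
cargo bench --workspace --all-features

# PGO instrumentation phase: build instrumented benchmarks and run them
# to collect profiles
cargo pgo bench -- --workspace --all-features

# PGO optimization phase: rebuild the benchmarks using the collected profiles
cargo pgo optimize bench -- --workspace --all-features
```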

Results

I got the following results:

At least in the benchmarks provided by the project, I see measurable performance improvements. I don't know how representative these benchmarks are of real-life performance - here I simply trust the project maintainers.

Possible further steps

I can suggest the following things to consider:

I will be happy to answer all your questions about PGO.

jeertmans commented 4 months ago

Hey @zamazan4ik, thank you for your message and comprehensive analysis!

I am new to PGO, but I guess this only optimizes binaries, not library code? How does it provide any meaningful information to improve the code?

I am asking because Logos is a library, so PGO optimisation will likely be applied by library users, not by us.

zamazan4ik commented 4 months ago

I am new to PGO, but I guess this only optimizes binaries, not library code?

Actually, no - PGO works in the same way for binaries and for library code. You can easily apply PGO when building a library (static or dynamic, it doesn't matter), even if you build the library separately from a binary. For example, see the pydantic-core library and the corresponding PR: https://github.com/pydantic/pydantic-core/pull/741

How does it provide any meaningful information to improve the code?

PGO usually allows the compiler to make much smarter inlining decisions. So, in theory, you can compare two Logos versions (with and without PGO), figure out why the PGO-optimized version is faster, and then use those insights to optimize the library code by hand. In that case, you get the performance boost without needing to integrate PGO into the build pipeline.

However, this approach can be quite difficult in practice, because a lot of code needs to be analyzed. Since Logos is a library and you don't ship any prebuilt binaries here, I can suggest at least adding a note to the documentation about using PGO to improve Logos performance, so that Logos users are aware of one more way to speed up their Logos-based applications.
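To illustrate, such a note could describe a workflow roughly like the one below for a downstream application, using the same cargo-pgo tool as above. This is only a sketch: the binary path and the workload are placeholders, and cargo-pgo prints the exact location of the instrumented binary it produces.

```sh
# Rough sketch for a Logos-based application (paths and names are placeholders).

# 1. Build an instrumented binary with cargo-pgo.
cargo pgo build

# 2. Run the instrumented binary (cargo-pgo reports where it was written) on a
#    representative workload so that profiles are collected, e.g.:
#    ./target/<target-triple>/release/my-lexer-app big-input.txt

# 3. Rebuild the binary with the collected profiles applied.
cargo pgo optimize build
```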

jeertmans commented 4 months ago

OK, I got it, thanks! Generating PGO binaries seems a bit convoluted, but a tutorial might be interesting, especially if you notice improvements on examples like the JSON parser :-)

Actually, your link to pydantic-core's PGO process interested me a lot, but for another project ^^'

jeertmans commented 4 months ago

Labelling this as a good first issue for the handbook.

As discussed above, it would be nice to conduct a small analysis of PGO on the JSON parser example, compare performance, and document the results in the book.

PGO is quite well documented here: https://doc.rust-lang.org/rustc/profile-guided-optimization.html.
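For reference, the raw rustc workflow from that page could look roughly like this when applied to an example; the example name "json", its command-line arguments, and the input file are assumptions, and cargo-pgo automates the same steps:

```sh
# Sketch of the rustc PGO workflow from the page above (names are assumptions).
# llvm-profdata ships with rustup's llvm-tools-preview component or a system LLVM.

# 1. Build the example with instrumentation.
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" \
    cargo build --release --example json

# 2. Run the instrumented example on representative input to produce .profraw files
#    (assumes the example accepts an input file as its argument).
./target/release/examples/json sample.json

# 3. Merge the raw profiles into a single profile.
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data

# 4. Rebuild the example using the merged profile.
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" \
    cargo build --release --example json
```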