lqd / rustc-benchmarking-data


Benchmarks for executables #1

Open · PoignardAzur opened this issue 2 years ago

PoignardAzur commented 2 years ago

Hello!

I've really enjoyed reading through these benchmarks, and drawing conclusions about how code is structured. The multi-crate build times in particular really help you see how many crates are bottlenecked on proc_macro2 + quote + syn.

That being said, I think there's a flaw in the methodology you used: you've pulled the most popular crates by download count, which means you've got ecosystem crates, i.e. dependencies that are used by a lot of projects. Their compilation profile might not be representative of the experience of an end-user project: ecosystem crates try to use only the dependencies they strictly need, whereas "leaf" crates will likely pull in lots of dependencies to cover a wide range of features.

For instance, the serde crate depends only on serde-derive, which means that when compiling serde with the derive feature there is a tight proc_macro2 -> quote -> syn -> serde_derive -> serde critical path.

A CLI program might pull in serde, clap, and tokio, which muddles the critical path: after syn come serde_derive, clap_derive, and tokio-macros.
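Just to make that concrete, here's roughly how you can see it in any binary crate that depends on serde (with derive), clap, and tokio; nothing here is specific to your setup, it's just stock cargo:

```sh
# Show everything in the project's graph that funnels through syn
# (the inverse dependency tree).
cargo tree -i syn

# On a recent toolchain, write an HTML report to target/cargo-timings/
# showing per-crate build times and how much of the build sits serialized
# behind proc_macro2 -> quote -> syn.
cargo build --timings
```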

So, it might be interesting to publish a similar benchmark with popular Rust programs. Some suggestions:

I don't think I have the know-how or the resources to run those benchmarks myself, but I'd be super happy to help if I can, or if you can provide some mentoring.

Thanks again for your work!

lqd commented 2 years ago

Absolutely true, but we planned for this: we wanted a good mix of libraries and binaries (though since the goal is a somewhat holistic view, 100% fidelity to the entire ecosystem is not a hard requirement per se). We don't have that mix yet, but it'll be improved in future rounds. We also started with the most popular crates because improvements there have broader reach, benefiting everyone who depends on them. Libraries are easier to gather automatically, binaries a bit less so (there are some in the dataset that I haven't gotten to profiling yet, because they also require manual tweaks for the perf collector), and your list is a great starting point, thanks.

In the future, I'll add some more crates from the ignore list (it contains actual failures, but also some of the bin targets, crates using build scripts, etc.) that can be benchmarked without issues, and some binaries of course (and maybe some crates with a lot more dependencies than what we currently have). I haven't gotten to it yet, in part because we already have quite a bit of actionable data.

Running these benchmarks yourself is super easy (though it can require building rustc locally), apart from some of the setup mentioned in the readme here (which will be automated with https://github.com/rust-lang/rustc-perf/pull/1183): most of these profiling rounds (cachegrind, dhat, self-profiling, time-passes) were done just by running the perf collector's local profiling, and that is fully documented here. The benchmarks directory there is our regular benchmark suite, and it can provide examples of how to add benchmarks (there may be docs about this as well, I forget).
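For concreteness, a local profiling run looks roughly like this (from memory, and the collector's exact flags can change between rustc-perf versions, so the docs linked above are authoritative):

```sh
# From a checkout of rust-lang/rustc-perf, with the collector built in release mode.
# `cachegrind` can be swapped for dhat, self-profile, time-passes, etc.;
# `+stage1` stands in for whichever toolchain (or locally built rustc) you want
# to profile, and `--include` filters the benchmark suite by name.
./target/release/collector profile_local cachegrind +stage1 --include syn
```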

Analysis of new data would be the most helpful, I think (rather than just more raw data): anything strange or that stands out from the numbers and profiles we have here or on perf.rlo. Or things we've missed in our in-progress analysis (it's not fully done yet): there are a bunch of ongoing investigations about proc macros, pipelining (or the lack thereof), scheduling, build scripts, etc. Thanks!
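And if you want to poke at a single crate on your own machine before diving into the published profiles, nightly rustc can produce comparable data directly (a sketch, not necessarily the exact pipeline behind the data in this repo):

```sh
# Nightly-only flags, passed to the final crate's build via `cargo rustc`.

# Print per-pass wall-clock times to stderr.
cargo +nightly rustc -- -Z time-passes

# Write query-level self-profile data to the working directory; it can then be
# inspected with the `summarize` tool from rust-lang/measureme.
cargo +nightly rustc -- -Z self-profile
```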