rust-lang / cargo

The Rust package manager
https://doc.rust-lang.org/cargo
Apache License 2.0

optimize dep queue by recording and using crate build times #7396

Open matthiaskrgr opened 5 years ago

matthiaskrgr commented 5 years ago

Describe the problem you are trying to solve

There are a couple of things in the dep queue that could be improved. Sometimes we end up waiting on a single dep where things could run partly in parallel if the dep queue were ordered differently.

Describe the solution you'd like

If we knew how long a crate usually takes to build on the build host, we could estimate how long parts of the dep graph take to build and make better decisions about which parts of the tree to start building first, or whether it pays off to artificially delay a package and build it at a later stage to improve parallelism.

Perhaps we could record build times, the compiler version used, build flags, crate metadata etc. for each crate, save the data to a file inside $CARGO_HOME, and later read it to improve the ordering of the dep queue.

Notes

We probably need to make sure that the file does not grow too large and prune old entries from the crate-build-time database from time to time.
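(For illustration, a minimal sketch of what such a per-crate record and the pruning could look like; the struct, its fields, and the 90-day retention window are hypothetical, not existing cargo data structures.)

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical record of one observed build, keyed by everything that
/// could plausibly change the build time.
#[allow(dead_code)] // fields are shown for illustration only
struct BuildTimeRecord {
    crate_name: String,
    crate_version: String,
    rustc_version: String,
    profile: String,   // e.g. "debug" or "release"
    flags_hash: u64,   // hash of build flags / enabled features
    wall_time: Duration,
    recorded_at: SystemTime,
}

/// Drop entries older than `max_age` so the database under $CARGO_HOME
/// does not grow without bound.
fn prune(records: &mut Vec<BuildTimeRecord>, max_age: Duration) {
    let now = SystemTime::now();
    records.retain(|r| {
        now.duration_since(r.recorded_at)
            .map(|age| age <= max_age)
            .unwrap_or(true)
    });
}

fn main() {
    let mut db = vec![BuildTimeRecord {
        crate_name: "serde".into(),
        crate_version: "1.0.0".into(),
        rustc_version: "rustc 1.40.0".into(),
        profile: "debug".into(),
        flags_hash: 0,
        wall_time: Duration::from_secs(12),
        recorded_at: SystemTime::now(),
    }];
    // Keep roughly the last 90 days of observations.
    prune(&mut db, Duration::from_secs(90 * 24 * 60 * 60));
    println!("{} records kept", db.len());
}
```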

est31 commented 5 years ago

@matthiaskrgr good suggestion! I've thought about this a little myself.

The problem that cargo has to solve is known as the multiprocessor DAG scheduling problem, which is NP-complete, so we can only apply approximations. On the bright side, the problem is well known in the literature and you can find a wealth of papers about the topic. E.g. this one (doi link) is fairly recent, or this one (doi link).

The heuristic used currently in cargo is based on depth only and it already has a TODO that mentions possible extensions.

A project to find a better scheduling algorithm is a great opportunity for experimentation. One could e.g. parse crater logs to find out how long each crate takes to build and then use that information to build a "virtual" crater where the results of different scheduling algorithms are compared using different CPU numbers etc. Then we'd find the scheduling algorithm that fits the cargo use case best. The final implementation of the scheduling algorithm could even be put into its own crate, for use in the ecosystem, similar to polonius. It's possibly a great topic for an academic thesis if someone is looking for one.
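(To make that idea concrete, here is a minimal sketch of such a "virtual" run; all unit names, costs, and types are made up for illustration and are not cargo's internals. Given per-unit cost estimates and a dependency DAG, it simulates list scheduling on a fixed number of workers, so different priority functions can be compared.)

```rust
use std::collections::BTreeMap;

/// Simulate list scheduling of a dependency DAG on `workers` CPUs and
/// return the total wall time. `deps` maps a unit to its dependencies.
fn simulate(
    costs: &BTreeMap<&str, u64>,
    deps: &BTreeMap<&str, Vec<&str>>,
    priority: impl Fn(&str) -> u64,
    workers: usize,
) -> u64 {
    // How many unfinished dependencies each unit still has.
    let mut remaining: BTreeMap<&str, usize> = costs
        .keys()
        .map(|&u| (u, deps.get(u).map_or(0, |d| d.len())))
        .collect();
    // Reverse edges: which units get unblocked when a unit finishes.
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (&unit, ds) in deps {
        for &d in ds {
            dependents.entry(d).or_default().push(unit);
        }
    }
    let mut ready: Vec<&str> = remaining
        .iter()
        .filter(|&(_, &n)| n == 0)
        .map(|(&u, _)| u)
        .collect();
    let mut running: Vec<(u64, &str)> = Vec::new(); // (finish time, unit)
    let mut now: u64 = 0;
    let mut done = 0;
    while done < costs.len() {
        // Start as many ready units as there are free workers, highest priority first.
        ready.sort_by_key(|&u| priority(u));
        while running.len() < workers {
            match ready.pop() {
                Some(u) => running.push((now + costs[u], u)),
                None => break,
            }
        }
        // Advance time to the earliest finishing unit (assumes a valid DAG).
        let (finish, unit) = *running.iter().min_by_key(|(t, _)| *t).unwrap();
        running.retain(|&(_, u)| u != unit);
        now = finish;
        done += 1;
        for &dep in dependents.get(unit).into_iter().flatten() {
            let n = remaining.get_mut(dep).unwrap();
            *n -= 1;
            if *n == 0 {
                ready.push(dep);
            }
        }
    }
    now
}

fn main() {
    // Toy graph: `d` depends on `a`; `b` and `c` are independent.
    let costs = BTreeMap::from([("a", 8), ("b", 2), ("c", 2), ("d", 4)]);
    let deps = BTreeMap::from([("d", vec!["a"])]);
    // Compare "every unit costs 1" (roughly what cargo assumes today)
    // with a cost-aware priority, on two workers.
    let uniform = simulate(&costs, &deps, |_: &str| 1, 2);
    let cost_aware = simulate(&costs, &deps, |u: &str| costs[u], 2);
    println!("uniform: {uniform}, cost-aware: {cost_aware}");
}
```

With these toy numbers, the cost-aware ordering finishes in 12 time units versus 14 for the uniform-cost baseline, which is the kind of comparison a "virtual crater" could run at scale.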

est31 commented 5 years ago

cc also #5125.

Eh2406 commented 5 years ago

@est31 that was very interesting reading. It was not clear to me how much the general problem gets simplified by the fact that Cargo has basically no weight/edge/communications costs. Also all of the algorithms assume that the cost of each process is (at least approximately) known in advance. So a deeper dive on which algorithm to use for Cargo is somewhat premature until, as the OP suggests, we find some way to estimate the build time.

Eh2406 commented 5 years ago

At a meetup last night we were talking about this. We realized there's a tiny version of this that is still useful. If we record the full DependencyQueue into the -Ztimings report, then we can add some useful statistics to that report, for example the time to compile the critical path, and approximate total compile times under different scheduling algorithms. If this report shows there are algorithms that could save time, then we have strong evidence to encourage doing the design work to have cargo collect the data and use the algorithm.
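(As a rough sketch of two such statistics; the types and numbers here are made up, not cargo's DependencyQueue. From recorded per-unit durations and dependency edges one can compute the critical-path time and a trivial lower bound on total wall time for a given number of CPUs, max(critical path, total work / CPUs); any real schedule takes at least that long.)

```rust
use std::collections::HashMap;

/// Longest cost-weighted dependency chain: no schedule, on any number of
/// CPUs, can finish faster than this.
fn critical_path(durations: &HashMap<&str, f64>, deps: &HashMap<&str, Vec<&str>>) -> f64 {
    fn finish<'a>(
        unit: &'a str,
        durations: &HashMap<&'a str, f64>,
        deps: &HashMap<&'a str, Vec<&'a str>>,
        memo: &mut HashMap<&'a str, f64>,
    ) -> f64 {
        if let Some(&t) = memo.get(unit) {
            return t;
        }
        // Earliest possible finish: all dependencies done, then our own duration.
        let start = deps
            .get(unit)
            .map(|ds| {
                ds.iter()
                    .map(|&d| finish(d, durations, deps, memo))
                    .fold(0.0, f64::max)
            })
            .unwrap_or(0.0);
        let t = start + durations[unit];
        memo.insert(unit, t);
        t
    }
    let mut memo = HashMap::new();
    durations
        .keys()
        .map(|&u| finish(u, durations, deps, &mut memo))
        .fold(0.0, f64::max)
}

fn main() {
    // Illustrative durations in seconds.
    let durations =
        HashMap::from([("syn", 10.0), ("quote", 2.0), ("serde_derive", 6.0), ("serde", 4.0)]);
    let deps = HashMap::from([
        ("serde_derive", vec!["syn", "quote"]),
        ("serde", vec!["serde_derive"]),
    ]);
    let critical = critical_path(&durations, &deps);
    let total: f64 = durations.values().sum();
    let workers: f64 = 4.0;
    println!(
        "critical path: {critical}s, lower bound on {workers} CPUs: {}s",
        critical.max(total / workers)
    );
}
```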

matthiaskrgr commented 5 years ago

As for measuring the scheduling efficiency, could we display something like the average number of waiting edges over the entire time of the build process (higher number means more potential parallelism)?
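(A sketch of one way to compute such a number, interpreting "waiting" as units whose dependencies are all built but which have not started compiling yet, and assuming cargo recorded for each unit the time it became ready and the time it actually started; that data is hypothetical here.)

```rust
/// Average number of units that were ready but not yet being compiled,
/// over the whole build. Integrating that count over time and dividing by
/// the wall time is the same as summing the individual waiting intervals.
fn average_waiting_units(ready_and_start: &[(f64, f64)], wall_time: f64) -> f64 {
    let total_waiting: f64 = ready_and_start
        .iter()
        .map(|&(ready, start)| start - ready)
        .sum();
    total_waiting / wall_time
}

fn main() {
    // Three units: (time it became ready, time it started building), in seconds.
    let trace = [(0.0, 0.0), (0.0, 3.0), (5.0, 9.0)];
    // A higher value means more units sat ready while no worker picked them up,
    // i.e. more potential parallelism was left on the table.
    println!("{:.2}", average_waiting_units(&trace, 12.0)); // (0 + 3 + 4) / 12 ≈ 0.58
}
```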

est31 commented 2 years ago

Also all of the algorithms assume that the cost of each process is (at least approximately) known in advance.

Yeah, there is no sure way to know the complexity of a unit without actually building it, at which point that piece of data becomes useless. Therefore, one of the needed tasks would be to come up with heuristics to inform guesses about the complexity of the build. There is a large set of resources one can use:

All of these correlate with the build time. Of course the prediction won't be 100% accurate. However, right now (unless there has been work I'm not aware of), the prediction cargo works with is that every unit has the same cost, so every crate has a build cost of 1, which means that coming up with a heuristic that's better than that should be pretty easy :). For the processing, one could think about basic data science stuff like weighted sums or simple fully connected 2/3-layer neural networks, nothing fancy.
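(As a sketch of the weighted-sum idea: the features and weights below are invented for illustration, not the list from this comment; in practice the weights would be fitted against recorded build times, e.g. by least squares over past builds.)

```rust
/// A few cheap-to-obtain signals about a unit (illustrative only).
struct UnitFeatures {
    lines_of_code: f64,
    dependency_count: f64,
    has_proc_macros: f64, // 0.0 or 1.0
}

/// Predict a unit's build cost as a weighted sum of its features.
fn predicted_cost(f: &UnitFeatures, weights: &[f64; 4]) -> f64 {
    // weights = [bias, per-line, per-dependency, proc-macro penalty]
    weights[0]
        + weights[1] * f.lines_of_code
        + weights[2] * f.dependency_count
        + weights[3] * f.has_proc_macros
}

fn main() {
    let weights = [0.5, 0.001, 0.2, 5.0]; // hypothetical fitted weights, in seconds
    let unit = UnitFeatures {
        lines_of_code: 12_000.0,
        dependency_count: 7.0,
        has_proc_macros: 1.0,
    };
    // Even a crude estimate beats the implicit "every unit costs 1" assumption.
    println!("estimated build cost: {:.1}s", predicted_cost(&unit, &weights));
}
```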

osiewicz commented 1 year ago

A project to find a better scheduling algorithm is a great opportunity for experimentation. One could e.g. parse crater logs to find out how long each crate takes to build and then use that information to build a "virtual" crater where the results of different scheduling algorithms are compared using different CPU numbers etc. Then we'd find the scheduling algorithm that fits the cargo use case best. The final implementation of the scheduling algorithm could even be put into its own crate, for use in the ecosystem, similar to polonius. It's possibly a great topic for an academic thesis if someone is looking for one.

Hey, I've recently worked on Dice_box, which I hope will serve as a playground for different scheduling algorithms. It relies on the JSON timings and the unit graph. It is still in the works - the output is not the prettiest and it would be cool to be able to choose the algorithm to run via the command line - but the core idea is there and it works. I have already implemented a pipelining simulation based on a separate rustc process, which was a point of discussion in https://github.com/rust-lang/cargo/issues/6660. It also supports simulating builds with different thread counts - it does not (in theory, as things like https://github.com/rust-lang/cargo/issues/7689 happen) depend on the data being captured with an arbitrary parallelism level.

epage commented 1 year ago

At RustConf, we discussed this a little at the Unconf. Someone observed that they've seen the optimal order of packages depend on what platform they are built either on or for (not quite sure which).

epage commented 11 months ago

Also, #7437 would likely be a prerequisite, at least internally / as an unstable feature.