Proposal: Improve CI runtime

flub commented 9 months ago

The build-and-test jobs are taking more than 10 minutes. Here some proposals to speed this up:

Configure sccache on the self-hosted runners:

We run on self-hosted runners but use SCCACHE_GHA_ENABLED. We should instead set a SCCACHE_CACHE_SIZE=80G and use the (default) local cache mechanism. This should mean that soon all runners will have a local cache with all dependencies and we'll spend no more time on compiling dependencies.
Reduce the number of steps:

We do not need a separate check step, it achieves nothing. Instead run test straight away.
Split default-features and no-features into separate jobs

Because these jobs are compiling iroh with different feature flags we can parallelise these more by moving the tests to two jobs: test-default-features, test-no-features. Only worth it if we have enough runners. Maybe we should get more runners if that's needed.
Switch to cargo-nextest

Arqu commented 9 months ago

Yeah we've discussed this a couple weeks back. Think this will need some testing, but correctly configuring sccache should get it to work. Some caveats to the above proposal:

sccache is not installed on the windows boxes afaik rn (and on the x86_64 mac runner) - I can remedy this
we generally want a system configured sccache for this to work propperly but might be worth just setting the env vars for sccache, might be enough
the check step is necessary cause it does the full matrix of features individually which doesn't suffer from feature unifying when mashing them together (it does add about 3-4 min to the total runtime sadly)
doing cargo nextest alone will not automatically speed things up, but as discussed will allow us to start splitting jobs up further to do them in parallel by testing in chunks
splitting into more jobs brings a little bit of overhead to each job (caches aren't shared) which would be perfectly fine given we have enough nodes running, but we don't so all the parallel jobs will just get scheduled sequentially + overhead for now, we're already maxing out runners and cross and regular builds wait for each other in a single PR
release jobs take a lot longer anyways since they run builds with LTO which adds like 10 min to the whole thing out of the box and you can't split it off further, which hogs runners when merging a lot to main.

Either way I wholly support the cause.

sccache will help regardless (given proper setup)
we need more runners
nextest is a good stepping stone

flub commented 9 months ago

Yeah we've discussed this a couple weeks back. Think this will need some testing, but correctly configuring sccache should get it to work. Some caveats to the above proposal:

sccache is not installed on the windows boxes afaik rn (and on the x86_64 mac runner) - I can remedy this

I think we can keep using the action to install it. Maybe I'm wrong but I'll try it.

we generally want a system configured sccache for this to work propperly but might be worth just setting the env vars for sccache, might be enough

Same answer

the check step is necessary cause it does the full matrix of features individually which doesn't suffer from feature unifying when mashing them together (it does add about 3-4 min to the total runtime sadly)

oh, it runs a check job separately for each package in the workspace. Does this really make a difference to the features enabled for each crate on the test runs later?

A summary of all the crates with features and their deps on each other:

iroh

default: cli, metrics
cli: quic-rpc/quinn-transport, tokio/rt-multi-thread, clap, config, console, dirs-next, indicatif, multibase, tracing-subscriber, flat-db, shell-words, shellexpand, rustyline, clolored, toml, human-time, comfy-table, dialoguer, time
metrics: iroh-metrics
flat-db: iroh-bytes/flat-db
test:

iroh-base

default: hash, base32m
hash: bao-tree, multibase, data-encoding, postcard
base32: data-encoding

iroh-bytes

default: flat-db
flat-db: reflink-copy

iroh-gossip

default: net
net: iroh-net, futures, iroh-net, quinn, tokio, tokio-util

iroh-metrics

default: metrics
metrics: prometheus-client

iroh-net

default: metrics
derper: (non-iroh deps)
metrics: iroh-metrics/metrics

iroh-sync

default: net, fs-store, metrics
net: iorh-net, (non-iroh deps)
fs-store: (non-iroh deps)
metrics: iroh-metrics

from this i think the --no-default-features run will not change between doing this per-package or on the workspace.

For default features this is also true, since no default feature on one crate enables a non-default feature in another crate.

Finally reasoning through the --all-features case we also end up with the same deps in each crate when we do it per-crate or for workspace.

So, as far as I understand this would not change things? I had to write this down to get to that conclusion though.

update: I don't think this analysis is necessarily fully correct. It really depends on all the dependencies. And really we should check these combinations automatically. In the meantime I'm not sure how much changing this will affect things. I still suspect we'll catch most issues without having the duplicate cargo check runs, though could be convinced to leave them for now.

doing cargo nextest alone will not automatically speed things up, but as discussed will allow us to start splitting jobs up further to do them in parallel by testing in chunks

Running nextest does speed things up on my local machine, even without any other changes. Unless the runners are single-core machines I expect this to also speed up on CI? Did I get this wrong?

splitting into more jobs brings a little bit of overhead to each job (caches aren't shared) which would be perfectly fine given we have enough nodes running, but we don't so all the parallel jobs will just get scheduled sequentially + overhead for now, we're already maxing out runners and cross and regular builds wait for each other in a single PR

Once we have a working sccache setup the caches for all the dependencies will be shared. Crucially IIUC currently the target directory stomps over itself between the different feature combination runs. By strategically splitting up the steps into jobs so each job sticks to one feature combination we avoid that stomping over each other in the target directory while getting cache benefits of the deps still via sccache.

At least that is my understanding currently.

release jobs take a lot longer anyways since they run builds with LTO which adds like 10 min to the whole thing out of the box and you can't split it off further, which hogs runners when merging a lot to main.

I had not yet looked at release builds, I only looked at improving PR iteration in this issue. But we can totally improve release builds using our learnings from this here if this all works out.

link2xt commented 9 months ago

We run on self-hosted runners but use SCCACHE_GHA_ENABLED. We should instead set a SCCACHE_CACHE_SIZE=80G and use the (default) local cache mechanism. This should mean that soon all runners will have a local cache with all dependencies and we'll spend no more time on compiling dependencies.

Kind of defeating the purpose of sccache which is a Shared Cache. I am currently looking into running a WebDAV storage using nginx for Rust compilation cache for myself (https://github.com/mozilla/sccache/blob/main/docs/Webdav.md, https://bazel.build/remote/caching#nginx). Edit: tried it, does not help much, closed: https://github.com/deltachat/deltachat-core-rust/pull/5090

dignifiedquire commented 9 months ago

I still suspect we'll catch most issues without having the duplicate cargo check runs, though could be convinced to leave them for now.

you are welcome to change them for test calls, but we need for each feature and each crate to have an individual execution, there is no way around this. before we had this, I was regularly running into issues when doing releases, because feature unification hid a dependency/inclusion issue.

link2xt commented 9 months ago

We run on self-hosted runners but use SCCACHE_GHA_ENABLED. We should instead set a SCCACHE_CACHE_SIZE=80G and use the (default) local cache mechanism. This should mean that soon all runners will have a local cache with all dependencies and we'll spend no more time on compiling dependencies.

There is an Earthly blog post about speeding up Rust compilation by using a local cache: https://earthly.dev/blog/incremental-rust-builds/ Main idea is to use Rust local caching with persistent build runners and not try to upload/download the cache anywhere at all. Reduction from 22.5 minutes to 2.5 minutes looks strange, looks like baseline did not cache dependency builds at all: https://github.com/expressvpn/wolfssl-rs/pull/44 But otherwise the idea of not trying to upload/download Rust cache and simply keeping the target/ folder on the runner makes sense.

fabricedesre commented 9 months ago

Another option if you can set it up is to use a Redis cache with sccache. We had very good results with a single Redis and multiple build runners (on gitlab if that matters).

flub commented 9 months ago

We run on self-hosted runners but use SCCACHE_GHA_ENABLED. We should instead set a SCCACHE_CACHE_SIZE=80G and use the (default) local cache mechanism. This should mean that soon all runners will have a local cache with all dependencies and we'll spend no more time on compiling dependencies.

There is an Earthly blog post about speeding up Rust compilation by using a local cache: https://earthly.dev/blog/incremental-rust-builds/ Main idea is to use Rust local caching with persistent build runners and not try to upload/download the cache anywhere at all. Reduction from 22.5 minutes to 2.5 minutes looks strange, looks like baseline did not cache dependency builds at all: expressvpn/wolfssl-rs#44 But otherwise the idea of not trying to upload/download Rust cache and simply keeping the target/ folder on the runner makes sense.

The approach I suggest here (and implemented in https://github.com/n0-computer/iroh/pull/1865) takes this approach as well: it uses sccache with the (default) local storage. This means once each local runner has seen all our builds, they will all have a local cache of the dependencies available, significantly speeding up future builds. In the PR above this results in compilation time improvements across the board.

This takes a little longer to fill up the local caches than the shared cache like you tried with WebDAV, but if your number of projects is proportionally low to the number of runners this works well I think, and has less overhead than using WebDAV (which is interesting, thanks for teaching me about it, but surprised you didn't find it worked for you). I think the local cache is a good solution for our current situation. We can improve with webdav or redis once we really need a shared cache.

flub commented 9 months ago

Another option if you can set it up is to use a Redis cache with sccache. We had very good results with a single Redis and multiple build runners (on gitlab if that matters).

@fabricedesre does redis force you to have all this in memory though? Or is there a way to run redis without storing it's entire dataset in memory?

fabricedesre commented 9 months ago

I'm pretty sure it's not a memory store, we have way more in the Redis cache than the available RAM on the redis instance :)

link2xt commented 9 months ago

using WebDAV (which is interesting, thanks for teaching me about it, but surprised you didn't find it worked for you)

It works and stored 3.3 G on WebDAV already. But sccache does not add anything to https://github.com/Swatinem/rust-cache/, i.e. just storing a target/ folder in GitHub actions cache. The only problem is that GHA cache is maximum 10 G. cargo is better at caching than sccache because sccache only caches full compilations of dependencies (i.e. rustc invocations) while cargo has incremental compilation support. If you have a choice between 1) always losing target/ and using sccache for caching 2) using no sccache, but keep target/, second variant is going to win. Earthly simply mount the target/ and .cargo folders into build containers and use no sccache at all.

flub commented 9 months ago

using WebDAV (which is interesting, thanks for teaching me about it, but surprised you didn't find it worked for you)

It works and stored 3.3 G on WebDAV already. But sccache does not add anything to https://github.com/Swatinem/rust-cache/, i.e. just storing a target/ folder in GitHub actions cache. The only problem is that GHA cache is maximum 10 G. cargo is better at caching than sccache because sccache only caches full compilations of dependencies (i.e. rustc invocations) while cargo has incremental compilation support. If you have a choice between 1) always losing target/ and using sccache for caching 2) using no sccache, but keep target/, second variant is going to win. Earthly simply mount the target/ and .cargo folders into build containers and use no sccache at all.

makes sense, thanks for the explanation.

I was sort of considering incremental compilation to not be very interesting for CI times, but I guess it could have an impact on the iroh crates itself as PRs rarely touch everything.

I think the main reason sccache tends to be faster than rust-cache is that the latter has to download all the cache files from the last run up-front, including plenty of files that might not be used. While sccache only fetches cache hits.

n0-computer / iroh

Proposal: Improve CI runtime #1864