We currently run the .github/workflows/ci_windows_x64_msvc.yml workflow on a nightly schedule using standard GitHub-hosted runners (currently windows-2022 with 4 CPU cores, 16 GB of RAM, and 14 GB of SSD). Looking at the workflow history, this is taking around 4h30m each run, which is far too slow to run on pull_request or even push events.

We should add a build runner cluster with suitably large machines configured with caching layers so we can run this workflow more regularly - ideally on every commit (pull_request and push events).
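As a rough sketch of the shape this could take - the trigger change plus a caching layer on a larger runner - here is a hypothetical workflow fragment. The runner label, cache tool, paths, and keys below are placeholders, not decisions:

```yaml
# Hypothetical sketch only - not the current ci_windows_x64_msvc.yml.
name: CI - Windows x64 MSVC (large runners)

on:
  pull_request:
  push:
    branches: [main]

jobs:
  build:
    # Placeholder label; the real label depends on how the runner cluster is registered.
    runs-on: [self-hosted, windows-x64-64core]
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: true
      # One possible caching layer: persist a compiler cache (ccache/sccache or similar)
      # across runs so rebuilds stay incremental.
      - uses: actions/cache@v4
        with:
          path: ${{ github.workspace }}/.cache/compiler-cache
          key: windows-msvc-cache-${{ github.sha }}
          restore-keys: |
            windows-msvc-cache-
      # ... configure + build + test steps go here ...
```

A self-hosted cluster also opens up options the sketch doesn't show, such as keeping persistent source checkouts and compiler caches on local SSD between runs, which would help with the slow-checkout problem noted in the details below.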
Details:
32 cores may be sufficient, but 64 or more would help. We should run some build time experiments to see.
Napkin math for budgeting: CI load is about 50-100 pull_request events per day, and jobs take 10-30 minutes (rough totals worked out below).
We used to use larger GitHub-hosted runners when we were on a GitHub enterprise plan. With 64 cores, these could build the project in 10-20 minutes, but they also took 4-10 minutes just to check out the repository (something like 4x longer than the "standard" runners for a network/disk limited task).
We can continue to use standard GitHub-hosted runners for smaller jobs like runtime builds/tests/releases, and in Python projects like iree-turbine that don't need to build the LLVM/MLIR-based compiler binaries.
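Working out the napkin math above (assumed ranges, not measurements): the low end is 50 jobs/day × 10 min/job = 500 minutes, or roughly 8 runner-hours per day; the high end is 100 jobs/day × 30 min/job = 3000 minutes, or about 50 runner-hours per day. Spread evenly, the high end keeps roughly two large machines busy around the clock, and peak-hour bursts, push-event jobs, and retries would need headroom on top of that.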
Other considerations: