Currently we run both CPU and CUDA tests on the A100/A10G runners, which wastes resources. This PR separates the CPU and CUDA tests onto different runners: the CUDA tests still run on the GCP A100 runner, while the CPU tests run on a CPU-only `linux.24xlarge` runner.
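Conceptually, the split looks something like the following (a hypothetical sketch only; the actual job names, runner labels, and commands are defined in the workflow files changed by this PR):

```yaml
# Illustrative GitHub Actions layout: one job per device type,
# each pinned to an appropriate runner.
jobs:
  test-cuda:
    # GCP A100 runner (label is an assumption for illustration)
    runs-on: [self-hosted, a100]
    steps:
      - uses: actions/checkout@v4
      - run: python run_benchmark.py --device cuda   # hypothetical command
  test-cpu:
    # CPU-only runner; no GPU is allocated for these tests
    runs-on: linux.24xlarge
    steps:
      - uses: actions/checkout@v4
      - run: python run_benchmark.py --device cpu    # hypothetical command
```

With this layout the GPU runner is only occupied for the duration of the CUDA job, and the two jobs run in parallel.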
Reduces the CI time from ~120m to ~90m.
Fixes https://github.com/pytorch/benchmark/issues/2135