openml / automlbenchmark

OpenML AutoML Benchmarking Framework
https://openml.github.io/automlbenchmark
MIT License

Update to use m6i instances by default #502

Closed Innixma closed 1 year ago

Innixma commented 1 year ago

I propose that future benchmarks should swap from m5 to m6i AWS instances.

m6i instances are the next generation of CPU instances and offer superior performance at a very similar price.

m6i.2xlarge Geekbench result: https://browser.geekbench.com/v5/cpu/18064421
m5.2xlarge Geekbench result: https://browser.geekbench.com/v5/cpu/17423989

These results show that m6i single-core performance is ~43% faster than m5, and overall multi-core performance is ~36% faster.

My experiments show that AutoGluon trains ~40% faster on m6i than on m5. Notably, this also improves inference speed by a similar amount. I expect that these speedups will be similar for all frameworks, since it is a generic CPU speedup.

To match the compute of m5 running for 1 hour, we only need to train for ~43 minutes on m6i (with a ~1.4x speedup, 60 min / 1.4 ≈ 43 min).
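For illustration, the equivalent-runtime and cost arithmetic can be checked with a few lines of Python (the 1.4x speedup factor is the assumption from the AutoGluon experiments above, and the prices are the on-demand quotes below):

```python
# Sketch: equivalent runtime and cost at on-demand prices, assuming m6i is ~1.4x faster than m5.
SPEEDUP = 1.4                      # assumed training speedup of m6i over m5
M5_PRICE = M6I_PRICE = 0.384       # on-demand $/hr for the .2xlarge sizes (US East 2)

m5_runtime_h = 1.0                               # 1 hour budget on m5
m6i_runtime_h = m5_runtime_h / SPEEDUP           # same amount of compute on m6i

print(f"equivalent m6i runtime: {m6i_runtime_h * 60:.0f} min")   # ~43 min
print(f"cost on m5:  ${M5_PRICE * m5_runtime_h:.3f}")            # $0.384
print(f"cost on m6i: ${M6I_PRICE * m6i_runtime_h:.3f}")          # ~$0.274
```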

At the time of writing, on-demand prices in the US East 2 (Ohio) region for m6i and m5 are identical:

Instance name | On-Demand hourly rate | vCPU | Memory | Storage | Network performance
-- | -- | -- | -- | -- | --
m5.2xlarge | $0.384 | 8 | 32 GiB | EBS Only | Up to 10 Gigabit
m6i.2xlarge | $0.384 | 8 | 32 GiB | EBS Only | Up to 12.5 Gigabit

At the time of writing, spot prices in US East 2 (Ohio) are nearly identical:

Instance name | Spot hourly rate
-- | --
m5.2xlarge | $0.0812
m6i.2xlarge | $0.0858

I think this points to m6i as the cost-efficient instance choice for future benchmarks.

PGijsbers commented 1 year ago

Things have changed since October. At the moment there is a considerable difference in spot pricing (m5.2xlarge at $0.1047/hr vs m6i.2xlarge at $0.1664/hr), and this holds for most regions (a gap of roughly 50%). While this can be more or less counteracted by reducing training times, if wall-clock time is the only benefit, I would prefer to keep the same runtime and instance types as previous experiments, at least for our revision.

Innixma commented 1 year ago

Hey @PGijsbers, that is totally fair! It would be good to keep an eye on this, as the value proposition could shift over time depending on outside demand. A basic expectation is that m6i.2xlarge is ~43% faster than m5.2xlarge, so the break-even point when m5.2x is at $0.10/hr would be m6i.2x at $0.143/hr. As you mention, this means m5.2x is the way to go for spot pricing at present (although not in all regions, as I note below).
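As a rough sketch of that break-even check (the 1.43x factor is the single-core Geekbench ratio above; the prices are the spot quotes mentioned in this thread):

```python
# Sketch: is m6i spot still cheaper per unit of compute than m5, assuming a ~1.43x speedup?
SPEEDUP = 1.43

def break_even_m6i_price(m5_price: float) -> float:
    """Highest m6i hourly price at which m6i still matches m5 cost per unit of compute."""
    return m5_price * SPEEDUP

m5_spot, m6i_spot = 0.1047, 0.1664           # $/hr spot quotes from this thread
threshold = break_even_m6i_price(m5_spot)    # ~$0.150/hr
print(f"break-even m6i price: ${threshold:.3f}/hr")
print("m6i is the better deal" if m6i_spot <= threshold else "m5 is the better deal")
```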

The other benefit of keeping m5.2x is that we can compare directly with prior benchmark runs (such as the 2022 paper results), without having to worry about effective compute time differences.

Note: Europe (Stockholm) currently has m6i.2x at $0.1204/hr, which is a pretty good price.

This is probably something to decide right before running the benchmark, since prices seem to fluctuate significantly between regions.
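For illustration, current spot prices can be checked per region with boto3 before launching a run (a sketch only, not part of the benchmark tooling; the region list and instance types are just examples):

```python
# Sketch: query current spot prices for the candidate instance types in a few regions.
from datetime import datetime, timezone
import boto3

regions = ["us-east-2", "eu-north-1"]            # example regions (Ohio, Stockholm)
instance_types = ["m5.2xlarge", "m6i.2xlarge"]

for region in regions:
    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.describe_spot_price_history(
        InstanceTypes=instance_types,
        ProductDescriptions=["Linux/UNIX"],
        StartTime=datetime.now(timezone.utc),    # only the most recent quotes
    )
    for entry in resp["SpotPriceHistory"]:
        print(region, entry["AvailabilityZone"], entry["InstanceType"], entry["SpotPrice"])
```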

PGijsbers commented 1 year ago

I will close this PR. As it stands, which EC2 instance is right should be evaluated before running experiments, and I don't see a good reason to change the default at this moment. The small benefit of slightly cheaper (or more) compute in Stockholm doesn't currently outweigh the other benefits of m5.2xlarge, in my opinion (I also noticed that cheap region). It is something we would still consider in the future, and I would recommend users explore it for new experiments (if they don't need to compare directly with previously obtained results), but I don't see that as a reason to keep the PR open.