openhpc / ohpc

OpenHPC Integration, Packaging, and Test Repo
http://openhpc.community
Apache License 2.0

Autoscaling #1319

Open bkmgit opened 3 years ago

bkmgit commented 3 years ago

Slurm supports autoscaling, which is very helpful for cloud deployments. Could this be included and made relatively easy to configure?

ChrisDowning commented 3 years ago

Hi @bkmgit - we have a "Cloud Working Group" tackling this right now. The goal is to have a cloud equivalent of the current on-premises recipes, skipping the unneeded parts (Warewulf/xCAT) and instead dealing with automated scale up/down of compute nodes, as well as other considerations (which instance types make sense, what storage to use, etc.).

I'll drop another message here when there is something for you to try out.

bkmgit commented 3 years ago

@ChrisDowning Thanks. A mailing list may be helpful, as indicated at https://github.com/openhpc/cloudwg/issues/13. Hopefully community contributions on the design and implementation of the cloud equivalent will also be considered. This may also be useful for HPC Carpentry: https://github.com/hpc-carpentry/coordination/issues/42

sjpb commented 3 years ago

@ChrisDowning I'd definitely be interested in hearing what happens too. We have done proof-of-concept work on Slurm autoscaling using OpenHPC before, although it's not ready for production.

ChrisDowning commented 3 years ago

@sjpb Great - will keep you in the loop. I've deployed auto-scaling using Slurm power-saving for customers a few times over the last ~18 months, just never using the OpenHPC build. Deploying the same basic functionality using the OpenHPC Slurm package is pretty trivial, so we just need to get it documented first, then move on to the "best practices" and other considerations people might not be aware of if they are new to cloud.