ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Autoscaler] AWS EC2 Fleet support #39789

Open mjrlee opened 1 year ago

mjrlee commented 1 year ago

Description

A best practice with spot instances is to specify many different instance types so that the user is more likely to have their request satisfied by some combination of instance types.

Currently we can specify multiple instance types, but it's not clear what happens when we do (https://github.com/ray-project/ray/issues/39788).

One option to handle this is to create an EC2 Fleet instead of requesting individual node types:

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-fleet.html

This feature offloads to AWS all the logic about which instances to start. The autoscaler then only needs to change the fleet's target capacity to match the number of CPUs requested.
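
To make the proposal concrete, here is a rough sketch of what the "change the target capacity" step might look like. It is not how the Ray autoscaler works today. It assumes a fleet of type `maintain` already exists (as far as I know, only those can be modified) and that the fleet uses instance weighting so that one capacity unit corresponds to `cpus_per_unit` vCPUs. The helper names (`fleet_capacity_for_cpus`, `build_modify_fleet_request`, `scale_fleet`) are hypothetical; the only real AWS call is boto3's `modify_fleet`.

```python
def fleet_capacity_for_cpus(requested_cpus: int, cpus_per_unit: int = 1) -> int:
    """Convert a CPU demand into fleet capacity units, rounding up.

    Assumption: the fleet is weighted so that one capacity unit equals
    `cpus_per_unit` vCPUs. Both names are hypothetical.
    """
    if requested_cpus < 0:
        raise ValueError("requested_cpus must be non-negative")
    if cpus_per_unit <= 0:
        raise ValueError("cpus_per_unit must be positive")
    # Integer ceiling division, avoids float rounding.
    return -(-requested_cpus // cpus_per_unit)


def build_modify_fleet_request(
    fleet_id: str, requested_cpus: int, cpus_per_unit: int = 1
) -> dict:
    """Build the keyword arguments for the EC2 ModifyFleet call."""
    return {
        "FleetId": fleet_id,
        "TargetCapacitySpecification": {
            "TotalTargetCapacity": fleet_capacity_for_cpus(
                requested_cpus, cpus_per_unit
            ),
        },
    }


def scale_fleet(ec2_client, fleet_id: str, requested_cpus: int, cpus_per_unit: int = 1):
    """Ask AWS to resize the fleet; AWS decides which instance types to launch.

    `ec2_client` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    request = build_modify_fleet_request(fleet_id, requested_cpus, cpus_per_unit)
    return ec2_client.modify_fleet(**request)
```

With something like this, the autoscaler would call `scale_fleet(boto3.client("ec2"), fleet_id, cpus_needed)` whenever demand changes, and the choice of spot instance types would be left entirely to the fleet's launch template overrides and allocation strategy.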

Use case

No response

anyscalesam commented 12 months ago

@mjrlee we support something similar on Anyscale Platform. See here. Would that be what you're looking for?

cc @richardliaw

mjrlee commented 12 months ago

Not really, I'm looking for help launching spot instances reliably.

zakajd commented 10 months ago

@mjrlee Have you found a solution for this issue?