Closed steve-marmalade closed 3 months ago
Hey @steve-marmalade, thanks! Mixed use of on-demand + spot is definitely one of the next steps we're planning. We'd love to learn more about your use case. For example, would 1 on-demand + 1 spot be preferred over 2 on-demand in your case? The former gives you ~30% savings and some guarantee that the service stays up.
Hi @infwinston , that's awesome to hear. Yes, I think the option of 1 on-demand + 1 spot would be preferred, for the reason that you mention.
To keep high availability and minimize cost, I imagine the algorithm could get pretty complex? For example, we could have only spot instances running (while they're available), and once we notice that they're being pre-empted quickly (or we are dropping below a threshold), we start bringing up on-demand instances to compensate.
But building on your example, maybe a simpler version to start is to always have some number of on-demand nodes, and then use spot instances for cheap scalability. This ensures the service would never go down completely.
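A rough sketch of the "pre-empted quickly, or dropping below a threshold" check described above could look like the following. Everything here is hypothetical and for illustration only: the `PreemptionMonitor` class, its parameters, and the sliding-window rule are assumptions, not anything SkyServe implements.

```python
from collections import deque


class PreemptionMonitor:
    """Hypothetical helper: tracks spot preemptions in a sliding time window."""

    def __init__(self, window_seconds: float, max_preemptions: int):
        # Assumed tuning knobs; sensible values depend on the workload.
        self.window_seconds = window_seconds
        self.max_preemptions = max_preemptions
        self._events = deque()  # preemption timestamps, oldest first

    def record_preemption(self, timestamp: float) -> None:
        """Remember that a spot replica was preempted at `timestamp`."""
        self._events.append(timestamp)

    def should_fall_back(self, now: float, ready_spot: int, min_ready: int) -> bool:
        """Return True if on-demand instances should be brought up."""
        # Forget preemptions that fell out of the window.
        while self._events and now - self._events[0] > self.window_seconds:
            self._events.popleft()
        preempted_quickly = len(self._events) > self.max_preemptions
        below_threshold = ready_spot < min_ready
        return preempted_quickly or below_threshold


monitor = PreemptionMonitor(window_seconds=600, max_preemptions=2)
for t in (0, 100, 200):
    monitor.record_preemption(t)

# Three preemptions within ten minutes exceeds the limit of two.
fall_back_now = monitor.should_fall_back(now=300, ready_spot=3, min_ready=2)
```

The simpler alternative in the comment above (a fixed number of on-demand nodes plus spot for scale) needs none of this bookkeeping, which is probably why it makes a better starting point.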
@infwinston The option of mixed on-demand and spot instances would be highly beneficial. Has there been any work on this? If not, I am open to contributing and helping in any way I can.
Yes, @MaoZiming is leading this development! Would you mind sharing your use case and requirements?
Hi @infwinston -- The primary use case is to self-host 7B-34B models and give access to them via API endpoints (with some kind of authentication). Some good things to have would be the ability to reduce the cold start time to ~10-20s, as done by Baseten and llm-engine, and the ability to even scale to zero while ensuring snappy autoscaling and high availability.
Hi @infwinston and @MaoZiming, just checking in to see if there have been any updates on this issue?
Hi, @abhimasand thanks for the interest and sorry for the delay. We are planning to add two additional fields in the SkyServe yaml to better support spot instances:
base_ondemand_fallback_replicas: specifies a base number of on-demand instances in addition to spot instances.
dynamic_ondemand_fallback: true allows SkyServe to replenish preempted spot instances with on-demand instances, and downscale on-demand instances when spot availability is back.
Let us know if you have any feedback : )
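To make the intended behavior of the two fields concrete, here is a minimal sketch of how they might combine into replica counts. It is one reading of the description above, not SkyServe's actual implementation; the function name `plan_replicas` and its arguments are made up for illustration.

```python
from typing import Tuple


def plan_replicas(
    target_replicas: int,
    ready_spot: int,
    base_ondemand_fallback_replicas: int = 0,
    dynamic_ondemand_fallback: bool = False,
) -> Tuple[int, int]:
    """Return (spot_wanted, ondemand_wanted) for one autoscaling step.

    Assumed semantics, based only on the field descriptions in this thread:
    - the base on-demand replicas are always kept running;
    - the remaining replicas are requested as spot;
    - with dynamic fallback enabled, any spot replicas that are not ready
      (for example, preempted) are temporarily covered by on-demand ones.
    """
    base = min(base_ondemand_fallback_replicas, target_replicas)
    spot_wanted = target_replicas - base
    ondemand_wanted = base
    if dynamic_ondemand_fallback:
        # Cover the spot shortfall; this shrinks back to zero as spot
        # replicas become ready again.
        ondemand_wanted += max(spot_wanted - ready_spot, 0)
    return spot_wanted, ondemand_wanted


# 4 replicas wanted, 1 base on-demand, only 1 of the 3 spot replicas is ready.
spot, ondemand = plan_replicas(
    target_replicas=4,
    ready_spot=1,
    base_ondemand_fallback_replicas=1,
    dynamic_ondemand_fallback=True,
)
```

Under these assumptions, on-demand capacity grows only while spot replicas are missing and shrinks again once they are ready, which matches the "availability first, cost second" priority stated in the original request.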
This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.
This has been fixed by #3194
Hey there, I recently saw #2528 and thought of an enhancement that I would personally find very valuable, so I figured I'd share it here.
The feature in that PR is to allow spot instances to serve inference requests. This is indeed a very useful feature for managing costs, but it's not something I could depend on in production, because the availability of spot instances can dry up suddenly if large compute jobs are launched.
What would be amazing is a feature that says "if the number of instances is below a threshold, then spin up regular instances to maintain high availability". Once spot availability recovers, these would be replaced with spot nodes.
In short: availability is the highest priority, followed by saving cost as a secondary concern.