stanfordnmbl / opencap-api

Apache License 2.0

Scaling backend machines #109

Closed suhlrich closed 6 months ago

suhlrich commented 1 year ago

We'd like to have surge GPU capacity using AWS auto-scaling. We will have base capacity that is always running, so the surge capacity will only activate if the queue reaches a certain length.
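A minimal sketch of the surge decision described above: keep the base capacity running at all times and only add surge GPU instances once the queue passes a threshold. The function name, threshold, and caps are illustrative assumptions, not part of the actual implementation.

```python
def surge_instances_needed(queue_length, threshold=5, jobs_per_instance=5, max_surge=4):
    """Return how many surge instances to run on top of the always-on base capacity.

    All parameter values here are placeholders for illustration.
    """
    if queue_length <= threshold:
        return 0  # base capacity handles the load
    backlog = queue_length - threshold
    # one surge instance per `jobs_per_instance` queued trials, capped at `max_surge`
    needed = -(-backlog // jobs_per_instance)  # ceiling division
    return min(needed, max_surge)
```

A cron job (or the API itself) could call this periodically and reconcile the result against the number of currently running surge instances.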

@olehkorkh-planeks @sashasimkin @antoinefalisse please read over and update this. We can ignore the two

antoinefalisse commented 1 year ago

Copying info from issue https://github.com/stanfordnmbl/opencap-api/issues/56

opencap-core is currently running on local workstations and one EC2 instance on AWS. In the long run, we might want to move everything to AWS (or another cluster) and optimize usage based on demand. Workers are currently always active, waiting for jobs. Ideally, workers would be activated when there is demand, and the number of workers would automatically scale as a function of demand. It is likely that we can move the entire infrastructure to AWS at a similar cost.

Random thoughts:

Copying from core:

Scaling rule:

if nInstances == 0 and len(queue) > 0 ---> nInstances = 1
if len(queue) > 5 and nInstances >= 1 ---> nInstances += 1

- Evaluate the scaling rule in the API every so often (cron job?)
- Increase the number of instances if needed; can use boto3 commands
- To scale down, could set a "haven't pulled a trial in xx min" check in app.py and shut down/stop an instance. Should be able to run this in the Docker container?
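The scaling rule above can be written as a pure function that a periodic job evaluates. This sketch only computes the target instance count; the actual start/stop side effects (e.g. boto3's `ec2.start_instances` / `ec2.stop_instances`) are left as comments since they depend on instance IDs and credentials not shown here.

```python
def target_instances(queue_length, n_instances):
    """Apply the scaling rule from the thread: bootstrap one instance when
    the queue is non-empty, and add one whenever the backlog exceeds 5.
    """
    if n_instances == 0 and queue_length > 0:
        return 1
    if queue_length > 5 and n_instances >= 1:
        return n_instances + 1
    return n_instances

# In the cron job, roughly (untested, illustrative):
#   desired = target_instances(len(queue), n_running)
#   if desired > n_running:
#       ec2.start_instances(InstanceIds=stopped_ids[: desired - n_running])
```

Scale-down would stay in the worker itself, per the "haven't pulled a trial in xx min" idea, rather than in this function.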

antoinefalisse commented 8 months ago

Notes from call with AWS:

For the architecture we reviewed together, I'd recommend decoupling the container from the queue solution:
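One way to read that recommendation: the processing container should only depend on a small poll-and-process interface, not on any particular queue backend. In this hedged sketch an in-memory `queue.Queue` stands in for a managed queue (in production this would be something like SQS, polled via boto3's `receive_message`); the worker loop is identical either way.

```python
import queue

def drain(q, process, max_jobs=None):
    """Pull jobs from `q` and hand them to `process` until the queue is
    empty or `max_jobs` have been handled. The worker knows nothing about
    where the queue lives, which is the point of the decoupling.
    """
    done = 0
    while max_jobs is None or done < max_jobs:
        try:
            job = q.get_nowait()
        except queue.Empty:
            break
        process(job)
        done += 1
    return done
```

Swapping the backend then only means replacing `get_nowait` with the managed queue's receive call, leaving the container image untouched.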

A few resources to help you get started:

suhlrich commented 6 months ago

Deprecated. Moved here: https://github.com/stanfordnmbl/opencap-api/issues/174