Copying info from issue https://github.com/stanfordnmbl/opencap-api/issues/56
opencap-core currently runs on local workstations and one EC2 instance on AWS. In the long run, we might want to move everything to AWS (or another cluster) and optimize usage based on demand. Workers are currently always active, waiting for jobs; ideally, workers would be activated when there is demand, and the number of workers would scale automatically as a function of demand. It is likely that we can move the entire infrastructure to AWS at a similar cost.
Random thoughts:
- `aws ec2 start-instances --instance-ids <instance_id>` can be used to start an instance that is in the `stopped` state.
- Something would need to be added in `main` to implement the logic.

Copying from core:
Scaling rule:
if nInstances == 0 and len(queue) > 0 ---> nInstances = 1
if len(queue) > 5 and nInstances >= 1 ---> nInstances += 1
- Evaluate the scaling rule in the API every so often (cron job?).
- Increase the number of instances if needed; can use boto3 commands (see the sketch below).
- To scale down, could set a "haven't pulled a trial in xx min" check in app.py and shut down/stop an instance. Should be able to run this in the Docker container?
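A minimal sketch of what that periodic check could look like with boto3; the region, worker instance IDs, and the get_queue_length() helper below are all placeholders I'm assuming here, not existing code:

```python
import boto3

# Assumed region and worker instance IDs; replace with the real ones.
ec2 = boto3.client("ec2", region_name="us-west-2")
WORKER_INSTANCE_IDS = ["i-0123456789abcdef0", "i-0fedcba9876543210"]


def get_queue_length():
    """Placeholder: return the number of trials waiting in the queue."""
    return 0  # replace with a real query of the trial queue


def count_running_instances():
    """Count worker instances that are currently running or pending."""
    resp = ec2.describe_instances(
        InstanceIds=WORKER_INSTANCE_IDS,
        Filters=[{"Name": "instance-state-name", "Values": ["running", "pending"]}],
    )
    return sum(len(r["Instances"]) for r in resp["Reservations"])


def start_one_stopped_instance():
    """Start a single worker that is currently in the 'stopped' state, if any."""
    resp = ec2.describe_instances(
        InstanceIds=WORKER_INSTANCE_IDS,
        Filters=[{"Name": "instance-state-name", "Values": ["stopped"]}],
    )
    stopped = [i["InstanceId"] for r in resp["Reservations"] for i in r["Instances"]]
    if stopped:
        ec2.start_instances(InstanceIds=stopped[:1])


def evaluate_scaling_rule():
    """Run periodically (e.g., from a cron job) to apply the scaling rule above."""
    n_instances = count_running_instances()
    queue_len = get_queue_length()
    if (n_instances == 0 and queue_len > 0) or (queue_len > 5 and n_instances >= 1):
        start_one_stopped_instance()
```

Scaling down would be the mirror image: after the "haven't pulled a trial in xx min" timeout, the API (or the worker itself) calls ec2.stop_instances on the idle instance.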
Notes from call with AWS:
For the architecture we reviewed together, I'd recommend decoupling the container from the queue solution:
1. Continue deploying the container to ECR, but remove the long-lived part: just receive one item from the queue as input, process it, and let the process die.
2. Create an ECS cluster using EC2 and an Auto Scaling group. Fargate doesn't currently support GPU, but there is a roadmap item for it. Read more on cluster capacity management for Auto Scaling group capacity providers.
3. Create a Task Definition for your container-processing task.
4. Create a queue consumer that launches an ECS task to process one queue item. You'll use the standalone run-task for that (see the sketch after this list).
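A rough sketch of that last step, assuming an SQS queue (just one possible backend) and placeholder names for the queue URL, cluster, task definition, container, and environment variable:

```python
import boto3

sqs = boto3.client("sqs")
ecs = boto3.client("ecs")

# All of these names are placeholders, not the actual OpenCap resources.
QUEUE_URL = "https://sqs.us-west-2.amazonaws.com/123456789012/opencap-trials"
CLUSTER = "opencap-gpu-cluster"
TASK_DEFINITION = "opencap-core-worker"
CONTAINER_NAME = "opencap-core"


def consume_one():
    """Pull one item from the queue and hand it to a standalone ECS task."""
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1)
    messages = resp.get("Messages", [])
    if not messages:
        return

    message = messages[0]
    # Launch a one-off task on the EC2-backed cluster (Fargate has no GPU support).
    ecs.run_task(
        cluster=CLUSTER,
        taskDefinition=TASK_DEFINITION,
        launchType="EC2",
        overrides={
            "containerOverrides": [
                {
                    "name": CONTAINER_NAME,
                    "environment": [
                        {"name": "TRIAL_PAYLOAD", "value": message["Body"]},
                    ],
                }
            ]
        },
    )
    # The container processes the item and exits; remove the message once handed off.
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])
```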
A few resources to help you get started:
Deprecated. Moved here: https://github.com/stanfordnmbl/opencap-api/issues/174
We'd like to have surge GPU capacity using AWS auto-scaling. We will have base capacity that is always running, so this will only activate if the queue exceeds a certain length.
- A value, `desired_asg_gpu_instances`, that will get updated by the celery queue check and checked by the auto-scaling rule (see the sketch at the end of this comment). @sashasimkin
- `desired_asg_gpu_instances` on CloudWatch: https://github.com/stanfordnmbl/opencap-api/issues/173 @olehkorkh-planeks
- An auto-scaling rule that reads `desired_asg_gpu_instances` from CloudWatch and spins up/down machines. Spun-up machines should have scale-in protection.

@olehkorkh-planeks @sashasimkin @antoinefalisse please read over and update this. We can ignore the two
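As a sketch of the first and last items above: the celery queue check could publish the value to CloudWatch, and a freshly started worker could be given scale-in protection, roughly like this (the namespace, thresholds, ASG name, and function names are assumptions, not the implemented design):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")
autoscaling = boto3.client("autoscaling")

# Assumed values; none of these are the real configuration.
NAMESPACE = "OpenCap"
QUEUE_LENGTH_PER_INSTANCE = 5   # trials each surge instance should absorb
MAX_SURGE_INSTANCES = 4         # cap on surge capacity
ASG_NAME = "opencap-gpu-asg"


def publish_desired_gpu_instances(queue_length):
    """Translate the celery queue length into desired_asg_gpu_instances and publish it."""
    desired = min(MAX_SURGE_INSTANCES, queue_length // QUEUE_LENGTH_PER_INSTANCE)
    cloudwatch.put_metric_data(
        Namespace=NAMESPACE,
        MetricData=[
            {
                "MetricName": "desired_asg_gpu_instances",
                "Value": desired,
                "Unit": "Count",
            }
        ],
    )


def protect_from_scale_in(instance_id):
    """Give a newly spun-up worker scale-in protection while it is processing a trial."""
    autoscaling.set_instance_protection(
        InstanceIds=[instance_id],
        AutoScalingGroupName=ASG_NAME,
        ProtectedFromScaleIn=True,
    )
```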