Copying info from issue https://github.com/stanfordnmbl/opencap-api/issues/56
opencap-core currently runs on local workstations and one EC2 instance on AWS. In the long run, we might want to move everything to AWS (or another cluster) and optimize usage based on demand. Workers are currently always active, waiting for jobs; ideally, workers would be activated when there is demand, and the number of workers would scale automatically as a function of demand. It is likely that we can move the entire infrastructure to AWS at a similar cost.
Random thoughts:
- `aws ec2 start-instances --instance-ids <instance_id>` can be used to start an instance that is in the `stopped` state.
- Something would need to be added in `main` to implement the logic.

Copying from core:
Scaling rule:
if nInstances == 0 and len(queue) > 0 ---> nInstances = 1
if len(queue) > 5 and nInstances >= 1 ---> nInstances += 1
- Evaluate the scaling rule in the API every so often (cron job?).
- Increase the number of instances if needed; can use boto3 commands (see the sketch below).
- To scale down, could set a "haven't pulled a trial in xx min" check in app.py and shut down/stop an instance. Should be able to run this in the Docker container?
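A minimal sketch of what that periodic check could look like with boto3; the region, worker instance IDs, and the get_queue_length() helper below are all placeholders I'm assuming here, not existing code:

```python
import boto3

# Assumed region and worker instance IDs; replace with the real ones.
ec2 = boto3.client("ec2", region_name="us-west-2")
WORKER_INSTANCE_IDS = ["i-0123456789abcdef0", "i-0fedcba9876543210"]


def get_queue_length():
    """Placeholder: return the number of trials waiting in the queue."""
    return 0  # replace with a real query of the trial queue


def count_running_instances():
    """Count worker instances that are currently running or pending."""
    resp = ec2.describe_instances(
        InstanceIds=WORKER_INSTANCE_IDS,
        Filters=[{"Name": "instance-state-name", "Values": ["running", "pending"]}],
    )
    return sum(len(r["Instances"]) for r in resp["Reservations"])


def start_one_stopped_instance():
    """Start a single worker that is currently in the 'stopped' state, if any."""
    resp = ec2.describe_instances(
        InstanceIds=WORKER_INSTANCE_IDS,
        Filters=[{"Name": "instance-state-name", "Values": ["stopped"]}],
    )
    stopped = [i["InstanceId"] for r in resp["Reservations"] for i in r["Instances"]]
    if stopped:
        ec2.start_instances(InstanceIds=stopped[:1])


def evaluate_scaling_rule():
    """Run periodically (e.g., from a cron job) to apply the scaling rule above."""
    n_instances = count_running_instances()
    queue_len = get_queue_length()
    if (n_instances == 0 and queue_len > 0) or (queue_len > 5 and n_instances >= 1):
        start_one_stopped_instance()
```

Scaling down would be the mirror image: after the "haven't pulled a trial in xx min" timeout, the API (or the worker itself) calls ec2.stop_instances on the idle instance.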
Notes from call with AWS:
For the architecture we reviewed together, I'd recommend decoupling the container from the queue solution:
1. Continue deploying the container to ECR, but remove the long-lived part: just receive one item from the queue as input, process it, and let the process die.
2. Create an ECS cluster using EC2 and an Auto Scaling group. Fargate doesn't currently support GPU, but there is a roadmap item for it. Read more on cluster capacity management for Auto Scaling group capacity providers.
3. Create a Task Definition for your container-processing task.
4. Create a queue consumer that launches an ECS task to process one queue item. You'll use the standalone run-task for that (see the sketch after this list).
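A rough sketch of that last step, assuming an SQS queue (just one possible backend) and placeholder names for the queue URL, cluster, task definition, container, and environment variable:

```python
import boto3

sqs = boto3.client("sqs")
ecs = boto3.client("ecs")

# All of these names are placeholders, not the actual OpenCap resources.
QUEUE_URL = "https://sqs.us-west-2.amazonaws.com/123456789012/opencap-trials"
CLUSTER = "opencap-gpu-cluster"
TASK_DEFINITION = "opencap-core-worker"
CONTAINER_NAME = "opencap-core"


def consume_one():
    """Pull one item from the queue and hand it to a standalone ECS task."""
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1)
    messages = resp.get("Messages", [])
    if not messages:
        return

    message = messages[0]
    # Launch a one-off task on the EC2-backed cluster (Fargate has no GPU support).
    ecs.run_task(
        cluster=CLUSTER,
        taskDefinition=TASK_DEFINITION,
        launchType="EC2",
        overrides={
            "containerOverrides": [
                {
                    "name": CONTAINER_NAME,
                    "environment": [
                        {"name": "TRIAL_PAYLOAD", "value": message["Body"]},
                    ],
                }
            ]
        },
    )
    # The container processes the item and exits; remove the message once handed off.
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])
```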
A few resources to help you get started:
Deprecated. Moved here: https://github.com/stanfordnmbl/opencap-api/issues/174
We'd like to have surge GPU capacity using AWS auto-scaling. We will have base capacity that is always running, so this will only activate if the queue exceeds a certain length.
- A value, `desired_asg_gpu_instances`, that will get updated by the celery queue check and checked by the auto-scaling rule (see the sketch at the end of this comment). @sashasimkin
- `desired_asg_gpu_instances` on CloudWatch: https://github.com/stanfordnmbl/opencap-api/issues/173 @olehkorkh-planeks
- An auto-scaling rule that reads `desired_asg_gpu_instances` from CloudWatch and spins up/down machines. Spun-up machines should have scale-in protection.

@olehkorkh-planeks @sashasimkin @antoinefalisse please read over and update this. We can ignore the two
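As a sketch of the first and last items above: the celery queue check could publish the value to CloudWatch, and a freshly started worker could be given scale-in protection, roughly like this (the namespace, thresholds, ASG name, and function names are assumptions, not the implemented design):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")
autoscaling = boto3.client("autoscaling")

# Assumed values; none of these are the real configuration.
NAMESPACE = "OpenCap"
QUEUE_LENGTH_PER_INSTANCE = 5   # trials each surge instance should absorb
MAX_SURGE_INSTANCES = 4         # cap on surge capacity
ASG_NAME = "opencap-gpu-asg"


def publish_desired_gpu_instances(queue_length):
    """Translate the celery queue length into desired_asg_gpu_instances and publish it."""
    desired = min(MAX_SURGE_INSTANCES, queue_length // QUEUE_LENGTH_PER_INSTANCE)
    cloudwatch.put_metric_data(
        Namespace=NAMESPACE,
        MetricData=[
            {
                "MetricName": "desired_asg_gpu_instances",
                "Value": desired,
                "Unit": "Count",
            }
        ],
    )


def protect_from_scale_in(instance_id):
    """Give a newly spun-up worker scale-in protection while it is processing a trial."""
    autoscaling.set_instance_protection(
        InstanceIds=[instance_id],
        AutoScalingGroupName=ASG_NAME,
        ProtectedFromScaleIn=True,
    )
```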