kingfischer16 / mldeploy

Deploy ML code to cloud resources as a REST API for inference and training.
MIT License

add hibernate function #39

Open kingfischer16 opened 3 years ago

kingfischer16 commented 3 years ago

A Fargate cluster will automatically hibernate when there are no more jobs, but an EC2 cluster will not. A hibernate function should be callable on an active deployment, and should trigger:

  1. EC2: Set the ASG desired capacity to zero.
  2. Fargate: If jobs are still running, set the desired task count of the service to zero. This should take effect regardless of whether the SQS queue still holds jobs.
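
The two hibernate paths above could be sketched with boto3 roughly as follows. This is a minimal sketch, not the mldeploy implementation: the function names and parameters are hypothetical, and `autoscaling` and `ecs` are assumed to be clients created with `boto3.client("autoscaling")` and `boto3.client("ecs")`. `MinSize` is lowered together with the desired capacity because a desired capacity below the group minimum is rejected.

```python
def hibernate_ec2(autoscaling, asg_name):
    """Scale the Auto Scaling group down to zero instances.

    `autoscaling` is assumed to be a boto3 "autoscaling" client;
    `asg_name` is a hypothetical parameter naming the deployment's ASG.
    """
    # Lower MinSize as well: DesiredCapacity may not be below MinSize.
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        MinSize=0,
        DesiredCapacity=0,
    )


def hibernate_fargate(ecs, cluster, service):
    """Set the ECS service's desired task count to zero.

    `ecs` is assumed to be a boto3 "ecs" client. The SQS queue is not
    consulted, so this applies whether or not jobs are still queued.
    """
    ecs.update_service(cluster=cluster, service=service, desiredCount=0)
```

Passing the clients in as arguments keeps the functions easy to exercise with stand-in objects before touching real resources.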

Additionally, a resume (or thaw, or wakeup) function is required to restart execution. This functionality will likely come in handy so users don't have to undeploy and redeploy the same deployment every time they need to take a break.
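
A resume function could mirror the hibernate calls by restoring a non-zero count. The sketch below is only an illustration under assumptions: the function names are hypothetical, the clients are assumed to come from `boto3.client("autoscaling")` and `boto3.client("ecs")`, and where the pre-hibernate count is stored is left open, so the caller passes it in.

```python
def resume_ec2(autoscaling, asg_name, desired_capacity):
    """Restore a non-zero desired capacity on the Auto Scaling group.

    `desired_capacity` must lie within the group's MinSize/MaxSize;
    how that value is remembered across a hibernate is an open choice.
    """
    if desired_capacity < 1:
        raise ValueError("desired_capacity must be at least 1 to resume")
    autoscaling.set_desired_capacity(
        AutoScalingGroupName=asg_name,
        DesiredCapacity=desired_capacity,
        HonorCooldown=False,  # do not wait for a cooldown period to finish
    )


def resume_fargate(ecs, cluster, service, desired_count):
    """Restore a non-zero desired task count on the ECS service."""
    if desired_count < 1:
        raise ValueError("desired_count must be at least 1 to resume")
    ecs.update_service(
        cluster=cluster, service=service, desiredCount=desired_count
    )
```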

kingfischer16 commented 3 years ago

The ASG size should match the SQS queue size 1:1, up to the maximum allowed number of instances. This can be set up using a CloudWatch alarm.
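
The 1:1 rule with a cap is simple arithmetic; a minimal sketch is below. It only illustrates the target computation and does not set up the CloudWatch alarm itself. The function names are hypothetical and `sqs` is assumed to be a client from `boto3.client("sqs")`. Note that `ApproximateNumberOfMessages` counts visible messages only, not messages currently in flight.

```python
def queue_depth(sqs, queue_url):
    """Return the approximate number of visible messages in the queue."""
    response = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    # SQS returns attribute values as strings.
    return int(response["Attributes"]["ApproximateNumberOfMessages"])


def target_capacity(queue_size, max_instances):
    """One instance per queued job, capped at max_instances, never negative."""
    return max(0, min(queue_size, max_instances))
```

For example, with a maximum of 10 instances, a queue of 3 jobs gives a target of 3 and a queue of 25 jobs gives a target of 10.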

If CloudWatch alarms are set up, they may interfere with the hibernate function. Does the hibernate function need to disable these CloudWatch alarms?
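
One possible approach, offered as an assumption rather than a settled design: leave the alarms in place but switch their actions off while hibernating, then switch them back on when resuming. The sketch below uses the CloudWatch `disable_alarm_actions` and `enable_alarm_actions` calls; the function name and the `alarm_names` parameter are hypothetical, and `cloudwatch` is assumed to be a client from `boto3.client("cloudwatch")`.

```python
def set_alarm_actions(cloudwatch, alarm_names, enabled):
    """Enable or disable the actions of the given CloudWatch alarms.

    With actions disabled, an alarm can still change state but does not
    trigger its scaling policy, so it should not undo a hibernate.
    """
    names = list(alarm_names)
    if not names:
        return  # nothing to change
    if enabled:
        cloudwatch.enable_alarm_actions(AlarmNames=names)
    else:
        cloudwatch.disable_alarm_actions(AlarmNames=names)
```

Under this approach, hibernate would call it with `enabled=False` before scaling to zero, and resume would call it with `enabled=True` afterwards.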