ECS Kubernetes typically accounts for ~50% of the cost of a low-usage Cellenics deployment (i.e., a deployment that does not have at least one user active at all times). In staging environments, we avoid this cost by running pipeline jobs in batch (pay-per-use) and keeping worker pods at replicas=0. The drawback is that someone has to scale the worker pods to replicas=1 by hand before a staged environment can be tested manually. It also means that developers can only test a staged PR in a deployment they have access to (e.g., only HMS developers can check the HMS staging environment, so Biomage has to re-stage the PR in its own environment). Automating this scaling would remove both frictions. It would also make it possible to eliminate idle worker/pipeline pods, and their associated costs, in low-usage production environments.
Proposal
Make the API automatically scale worker pods from replicas=0 to replicas=1 when a worker is needed.
Make the worker scale-down cron job configurable per deployment (currently it is enabled by default for staging only).
Make the worker timeout configurable per deployment (a 10-minute timeout is annoying when every restart means waiting for a new worker pod to spin up).
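The first proposal item, together with the two per-deployment switches, could be sketched as below. This is a hypothetical Python illustration, not the code from the api PR: the names DeploymentScaler, WorkerSettings, ensure_worker_running and FakeScaler, and the environment variables WORKER_AUTO_SCALE, WORKER_SCALEDOWN_CRON and WORKER_TIMEOUT_MINUTES, are all assumptions made for the example. A real adapter for DeploymentScaler might wrap the official Kubernetes Python client's AppsV1Api.read_namespaced_deployment_scale and patch_namespaced_deployment_scale; here an in-memory FakeScaler stands in for the cluster.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Protocol


class DeploymentScaler(Protocol):
    """Hypothetical minimal interface over the Kubernetes scale subresource."""

    def get_replicas(self, name: str, namespace: str) -> int: ...

    def set_replicas(self, name: str, namespace: str, replicas: int) -> None: ...


@dataclass(frozen=True)
class WorkerSettings:
    """Hypothetical per-deployment settings mirroring the three proposal items."""

    auto_scale_from_zero: bool = True     # API scales replicas=0 -> replicas=1
    scaledown_cron_enabled: bool = False  # run the worker scale-down cron job
    worker_timeout_minutes: int = 10      # idle time before the worker exits

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "WorkerSettings":
        # The variable names below are illustrative assumptions only.
        env = os.environ if env is None else env

        def flag(key: str, default: bool) -> bool:
            return env.get(key, str(default)).strip().lower() in ("1", "true", "yes")

        return cls(
            auto_scale_from_zero=flag("WORKER_AUTO_SCALE", True),
            scaledown_cron_enabled=flag("WORKER_SCALEDOWN_CRON", False),
            worker_timeout_minutes=int(env.get("WORKER_TIMEOUT_MINUTES", "10")),
        )


def ensure_worker_running(
    scaler: DeploymentScaler, settings: WorkerSettings, name: str, namespace: str
) -> bool:
    """Scale the worker deployment from 0 to 1 replica if it is scaled down.

    Returns True if a scale-up was requested, False otherwise.
    """
    if not settings.auto_scale_from_zero:
        return False
    if scaler.get_replicas(name, namespace) > 0:
        return False  # a worker is already running (or starting)
    scaler.set_replicas(name, namespace, 1)
    return True


class FakeScaler:
    """In-memory stand-in for a cluster, used only to exercise the logic."""

    def __init__(self, replicas: int = 0) -> None:
        self.replicas = replicas
        self.requests: list = []

    def get_replicas(self, name: str, namespace: str) -> int:
        return self.replicas

    def set_replicas(self, name: str, namespace: str, replicas: int) -> None:
        self.requests.append((name, namespace, replicas))
        self.replicas = replicas
```

For example, with FakeScaler(replicas=0) the first call to ensure_worker_running scales the deployment to one replica and returns True; a second call returns False because a worker is already requested. Setting replicas to 1 is idempotent, so two API requests racing through the check both ask for the same single replica, which is harmless; the check only avoids a needless write when a worker is already up.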
PRs
api
auto-scale PR: https://github.com/hms-dbmi-cellenics/api/pull/491
worker
make scale-down cron configurable: https://github.com/hms-dbmi-cellenics/worker/pull/347
iac
add permissions for the API and add configuration: https://github.com/hms-dbmi-cellenics/iac/pull/566