mozmeao / infra

Mozilla Marketing Engineering and Operations Infrastructure
https://mozilla.github.io/meao/
Mozilla Public License 2.0

Investigate best way to do scaling of bedrock pods #1298

Open duallain opened 4 years ago

duallain commented 4 years ago

It seems like CPU/memory may not be the best signal for deciding when to spin up new bedrock instances. If we choose a metric that better reflects bedrock's actual capacity, we can likely deploy fewer pods and need fewer k8s infrastructure resources. We should investigate a few items:

  1. How do we expose additional metrics to HPA objects? (There are quite a few choices here; which is best?)
  2. Which metric is most reflective of load on bedrock? (This implies some perf testing.)
  3. (Generate stories for) deploying new metrics for the HPA.
  4. (Generate stories for) deploying bedrock prod/stg/dev with the new HPA metrics.
  5. Test that the HPA works, and ensure latency between origin and CDN doesn't change.
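To make item 2 concrete, here is a rough sketch of the scaling rule the Kubernetes documentation describes for the HPA: desired replicas = ceil(current replicas × current metric value / target metric value), with no change when the ratio is within a tolerance (0.1 by default). The metric (requests per second per pod) and every number below are illustrative assumptions, not measurements from bedrock; the point is only to show how the choice of target value drives pod count.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Approximate the HPA rule: ceil(current_replicas * current_value / target_value).

    current_value and target_value are per-pod averages of whichever metric
    is chosen (hypothetically, requests per second). If the ratio is within
    `tolerance` of 1.0, the replica count is left unchanged.
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Hypothetical numbers: 4 pods averaging 150 req/s each against a 100 req/s target.
scale_up = desired_replicas(4, 150, 100)
# 10 pods averaging 50 req/s each against the same target.
scale_down = desired_replicas(10, 50, 100)
# 4 pods at 105 req/s: inside the tolerance band, so no change.
steady = desired_replicas(4, 105, 100)
print(scale_up, scale_down, steady)
```

This ignores the details the real controller handles (unready pods, missing metrics, stabilization windows, min/max replica bounds), so treat it as a way to reason about candidate targets during perf testing rather than a model of production behavior.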

bookshelfdave commented 4 years ago

see also: https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler

duallain commented 4 years ago

https://github.com/FairwindsOps/goldilocks can maybe help with tuning the VPA. It seems like you basically install Goldilocks + VPA and then get recommended changes, which is rad as all get out.

bookshelfdave commented 4 years ago

let's give Goldilocks a shot!