mozmeao / infra

Mozilla Marketing Engineering and Operations Infrastructure
https://mozilla.github.io/meao/
Mozilla Public License 2.0
59 stars 12 forks

GCP Cost Optimization #1292

Closed: duallain closed this issue 4 years ago

duallain commented 4 years ago

Are we autoscaling nodes? Are they the right composition of memory/cpu? Is the bedrock deployment scaled appropriately?

duallain commented 4 years ago

Bedrock prod is by far the largest consumer of resources in the cluster, so we should likely fit the cluster's CPU/memory ratio to its pods.

Current bedrock prod pod limits are:

```yaml
limits:
  cpu: 1500m
  memory: 1000Mi
```

A snapshot of one node's usage right now (April 30th, ~12:30 pm Pacific):

| CPU used | CPU available | Memory used | Memory available |
|----------|---------------|-------------|------------------|
| 2.54 CPU | 3.92 CPU      | 2.48 GB     | 13.97 GB         |

Our total cluster capacity is 64 vCPUs | 256.00 GB across 16 nodes, i.e. 4 vCPUs and 16 GB per node.

Haswell is currently the default CPU platform in us-central per https://cloud.google.com/compute/docs/regions-zones, and https://cloud.google.com/compute/docs/cpu-platforms lists Haswell at 2.3 GHz (2300 MHz).

Based on all that, a node with 4 vCPUs has 9200 MHz available, and its 16 GB of memory is 16384 MB. So we are allocating nodes with roughly twice as much memory (16384 MB) as CPU cycles (9200 MHz), while our bedrock pod limits request more CPU than memory (1500m vs 1000Mi). To save money we could change node memory from 16 GB to 8 GB and get a more balanced cluster.
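To make the ratio comparison concrete, here is a rough sketch of the arithmetic above (illustrative only, not a tool we run; the constants are the node shape and bedrock limits quoted earlier in this thread):

```python
# Per-node capacity for the current n1-standard-4-ish shape (4 vCPUs, 16 GB).
NODE_CPU_MILLICORES = 4 * 1000        # 4 vCPUs expressed in Kubernetes millicores
NODE_MEM_MIB = 16 * 1024              # 16 GB node -> 16384 MiB

# Bedrock prod pod limits from the YAML snippet above.
POD_CPU_MILLICORES = 1500
POD_MEM_MIB = 1000

# Memory-per-CPU ratio the node offers vs. what a bedrock pod asks for.
node_ratio = NODE_MEM_MIB / NODE_CPU_MILLICORES   # MiB per millicore on the node
pod_ratio = POD_MEM_MIB / POD_CPU_MILLICORES      # MiB per millicore for bedrock

print(f"node:     {node_ratio:.2f} MiB/millicore")   # 4.10
print(f"pod:      {pod_ratio:.2f} MiB/millicore")    # 0.67

# Halving node memory to 8 GB still leaves plenty of memory headroom
# relative to the bedrock pod ratio.
print(f"8GB node: {8 * 1024 / NODE_CPU_MILLICORES:.2f} MiB/millicore")  # 2.05
```

Even at 8 GB per node, the memory-per-CPU ratio stays well above what bedrock pods request, which is why the downsizing looks safe for the dominant workload.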