Bedrock prod is by far the largest user of resources in the cluster, so we should likely fit the cluster's CPU/memory ratio to the pods'.
Current bedrock prod pod limits are:

```yaml
limits:
  cpu: 1500m
  memory: 1000Mi
```
A snapshot of one node's usage right now (April 30th, ~12:30 pm Pacific):

| CPU used | CPU available | Memory used | Memory available |
| --- | --- | --- | --- |
| 2.54 | 3.92 | 2.48 GB | 13.97 GB |
Our total cluster capacity is 64 vCPUs and 256.00 GB of memory across 16 nodes, i.e. 4 vCPUs and 16 GB per node.
Haswell is currently the default CPU platform in us-central per https://cloud.google.com/compute/docs/regions-zones, and https://cloud.google.com/compute/docs/cpu-platforms lists Haswell at 2.3 GHz (2300 MHz).
Based on all that, a node with 4 vCPUs has 9200 MHz available, and 16 GB = 16384 MB, so we are allocating nodes with roughly twice as much memory (in MB) as CPU cycles (in MHz). But our pod limits request more CPU than memory for bedrock (1500m vs 1000Mi). To save money we could drop node memory from 16 GB to 8 GB to create a more balanced cluster.
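For reference, a quick back-of-envelope sketch of that ratio math in Python (the 2300 MHz Haswell clock, the 4 vCPU / 16 GB node shape, and the 1500m / 1000Mi bedrock limits are just the numbers assumed above):

```python
# Back-of-envelope comparison of node vs. pod CPU/memory ratios,
# using the figures assumed above (Haswell @ 2300 MHz, 4 vCPU / 16 GB
# nodes, bedrock prod limits of 1500m CPU / 1000Mi memory).

HASWELL_MHZ = 2300

node_cpu_mhz = 4 * HASWELL_MHZ        # 9200 MHz per node
node_mem_mb = 16 * 1024               # 16384 MB per node

pod_cpu_mhz = 1.5 * HASWELL_MHZ       # 1500m -> 1.5 cores -> 3450 MHz
pod_mem_mb = 1000                     # 1000Mi limit

print(f"node MB/MHz:        {node_mem_mb / node_cpu_mhz:.2f}")   # ~1.78
print(f"pod  MB/MHz:        {pod_mem_mb / pod_cpu_mhz:.2f}")     # ~0.29
print(f"node MB/MHz @ 8 GB: {8 * 1024 / node_cpu_mhz:.2f}")      # ~0.89
```

Even at 8 GB per node the nodes would still be memory-heavy relative to the bedrock limits, but the ratios are much closer than with 16 GB.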
Are we autoscaling nodes? Are they the right composition of memory/CPU? Is the bedrock deployment scaled appropriately?