Open gfr10598 opened 3 years ago
Today I am:
- changing the sandbox default pool to allow 1 node per zone, and removing zone us-east1-d
- changing the staging 8 core parser-pool1 to allow 0-2 nodes per zone
- changing the staging 4 core parser-pool to allow 0-1 nodes per zone
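For reference, per-zone limits like these could be scripted rather than set by hand. This is only a sketch: the helper name `autoscaling_command` is hypothetical, the `us-east1` region is inferred from the zone name, and it assumes a regional cluster addressed with `--region`. It builds the `gcloud` argument list without running anything.

```python
def autoscaling_command(project, cluster, pool, min_per_zone, max_per_zone,
                        region="us-east1"):
    """Build (but do not run) a gcloud argv that sets per-zone autoscaling limits.

    The region default is an assumption; adjust it for the real cluster.
    """
    if not 0 <= min_per_zone <= max_per_zone:
        raise ValueError("need 0 <= min_per_zone <= max_per_zone")
    if max_per_zone == 0:
        # A pool capped at zero nodes can never schedule a pod.
        raise ValueError("max_per_zone must be at least 1")
    return [
        "gcloud", "container", "clusters", "update", cluster,
        "--project", project,
        "--region", region,
        "--node-pool", pool,
        "--enable-autoscaling",
        "--min-nodes", str(min_per_zone),
        "--max-nodes", str(max_per_zone),
    ]


# Staging 8 core pool, 0-2 nodes per zone.
cmd = autoscaling_command("mlab-staging", "data-processing", "parser-pool1", 0, 2)
print(" ".join(cmd))
```

Rejecting a maximum of 0 up front is a deliberate choice here, since a pool that cannot grow leaves its pods pending.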
Tomorrow, I intend to change prod to set up auto-scaling for parser, default, and gardener pools.
Removing zone us-east1-d from the default pool in mlab-sandbox made gardener unschedulable, because its persistent volume is located in that zone. I restored us-east1-d and adjusted auto-scaling to allow 1-2 nodes per zone.
Later changed this to 0-2 nodes per zone.
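A check along these lines might catch this before a zone is removed. It is a rough sketch with hypothetical names (`stranded_volumes`, `gardener-pv`) and hypothetical remaining zones; the real volume-to-zone mapping would have to come from the cluster.

```python
def stranded_volumes(volume_zones, remaining_zones):
    """Return names of zonal volumes whose zone would have no nodes left in the pool."""
    remaining = set(remaining_zones)
    return sorted(name for name, zone in volume_zones.items()
                  if zone not in remaining)


# Hypothetical data: only us-east1-d is named in this thread.
volumes = {"gardener-pv": "us-east1-d"}

# Dropping us-east1-d strands the volume, so the pod that mounts it cannot schedule.
print(stranded_volumes(volumes, ["us-east1-b", "us-east1-c"]))

# Keeping us-east1-d leaves nothing stranded.
print(stranded_volumes(volumes, ["us-east1-b", "us-east1-c", "us-east1-d"]))
```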
Subsequently updated mlab-sandbox parser-pool1 to allow 0-2 nodes per zone as well.
@gfr10598 can you enumerate the steps that are needed before the next gardener release to production?
I just deleted the mlab-staging parser-pool1 node pool from the data-processing cluster. Kubernetes had restarted the parsers and scheduled them in the wrong node pool. Deleting the pool should prevent this from happening again.
See https://github.com/m-lab/etl/issues/985 related to propagating errors from etl to gardener.
The most recent change to etl k8s configs failed to launch pods in staging, because I had set the 8 core node-pool to 0 instances.
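A pre-deploy check could flag this kind of mismatch. The sketch below is an assumption-laden illustration: the function name, workload names, and pool names are all hypothetical, and it only compares each workload's target pool against that pool's maximum node count.

```python
def unschedulable_workloads(workload_pools, pool_max_nodes):
    """Return workloads whose target pool is missing or capped at zero nodes."""
    return sorted(
        name for name, pool in workload_pools.items()
        if pool_max_nodes.get(pool, 0) <= 0
    )


# Hypothetical config: which pool each workload selects, and each pool's node cap.
workloads = {"parser-large": "pool-8core", "parser-small": "pool-4core"}
pool_caps = {"pool-8core": 0, "pool-4core": 1}

for name in unschedulable_workloads(workloads, pool_caps):
    print(f"{name}: target pool cannot provide any nodes")
```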
Also, in all projects, the utilization is low, because there are more nodes than needed for the requested pods.
All the pools should be updated to use appropriate auto-scaling configs.
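To size the limits, one rough approach is to compare the CPU requested by pods with what the nodes offer. This is a sketch with made-up numbers (none of the figures come from the clusters) and hypothetical function names. It gives only a lower bound, since it ignores memory, system overhead, and bin-packing.

```python
def nodes_needed(pod_cpu_millicores, allocatable_per_node):
    """Lower bound on node count: total requested CPU divided by per-node CPU, rounded up."""
    total = sum(pod_cpu_millicores)
    return -(-total // allocatable_per_node)  # integer ceiling division


def utilization(pod_cpu_millicores, node_count, allocatable_per_node):
    """Fraction of allocatable CPU that the pods actually request."""
    return sum(pod_cpu_millicores) / (node_count * allocatable_per_node)


# Hypothetical figures: three pods on a pool of three nodes.
requests = [3000, 3000, 1500]   # millicores requested per pod
per_node = 7500                 # allocatable millicores per node (assumed)

print(nodes_needed(requests, per_node))
print(round(utilization(requests, 3, per_node), 2))
```

A large gap between the current node count and `nodes_needed` suggests the autoscaling minimum could be lowered.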