m-lab / etl-gardener

Gardener provides services for maintaining and reprocessing mlab data.
Apache License 2.0

Adjust node pools in data-processing clusters #330

Open gfr10598 opened 3 years ago

gfr10598 commented 3 years ago

The most recent change to the etl k8s configs failed to launch pods in staging, because I had set the 8-core node pool to 0 instances.

Also, in all projects, the utilization is low, because there are more nodes than needed for the requested pods.

All the pools should be updated to use appropriate auto-scaling configs.

gfr10598 commented 3 years ago

Today I am:
- changing the sandbox default pool to allow 1 node per zone, and removing zone us-east1-d
- changing the staging 8-core parser-pool1 to allow 0-2 nodes per zone
- changing the staging 4-core parser-pool to allow 0-1 nodes per zone
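For reference, a minimal sketch of how one of these per-zone limits could be expressed as a `gcloud` invocation, and what the per-zone range means for the pool as a whole. The cluster name `data-processing` comes from this thread; the region `us-east1`, the zone list, and the helper names are assumptions for illustration, not the commands actually run.

```python
import shlex


def autoscaling_command(project, cluster, pool, region, min_per_zone, max_per_zone):
    """Build a gcloud command (as an argument list) that enables autoscaling
    on one node pool. For a regional cluster the limits apply per zone."""
    if not 0 <= min_per_zone <= max_per_zone:
        raise ValueError("need 0 <= min_per_zone <= max_per_zone")
    return [
        "gcloud", "container", "clusters", "update", cluster,
        "--project", project,
        "--region", region,  # assumed: the cluster is regional
        "--node-pool", pool,
        "--enable-autoscaling",
        "--min-nodes", str(min_per_zone),
        "--max-nodes", str(max_per_zone),
    ]


def total_node_range(zones, min_per_zone, max_per_zone):
    """Per-zone limits multiply by the number of zones the pool spans."""
    return len(zones) * min_per_zone, len(zones) * max_per_zone


if __name__ == "__main__":
    # Hypothetical zone list; only us-east1-d is named in this issue.
    zones = ["us-east1-b", "us-east1-c", "us-east1-d"]
    cmd = autoscaling_command(
        "mlab-staging", "data-processing", "parser-pool1", "us-east1", 0, 2
    )
    print(shlex.join(cmd))
    print(total_node_range(zones, 0, 2))
```

Under these assumptions, 0-2 nodes per zone across three zones lets the pool shrink to zero when no parser pods are requested and grow to at most six nodes.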

Tomorrow, I intend to change prod to set up auto-scaling for parser, default, and gardener pools.

gfr10598 commented 3 years ago

Removing zone us-east1-d from the default pool in mlab-sandbox made gardener unschedulable, because of the location of its persistent volume. Restored us-east1-d and adjusted auto-scaling to allow 1-2 nodes per zone.

Later changed to 0-2 nodes per zone.
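A small sketch of the constraint behind this: a pod bound to a zonal persistent volume can only run on a node in that volume's zone, so the pool must keep that zone. It assumes gardener's volume is a zonal disk in us-east1-d; the other zone names and the helper name are hypothetical.

```python
def schedulable_zones(pool_zones, volume_zone):
    """Return the zones in which a pod bound to a zonal volume could run."""
    return sorted(set(pool_zones) & {volume_zone})


if __name__ == "__main__":
    # Assumed: gardener's persistent volume lives in us-east1-d.
    without_d = ["us-east1-b", "us-east1-c"]      # hypothetical zones
    with_d = without_d + ["us-east1-d"]
    print(schedulable_zones(without_d, "us-east1-d"))
    print(schedulable_zones(with_d, "us-east1-d"))
```

An empty result corresponds to the unschedulable state described above; restoring the zone makes the pod schedulable again.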

gfr10598 commented 3 years ago

Subsequently updated mlab-sandbox parser-pool1 to allow 0-2 nodes per zone as well.

laiyi-ohlsen commented 3 years ago

@gfr10598 can you enumerate the steps that are needed before the next gardener release to production?

gfr10598 commented 3 years ago

I just deleted the mlab-staging parser-pool1 node pool from the data-processing cluster. K8s had restarted the parsers, and they were instantiated in the wrong node pool. Deleting the pool should prevent this from happening again.
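An alternative to deleting a pool is to pin the parser pods to the intended pool with a `nodeSelector` on the `cloud.google.com/gke-nodepool` label that GKE sets on every node. The sketch below only illustrates the idea on a pod template represented as a plain dict; the helper name, the sample template, and the choice of `parser-pool` as the target are assumptions, and this is not what the etl configs actually do.

```python
import json

# Label that GKE applies to each node with the name of its node pool.
GKE_NODEPOOL_LABEL = "cloud.google.com/gke-nodepool"


def pin_to_pool(pod_template, pool):
    """Return a copy of a pod template with a nodeSelector for one node pool.
    The input dict is not modified."""
    spec = dict(pod_template.get("spec", {}))
    selector = dict(spec.get("nodeSelector", {}))
    selector[GKE_NODEPOOL_LABEL] = pool
    spec["nodeSelector"] = selector
    return {**pod_template, "spec": spec}


if __name__ == "__main__":
    # Hypothetical minimal template; the container name and image are placeholders.
    template = {"spec": {"containers": [{"name": "parser", "image": "example/parser"}]}}
    print(json.dumps(pin_to_pool(template, "parser-pool"), indent=2))
```

Note that a pinned pod stays pending if its pool has no nodes and cannot scale up, which is the failure mode described at the top of this issue, so a selector like this only helps when the target pool's auto-scaling range allows at least one node.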

gfr10598 commented 3 years ago

See https://github.com/m-lab/etl/issues/985, which is related to propagating errors from etl to gardener.