Open betatim opened 5 years ago
@betatim a deployment's pods can be moved, but some criteria need to be met:
Don't worry about any pods from a DaemonSet though (and I think the prometheus node exporter is such a pod, by the way); they will be ignored when this is considered.
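A quick way to check which pods on a node are backed by a controller (and which are DaemonSet pods the CA ignores) is to print each pod's owner kind. A sketch, assuming you have `kubectl` access; the node name is a placeholder:

```shell
# List pods scheduled on a given node together with the kind of their owner.
# Deployment-managed pods show up as ReplicaSet, daemonset pods as DaemonSet,
# and bare pods (which block scale-down) show up as <none>.
NODE=my-node-name  # substitute your node's name
kubectl get pods --all-namespaces \
  --field-selector spec.nodeName="$NODE" \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,OWNER-KIND:.metadata.ownerReferences[0].kind'
```

This needs a live cluster, so it is only a sketch of the check, not something runnable here.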
So, I'd verify that this holds for this node:
For more documentation, see:
(Some traces of my earlier issues with this are available in https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/503 - it won't be a very focused read though.)
"Pods that are not backed by a controller object (so not created by deployment, replica set, job, stateful set etc)."
I read that as "if there is no PDB for this it will not hold up scale-down". Is this also how you see it?
The node is definitely underutilized (8 cores and 52GB should be far too much for those pods :) ) and the second of the three nodes in the cluster was also mostly empty. Only matomo and event-archiver would actually need relocating, as all the other pods in the prod namespace are from daemonsets.
Overall it would be nice to find a way to have the CA tell you why it thinks it can/can't do something. Mostly because it is tedious to look at all the PDBs and the heritage of the running pods, and because I might misread a PDB. After which we are back to "Tim thinks this node should be removed but the CA doesn't" :-/
Yepp! Access to the cluster autoscaler's logs isn't something we can get on GKE, at least it wasn't some months ago.
I'd look for PDBs for those two pods; perhaps you have one saying one replica is required at all times. I don't know what would happen without any PDB but only a deployment running a single pod - would it move? Would it move if there were five pods in the deployment? What is the default behaviour for disruptions, is what I'm asking, hmmm. / erik from mobile
It seems a bit extreme that the CA would evict the only pod of a deployment in order to scale down, so I assume what you cited is one criterion rather than the only criterion.
Oh, regarding the question: I read the quoted text as saying pods not controlled by a deployment etc. will always block scale-down, unless a PDB makes an exception. Our JupyterHub user pods are such pods, spawned by kubespawner rather than a deployment etc.
Try it without making a big chart change: simply add a PDB with kubectl, they are quite simple objects.
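For reference, `kubectl create poddisruptionbudget` can do this in one line. The PDB name and the label selector below are made up for illustration; match the selector to the labels of the pod you actually want to cover (check with `kubectl get pod <pod> --show-labels`):

```shell
# Hypothetical PDB allowing one voluntary disruption of the matomo pod,
# so the CA is permitted to evict it when draining a node.
kubectl create poddisruptionbudget matomo-pdb \
  --selector=app=matomo \
  --max-unavailable=1
```

Deleting it again with `kubectl delete pdb matomo-pdb` makes this a cheap experiment compared to a chart change. Requires a live cluster, so this is a sketch rather than something verifiable here.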
Question: is there a way to see what the CA is thinking in order to find out what is preventing a down-scale? The generic advice is to look at the CA logs on the master node, however on GKE we can't access those and my googling hasn't brought up an alternative.
Over the last few days/weeks/times I've looked it never seems to go below three nodes, even if one of them is essentially empty. I had a moment to poke around today and left feeling like I don't understand why the third node doesn't get removed.
The node I think should be removed from the pool is `` and the pods on it are:
None of which should prevent the node from being removed, because they are all controlled by a Deployment or similar. I didn't dig into the kube-system namespace pods as they look like pods that would be present on all nodes, and in general cluster autoscaling (CA) works.
I know how to look at the current status of CA with:
and that shows it has been checking recently to decide if it needs to scale up/down.
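For anyone else reading: the CA publishes that status to a configmap in kube-system. Assuming the standard configmap name (I believe this works on GKE even though the master logs are inaccessible), one way to read it is:

```shell
# The cluster autoscaler writes a human-readable status report, including
# its last probe times and scale-up/scale-down activity, to this configmap.
kubectl describe configmap cluster-autoscaler-status -n kube-system
```

This only shows what the CA did and when, not the per-node reasoning, which is the gap discussed above.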