Closed: relyt0925 closed this issue 2 years ago.
Key observation from the post above: the number of machine-config-server pods spun up after one restart. I would expect only 3 pods to be spun up (one for each ignition server); however, if you trace the lifecycle of the pods, there are actually 6 over the lifespan of a restart:
NAME                               READY   STATUS     RESTARTS   AGE
machine-config-server-1236201992   0/1     Init:0/3   0          3s
machine-config-server-291020420    0/1     Init:1/3   0          4s
machine-config-server-446334720    1/1     Running    0          8s
machine-config-server-1463997948   0/1     Init:0/3   0          2s
machine-config-server-1408474896   0/1     Pending    0          0s
machine-config-server-1018368211   0/1     Init:2/3   0          5s
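To trace that lifecycle yourself, a watch across a restart shows the churn; the NAMESPACE placeholder and the app=machine-config-server label are assumptions here, not confirmed labels:

kubectl get pods -n NAMESPACE -l app=machine-config-server -w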
For the leak: I think the delete is best-effort, and if it were to fail, the resource would be permanently orphaned. We may need a separate time-based cleaner that ensures anything older than 20 minutes (or whatever threshold) is also cleaned up: https://github.com/openshift/hypershift/blob/main/ignition-server/controllers/machineconfigserver_ignitionprovider.go#L80-L98
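As a minimal sketch of what such a cleaner could look like (assuming a client-go clientset; the label selector, namespace, and sweep interval are illustrative, not the provider's actual wiring):

// Hypothetical time-based cleaner: periodically sweep the namespace and
// delete any machine-config-server pod older than a TTL, as a backstop
// for the best-effort delete in machineconfigserver_ignitionprovider.go.
// The label selector, namespace, and intervals are illustrative assumptions.
package main

import (
	"context"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

const ttl = 20 * time.Minute // threshold suggested above

func cleanupOrphans(ctx context.Context, client kubernetes.Interface, namespace string) {
	pods, err := client.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{
		LabelSelector: "app=machine-config-server", // assumed label
	})
	if err != nil {
		log.Printf("list failed, retrying next sweep: %v", err)
		return
	}
	for i := range pods.Items {
		pod := &pods.Items[i]
		if time.Since(pod.CreationTimestamp.Time) > ttl {
			// A failed delete is retried on the next sweep, so a transient
			// error can no longer orphan the pod permanently.
			if err := client.CoreV1().Pods(namespace).Delete(ctx, pod.Name, metav1.DeleteOptions{}); err != nil {
				log.Printf("delete of %s failed: %v", pod.Name, err)
			}
		}
	}
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	for range time.Tick(5 * time.Minute) {
		cleanupOrphans(context.Background(), client, "NAMESPACE") // placeholder namespace
	}
}

The periodic sweep makes the cleanup idempotent: a delete that fails once is simply picked up again on the next pass instead of being lost.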
> Key observation from the post above: the number of machine-config-server pods spun up after one restart. I would expect only 3 pods to be spun up (one for each ignition server); however, if you trace the lifecycle of the pods, there are actually 6 over the lifespan of a restart.
The expectation depends on the number of token Secrets and the number of ignition server pods you have, right? How many token Secrets and ignition server pods are there in that cluster? Do you have the logs by chance?
I don't see this being a problem after we have fixed the NodePool reconciliation in this PR for IBMCloud clusters: https://github.com/openshift/hypershift/commit/1df13b9cde220c50c80289f6f02cd693013926c6
Things are looking better after the reconciliation fix. I will close and reopen if I notice more problems.
Replication pattern: delete all ignition-server pods at once with:
kubectl delete pods -n NAMESPACE -l app=ignition-server