komljen closed this issue 6 years ago.
Hi @komljen, thanks for raising this issue.
I think you're right that the metrics are wrong. I believe changing the first boolean argument to true on this line will fix the metrics update: https://github.com/pusher/k8s-spot-rescheduler/blob/d93c65cb68d2a2c9e2f0d8f374bc402140c3dabe/rescheduler.go#L381
Since we aren't actually using this group of pods for deletion, we can simply retrieve all of the pods here for the metrics.
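To illustrate why flipping that boolean helps, here is a minimal, hypothetical Go sketch of the pattern. It is not the drain code the rescheduler actually calls: the `Pod` type, its `Replicated` field and the `podsForDrain` helper are invented for illustration. With `deleteAll` set to false, a single non-replicated pod makes the whole call fail, so nothing gets counted for that node; with it set to true, every pod is returned, which is all the metrics need.

```go
package main

import "fmt"

// Pod is a hypothetical, stripped-down stand-in for a Kubernetes pod.
type Pod struct {
	Namespace  string
	Name       string
	Replicated bool // true if some controller would recreate the pod elsewhere
}

// podsForDrain is a hypothetical helper mimicking the behaviour described in
// this thread: with deleteAll=false it fails on the first non-replicated pod,
// with deleteAll=true it returns every pod unchanged.
func podsForDrain(pods []Pod, deleteAll bool) ([]Pod, error) {
	if deleteAll {
		return pods, nil
	}
	result := make([]Pod, 0, len(pods))
	for _, p := range pods {
		if !p.Replicated {
			return nil, fmt.Errorf("%s/%s is not replicated", p.Namespace, p.Name)
		}
		result = append(result, p)
	}
	return result, nil
}

func main() {
	pods := []Pod{
		{Namespace: "kube-system", Name: "replicated-pod", Replicated: true},
		{Namespace: "default", Name: "standalone-pod", Replicated: false},
	}

	// Selecting pods for deletion: one non-replicated pod aborts the call.
	if _, err := podsForDrain(pods, false); err != nil {
		fmt.Println("deletion:", err)
	}

	// Selecting pods only for metrics: take all of them and count.
	all, _ := podsForDrain(pods, true)
	fmt.Println("pods counted for metrics:", len(all))
}
```

Keeping the strict mode for the deletion path preserves the safety check, while the metrics path never needs it.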
E0524 07:01:08.356919 1 rescheduler.go:256] Failed to get pods for consideration: monitoring/mon-exporter-node-gxxph is not replicated
E0524 07:01:08.356932 1 rescheduler.go:256] Failed to get pods for consideration: pr-333/es-data-st-cluster-eu-west-1a-0 is not replicated
Are you also suggesting that the Spot rescheduler should be able to delete non-replicated pods? Is this a behaviour you would want?
Thanks for the quick reply! I will try this change and let you know.
As for deleting non-replicated pods, I think it is fine to leave it as it is, or to make it configurable if possible.
OK, the metrics are correct now, but the mon-exporter-node-gxxph pod is actually part of a DaemonSet, so it shouldn't be reported as not replicated, right?
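A check like this can tell a DaemonSet pod, a Deployment pod and a bare pod apart by the pod's controlling owner reference: Kubernetes sets an owner of kind `DaemonSet` on DaemonSet pods and of kind `ReplicaSet` on pods created through a Deployment. Below is a minimal sketch; `OwnerRef` is a hypothetical local struct mirroring the `Kind` and `Controller` fields of `metav1.OwnerReference` so it runs without Kubernetes dependencies, and the set of kinds treated as replicated is an assumption, not the rescheduler's actual rule.

```go
package main

import "fmt"

// OwnerRef is a hypothetical local mirror of the Kind and Controller fields of
// metav1.OwnerReference, so the sketch runs without Kubernetes dependencies.
type OwnerRef struct {
	Kind       string
	Name       string
	Controller bool
}

// classifyPod labels a pod by its controlling owner.
// Assumption: ReplicaSet, ReplicationController, StatefulSet and Job owners
// count as "replicated"; DaemonSet pods get their own label because the
// DaemonSet controller recreates them on the same node.
func classifyPod(owners []OwnerRef) string {
	for _, o := range owners {
		if !o.Controller {
			continue // only the controlling owner decides
		}
		switch o.Kind {
		case "DaemonSet":
			return "daemonset"
		case "ReplicaSet", "ReplicationController", "StatefulSet", "Job":
			return "replicated"
		}
	}
	return "not replicated"
}

func main() {
	// A DaemonSet pod, like mon-exporter-node-gxxph.
	fmt.Println(classifyPod([]OwnerRef{{Kind: "DaemonSet", Name: "example-daemonset", Controller: true}}))
	// A Deployment pod, like kube-dns, whose controlling owner is a ReplicaSet.
	fmt.Println(classifyPod([]OwnerRef{{Kind: "ReplicaSet", Name: "example-replicaset", Controller: true}}))
	// A bare pod with no controller.
	fmt.Println(classifyPod(nil))
}
```

Running it prints `daemonset`, `replicated` and `not replicated`, one per line.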
Another one: I have one spot and one on-demand instance, and kube-dns is deployed on both of them, but I get this in the logs:
E0601 14:15:02.256462 1 rescheduler.go:383] Failed to update metrics on spot node ip-10-2-3-219.eu-west-1.compute.internal: kube-system/kube-dns-7785f4d7dc-qhn7p is not replicated
E0601 14:15:02.256487 1 rescheduler.go:256] Failed to get pods for consideration: kube-system/kube-dns-7785f4d7dc-vmjqw is not replicated
I'm having the same issue.
The log says that kube-dns and my other pods managed by Deployments (from Helm charts) are not replicated.
I would be happy if you made deleting all pods configurable.
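Making this configurable could be as simple as a command-line flag whose value is passed through as the "delete all pods" boolean. The sketch below uses Go's standard `flag` package; the flag name `-delete-non-replicated-pods` and the `Config` type are hypothetical and not the rescheduler's actual option.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Config holds the single option relevant to this sketch.
type Config struct {
	// DeleteNonReplicatedPods is a hypothetical option: when true, pods
	// without a controller may also be moved off spot nodes.
	DeleteNonReplicatedPods bool
}

// parseConfig parses args with its own FlagSet so it can be tested in isolation.
func parseConfig(args []string) (Config, error) {
	var cfg Config
	fs := flag.NewFlagSet("rescheduler", flag.ContinueOnError)
	fs.BoolVar(&cfg.DeleteNonReplicatedPods, "delete-non-replicated-pods", false,
		"also move pods that are not managed by a controller (hypothetical flag)")
	if err := fs.Parse(args); err != nil {
		return Config{}, err
	}
	return cfg, nil
}

func main() {
	cfg, err := parseConfig(os.Args[1:])
	if err != nil {
		os.Exit(2) // the FlagSet has already printed the error and usage
	}
	// The parsed value would be passed as the "delete all" boolean when
	// selecting pods to move off a spot node.
	fmt.Println("delete non-replicated pods:", cfg.DeleteNonReplicatedPods)
}
```

Defaulting the flag to false keeps today's conservative behaviour unless an operator opts in.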
We could probably close this issue, since we now have the ability to move non-replicated pods?
Agreed, thanks @komljen
The rescheduler fails to update metrics on spot nodes because two non-replicated pods are running on them. The "is not replicated" errors quoted above show up in the logs and keep repeating.
At this point, the rescheduler is pretty much useless.