Closed bsalamat closed 6 years ago
/lgtm /approve
@davidopp I had to run gofmt on the files. Could you please re-lgtm?
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: bsalamat, davidopp
The full list of commands accepted by this bot can be found here.
The pull request process is described here
@bsalamat Is this only to ensure backwards compatibility with old version of rescheduler?
I think backward compatibility is a reasonable justification.
The other reason is that we want rescheduler to be able to evict high-priority user pods to get critical system pods to schedule, otherwise users can block critical pods from scheduling by filling up the cluster with high-priority pods. (Priority-based quota is in alpha in 1.11 so cluster admins who do not use alpha features will be unable to prevent users from filling up the cluster with high-priority pods.) If rescheduler considers high-priority user pods to be critical, then it won't evict them. Here a "user pod" is defined as one outside the kube-system
namespace.
That makes sense @davidopp. Thanks.
But I think, the current version of rescheduler can only evict pods created by DS controller, so high-priority pods created by user directly or some other controller won't be evicted by rescheduler even in that case. So, IIUC priority-based quota is still needed.
I haven't looked at the rescheduler code recently, but I don't think it's true that "the current version of rescheduler can only evict pods created by DS controller." The point of the rescheduler is to evict user pods to allow critical system pods to schedule.
That said, it is true that this change doesn't eliminate all the vulnerabilities. But it reduces them, by allowing the rescheduler to evict user pods to schedule daemon sets.
@bsalamat please correct me if I'm wrong.
But I think, the current version of rescheduler can only evict pods created by DS controller
Ohh sorry, what I meant was rescheduler will be run only when there is no space for critical pods created by DS controller. If there are pods created by other controllers or users create system critical or node critical pods, scheduler or rescheduler cannot do anything because they cannot evict high critical pods.
But I understand your argument of reducing the abuse. Thanks
Rescheduler has always considered pods only in kube-system namespace as critical pods. Recently, we made a change to consider pods with highest priority levels as critical as well. This PR ensures that such pods are considered as critical when they have such high priorities AND they are in kube-system namespace. This could help reduce possibility of malfunction in clusters under resource pressure where regular users have created pods with critical priorities in non-system namespaces.