kubernetes-retired / contrib

[EOL] This is a place for various components in the Kubernetes ecosystem that aren't part of the Kubernetes core.
Apache License 2.0
2.46k stars 1.68k forks source link

Rescheduler considers pods critical only when they are in kube-system namespace #2922

Closed bsalamat closed 6 years ago

bsalamat commented 6 years ago

Rescheduler has always considered pods only in kube-system namespace as critical pods. Recently, we made a change to consider pods with highest priority levels as critical as well. This PR ensures that such pods are considered as critical when they have such high priorities AND they are in kube-system namespace. This could help reduce possibility of malfunction in clusters under resource pressure where regular users have created pods with critical priorities in non-system namespaces.

davidopp commented 6 years ago

/lgtm /approve

bsalamat commented 6 years ago

@davidopp I had to run gofmt on the files. Could you please re-lgtm?

davidopp commented 6 years ago

/lgtm

k8s-ci-robot commented 6 years ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, davidopp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[rescheduler/OWNERS](https://github.com/kubernetes/contrib/blob/master/rescheduler/OWNERS)~~ [bsalamat] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
ravisantoshgudimetla commented 6 years ago

@bsalamat Is this only to ensure backwards compatibility with old version of rescheduler?

davidopp commented 6 years ago

I think backward compatibility is a reasonable justification.

The other reason is that we want rescheduler to be able to evict high-priority user pods to get critical system pods to schedule, otherwise users can block critical pods from scheduling by filling up the cluster with high-priority pods. (Priority-based quota is in alpha in 1.11 so cluster admins who do not use alpha features will be unable to prevent users from filling up the cluster with high-priority pods.) If rescheduler considers high-priority user pods to be critical, then it won't evict them. Here a "user pod" is defined as one outside the kube-system namespace.

ravisantoshgudimetla commented 6 years ago

That makes sense @davidopp. Thanks.

But I think, the current version of rescheduler can only evict pods created by DS controller, so high-priority pods created by user directly or some other controller won't be evicted by rescheduler even in that case. So, IIUC priority-based quota is still needed.

davidopp commented 6 years ago

I haven't looked at the rescheduler code recently, but I don't think it's true that "the current version of rescheduler can only evict pods created by DS controller." The point of the rescheduler is to evict user pods to allow critical system pods to schedule.

That said, it is true that this change doesn't eliminate all the vulnerabilities. But it reduces them, by allowing the rescheduler to evict user pods to schedule daemon sets.

@bsalamat please correct me if I'm wrong.

ravisantoshgudimetla commented 6 years ago

But I think, the current version of rescheduler can only evict pods created by DS controller

Ohh sorry, what I meant was rescheduler will be run only when there is no space for critical pods created by DS controller. If there are pods created by other controllers or users create system critical or node critical pods, scheduler or rescheduler cannot do anything because they cannot evict high critical pods.

But I understand your argument of reducing the abuse. Thanks