tkatila opened 1 year ago
K8s supports limit ranges: https://kubernetes.io/docs/concepts/policy/limit-range/
@uniemimu thought that it should support extended resources in addition to core ones, but somebody needs to check (also) whether it actually works when one omits default and min values (i.e. whether it allows pods that do not request a GPU):
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: gpu-count-constraint
spec:
  limits:
  - max:
      gpu.intel.com/i915: 1
    type: Container
```
@uMartinXu can you try whether that does what you wanted?
Hi @eero-t, I tried LimitRange; it does support extended resources.
It does not accept pods with gpu.intel.com/i915 > 1:

```
Warning  FailedCreate  2m55s  job-controller  Error creating: pods "intel-dgpu-clinfo-gphs2" is forbidden: maximum gpu.intel.com/i915 usage per Container is 1, but limit is 3
```
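For reference, a workload like the following would trigger that rejection by requesting more than the configured max. This is only an illustration; the Job name is chosen to match the error message above and the image is a placeholder:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: intel-dgpu-clinfo          # hypothetical name, matching the error above
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: clinfo
        image: docker.io/library/busybox   # placeholder image for illustration
        resources:
          limits:
            gpu.intel.com/i915: 3          # exceeds the LimitRange max of 1
```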
For pods that do not request the GPU resource, whether scheduled on a GPU node or elsewhere, the LimitRanger does add i915 resource requests and limits:

```
LimitRanger plugin set: gpu.intel.com/i915 request for container httpd;
gpu.intel.com/i915 limit for container httpd
```

```yaml
containers:
- resources:
    limits:
      gpu.intel.com/i915: '1'
    requests:
      gpu.intel.com/i915: '1'
```
Thanks for testing, good to hear that the (max) limiting part works!
Will it still try to add GPU resource requests for non-GPU pods, if you add this to the LimitRange:

```yaml
min:
  gpu.intel.com/i915: 0
```

?
Yes, I added the min limit. It still adds i915 resource requests and limits to non-GPU pods.
I understand this is because of the default limit and default request, is that right? Irrespective of what default value is set, it adds that value to non-GPU pods:

```yaml
default:
  gpu.intel.com/i915: 1
defaultRequest:
  gpu.intel.com/i915: 1
```
Do you mean that even if you specify `gpu.intel.com/i915: 0` as the default, non-GPU pods get `gpu.intel.com/i915: 1`?

That sounds like a bug which should be filed to upstream Kubernetes, either as "LimitRange does not respect specified default value", or "LimitRange adds resource request even when default and min resource requests are specified as zero".
If default values are added to the LimitRange, they are added to non-GPU pods. If no default values are added to the LimitRange, it assigns the values below to non-GPU pods. Is this expected?

```yaml
- resources:
    limits:
      gpu.intel.com/i915: '1'
    requests:
      gpu.intel.com/i915: '1'
```
> If default values are added to limitrange it is added to non GPU pods.

So if you specify the default as `gpu.intel.com/i915: 0`, is the resource request zero for non-GPU pods?
Yes, it works as expected. If the default is `gpu.intel.com/i915: 0`, non-GPU pods show the same value.
With max and min in the LimitRange, GPU pods requesting more than 1 resource get a forbidden error. If default values are not added to the LimitRange, it ends up adding the max values as limit and request to non-GPU pods as well. After adding a default limit and request of 0, non-GPU pods just show `gpu.intel.com/i915: 0` in their spec.
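Putting the tested pieces together, the full LimitRange discussed in this thread would look roughly like this (same values as used in the tests above):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: gpu-count-constraint
spec:
  limits:
  - type: Container
    max:
      gpu.intel.com/i915: 1    # pods requesting more than 1 are rejected
    min:
      gpu.intel.com/i915: 0
    default:                   # without these, the max value gets copied
      gpu.intel.com/i915: 0    # as limit/request to non-GPU pods too
    defaultRequest:
      gpu.intel.com/i915: 0
```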
> After adding default limit and request both 0- for non GPU pods, it just shows up as `gpu.intel.com/i915: 0` in the spec.
@uniemimu You've been looking more into the scheduler. Do you see any practical problem with a zero extended resource request being added to non-GPU pods by LimitRange?
I.e. does it just look funny, but work fine in practice?
Also, as LimitRange is namespace-scoped, we only add it in the namespaces where workloads are deployed, right?
Yes, to namespaces where your cluster config RBAC rules allow given k8s users access to GPU resources.
Got it, thanks.
> I.e. does it just look funny, but work fine in practice?
Apart from this, can we consider LimitRange an effective solution? Then we could add the YAML to the project 1.0.0 GA as a deployment step after the GPU device plugin is created, for publishing the certified operator on OCP 4.12.
I'm also assuming this would be an operator configuration option for whether multi-GPU jobs are allowed.
But does the operator component know which namespaces in a given cluster are allowed to access GPU resources? Aren't such RBAC rules rather specific to the cluster?
For OpenShift, there might be some rules against deploying workloads/objects in the "openshift-" namespaces created during cluster installation. Will check.
On OpenShift, these namespaces: default, kube-system, kube-public, openshift-node, openshift-infra, openshift
do not allow assigning an SCC, so it's recommended not to deploy pods in them.
Any other possible solutions you would suggest trying, apart from LimitRange?
While writing a separate webhook for this is a possibility, they are nasty, and LimitRange already seems to be explicitly designed for this. It just needs to support a zero minimum value better (i.e. not add a request for a zero resource).
Or do you think it should also have an option for the limit being cluster-wide instead of namespace-specific?
Thanks, I agree with LimitRange. Workloads can be deployed to a specific namespace for individual GPU access. @uMartinXu any thoughts? For supporting a 0 minimum value better, could ResourceQuota be a good option along with LimitRange? If it works, it can't accept pods which don't have any gpu.intel.com/i915 request/limit in their pod spec, so the namespace would only accept GPU pods. Not sure if it supports extended resources though.
The limitation of i915 to 1 should be enforced at the whole-cluster scope and not only be applied to a specific namespace.
There are a few alternatives to achieve cluster-wide GPU count limits. @uniemimu, @tkatila, any comments on these?
> - Add option to GAS for specifying allowed range for GPU resource requests
GAS would be a possible place for limiting the i915 resource requests, but that would then require using GAS in general.
> - Add option to GPU plugin for rejecting GPU (count) requests outside of given range
I doubt that this is an option, as it's quite late in the Pod scheduling flow. I tried returning an error from `Allocate()` and the scheduler just kept retrying, leaving multiple `UnexpectedAdmissionError` pods behind. Though the documentation indicates that it's somehow possible to return an error more peacefully.
> - Improve LimitRange to have option for applying limit to all namespaces
According to Stack Overflow, this can already be done by using Kyverno: https://stackoverflow.com/questions/73488971/how-can-i-apply-limit-range-to-all-namespaces-in-kubernetes
which is "a policy engine to validate, mutate, generate, and cleanup Kubernetes resources, and verify image signatures and artifacts to help secure the software supply chain".
(LimitRange should really first be improved not to add zero resource requests to every pod, though.)
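A sketch of that Kyverno approach could be a ClusterPolicy with a generate rule, which creates the LimitRange in every namespace. The policy and rule names here are illustrative and this has not been tested in this thread:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-gpu-limitrange           # illustrative name
spec:
  rules:
  - name: generate-gpu-limitrange
    match:
      any:
      - resources:
          kinds:
          - Namespace
    generate:
      apiVersion: v1
      kind: LimitRange
      name: gpu-count-constraint
      namespace: "{{request.object.metadata.name}}"
      synchronize: true              # keep generated resources in sync with the policy
      data:
        spec:
          limits:
          - type: Container
            max:
              gpu.intel.com/i915: 1
```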
I just noticed that ResourceQuota also supports extended resources: https://kubernetes.io/docs/concepts/policy/resource-quotas/#resource-quota-for-extended-resources
That could also be experimented with, to see whether it works any better than LimitRange for limiting GPU usage to only specific namespaces.
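Per that documentation, extended resources only support quota items with the `requests.` prefix, so a sketch of a per-namespace quota might look like this (the name, namespace, and count are hypothetical):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota                        # hypothetical name
  namespace: gpu-workloads               # hypothetical namespace
spec:
  hard:
    requests.gpu.intel.com/i915: "4"     # at most 4 i915 devices requested in total
```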
Related are the old backlog items #598 and #486
In #1377 it was identified that some clusters would need to limit a Pod's i915 resource count to 1 (or some other value). The idea is to allow setting shared-dev-num to >1 while preventing users from accessing more GPU resources than designed.
A webhook might be a good way to implement this, but it would be good to study other solutions as well.