tkatila opened 1 year ago
K8s supports limit ranges: https://kubernetes.io/docs/concepts/policy/limit-range/
@uniemimu thought that it should support extended resources in addition to core ones, but somebody needs to check (also) whether it actually works when one omits default and min values (i.e. whether it allows pods that do not request a GPU):
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: gpu-count-constraint
spec:
  limits:
  - max:
      gpu.intel.com/i915: 1
    type: Container
```
@uMartinXu can you try whether that does what you wanted?
Hi @eero-t, I tried LimitRange; it does support extended resources.
It does not accept pods with gpu.intel.com/i915 > 1:

```
Warning  FailedCreate  2m55s  job-controller  Error creating: pods "intel-dgpu-clinfo-gphs2" is forbidden: maximum gpu.intel.com/i915 usage per Container is 1, but limit is 3
```
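For reference, a workload like the following would trigger that rejection by requesting more than the configured max. This is only an illustration; the Job name is chosen to match the error message above and the image is a placeholder:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: intel-dgpu-clinfo          # hypothetical name, matching the error above
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: clinfo
        image: docker.io/library/busybox   # placeholder image for illustration
        resources:
          limits:
            gpu.intel.com/i915: 3          # exceeds the LimitRange max of 1
```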
For pods that do not request the GPU resource, whether scheduled on a GPU node or elsewhere, the LimitRanger does add i915 resource requests and limits:

```
LimitRanger plugin set: gpu.intel.com/i915 request for container httpd;
gpu.intel.com/i915 limit for container httpd
```

```yaml
containers:
- resources:
    limits:
      gpu.intel.com/i915: '1'
    requests:
      gpu.intel.com/i915: '1'
```
Thanks for testing, good to hear that the (max) limiting part works!
Will it still try to add GPU resource requests for non-GPU pods, if you add this to the LimitRange:

```yaml
min:
  gpu.intel.com/i915: 0
```

?
Yes, I added the min limit. It still adds i915 resource requests and limits to non-GPU pods.
I understand this is because of the default limit and default request, is that right? Irrespective of what default value is set, it adds that value to non-GPU pods:

```yaml
default:
  gpu.intel.com/i915: 1
defaultRequest:
  gpu.intel.com/i915: 1
```
Do you mean that even if you specify `gpu.intel.com/i915: 0` as the default, non-GPU pods get `gpu.intel.com/i915: 1`?

That sounds like a bug which should be filed to upstream Kubernetes, either as "LimitRange does not respect specified default value", or "LimitRange adds resource request even when default and min resource requests are specified as zero".
If default values are added to the LimitRange, they are added to non-GPU pods. If no default values are added to the LimitRange, it assigns the values below to non-GPU pods. Is this expected?

```yaml
- resources:
    limits:
      gpu.intel.com/i915: '1'
    requests:
      gpu.intel.com/i915: '1'
```
> If default values are added to limitrange it is added to non GPU pods.

So if you specify the default as `gpu.intel.com/i915: 0`, is the resource request zero for non-GPU pods?
Yes, it works as expected. If the default is `gpu.intel.com/i915: 0`, non-GPU pods show the same value.
With max and min in the LimitRange, GPU pods requesting more than 1 resource get a forbidden error. If default values are not added to the LimitRange, it ends up adding the max values as limit and request to non-GPU pods as well. After adding a default limit and request of 0, non-GPU pods just show `gpu.intel.com/i915: 0` in their spec.
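Putting the tested pieces together, the full LimitRange discussed in this thread would look roughly like this (same values as used in the tests above):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: gpu-count-constraint
spec:
  limits:
  - type: Container
    max:
      gpu.intel.com/i915: 1    # pods requesting more than 1 are rejected
    min:
      gpu.intel.com/i915: 0
    default:                   # without these, the max value gets copied
      gpu.intel.com/i915: 0    # as limit/request to non-GPU pods too
    defaultRequest:
      gpu.intel.com/i915: 0
```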
> After adding default limit and request both 0- for non GPU pods, it just shows up as `gpu.intel.com/i915: 0` in the spec.
@uniemimu You've been looking more into the scheduler. Do you see any practical problem with a zero extended resource request being added to non-GPU pods by LimitRange?
I.e. does it just look funny, but work fine in practice?
Also, as LimitRange is namespace-scoped, we only add it in the namespaces where workloads are deployed, right?
Yes, to namespaces where your cluster config RBAC rules allow given k8s users access to GPU resources.
Got it, thanks.
> I.e. does it just look funny, but work fine in practice?
Apart from this, can we consider LimitRange an effective solution? Then we could add the YAML to the project 1.0.0 GA as a deployment step after the GPU device plugin is created, for publishing the certified operator on OCP 4.12.
I'm also assuming this would be an operator configuration option for whether multi-GPU jobs are allowed.
But does the operator component know which namespaces in a given cluster are allowed to access GPU resources? Aren't such RBAC rules rather specific to the cluster?
For OpenShift, there might be some rules against deploying workloads/objects in the "openshift-" namespaces created during cluster installation. Will check.
On OpenShift, these namespaces: default, kube-system, kube-public, openshift-node, openshift-infra, openshift
do not allow assigning an SCC, so it's recommended not to deploy pods in them.
Any other possible solutions you would suggest trying, apart from LimitRange?
While writing a separate webhook for this is a possibility, they are nasty, and LimitRange already seems to be explicitly designed for this. It just needs to support a zero minimum value better (i.e. not add a request for a zero resource).
Or do you think it should also have an option for the limit being cluster-wide instead of namespace-specific?
Thanks, I agree with LimitRange. Workloads can be deployed to a specific namespace for individual GPU access. @uMartinXu any thoughts? For supporting a 0 minimum value better, could ResourceQuota be a good option along with LimitRange? If it works, it can't accept pods which don't have any gpu.intel.com/i915 request/limit in their pod spec, so the namespace would only accept GPU pods. Not sure if it supports extended resources though.
The limitation of i915 to 1 should be enforced at the whole-cluster scope and not only be applied to a specific namespace.
There are a few alternatives to achieve cluster-wide GPU count limits. @uniemimu, @tkatila, any comments on these?
> - Add option to GAS for specifying allowed range for GPU resource requests
GAS would be a possible place for limiting the i915 resource requests, but that would then require using GAS in general.
> - Add option to GPU plugin for rejecting GPU (count) requests outside of given range
I doubt that this is an option, as it's quite late in the Pod scheduling flow. I tried returning an error from `Allocate()` and the scheduler just kept retrying, leaving multiple `UnexpectedAdmissionError` pods behind. Though the documentation indicates that it's somehow possible to return an error more peacefully.
> - Improve LimitRange to have option for applying limit to all namespaces
According to Stack Overflow, this can already be done by using Kyverno: https://stackoverflow.com/questions/73488971/how-can-i-apply-limit-range-to-all-namespaces-in-kubernetes
which is "a policy engine to validate, mutate, generate, and cleanup Kubernetes resources, and verify image signatures and artifacts to help secure the software supply chain".
(LimitRange should really first be improved not to add zero resource requests to every pod, though.)
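A sketch of that Kyverno approach could be a ClusterPolicy with a generate rule, which creates the LimitRange in every namespace. The policy and rule names here are illustrative and this has not been tested in this thread:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-gpu-limitrange           # illustrative name
spec:
  rules:
  - name: generate-gpu-limitrange
    match:
      any:
      - resources:
          kinds:
          - Namespace
    generate:
      apiVersion: v1
      kind: LimitRange
      name: gpu-count-constraint
      namespace: "{{request.object.metadata.name}}"
      synchronize: true              # keep generated resources in sync with the policy
      data:
        spec:
          limits:
          - type: Container
            max:
              gpu.intel.com/i915: 1
```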
I just noticed that ResourceQuota also supports extended resources: https://kubernetes.io/docs/concepts/policy/resource-quotas/#resource-quota-for-extended-resources
That could also be experimented with, to see whether it works any better than LimitRange for limiting GPU usage to only specific namespaces.
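Per that documentation, extended resources only support quota items with the `requests.` prefix, so a sketch of a per-namespace quota might look like this (the name, namespace, and count are hypothetical):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota                        # hypothetical name
  namespace: gpu-workloads               # hypothetical namespace
spec:
  hard:
    requests.gpu.intel.com/i915: "4"     # at most 4 i915 devices requested in total
```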
Related are the old backlog items #598 and #486
In #1377 it was identified that some clusters would need to limit a Pod's i915 resource count to 1 (or some other value). The idea is to allow setting shared-dev-num to >1 while preventing users from accessing more GPU resources than designed.
A webhook might be a good way to implement this, but it would be good to study other solutions as well.