volcano-sh / devices

Device plugins for Volcano, e.g. GPU
Apache License 2.0

Add a strategy to report and allocate device #22

Closed: bigclouds closed this 2 years ago

bigclouds commented 2 years ago

Kubelet reports "grpc: received message larger than max". The data returned from ListAndWatch is larger than 4 MB, which is gRPC's default maximum message size. In any case, a strategy for reporting device information is needed. This patch adds a division factor (divisor) by which the device count is divided before it is reported.

Signed-off-by: longguang.yue <yuelg@chinaunicom.cn>

Thor-wl commented 2 years ago

/cc @william-wang @tizhou86 Please help review this. Thanks!

bigclouds commented 2 years ago

Yes. The factor is the minimum allocation unit. Before this patch, the unit was implicitly 1 MB. A pod's request states how many of these units it needs, and the pod is told the unit size through the environment variable GPUFactor.

WingkaiHo commented 2 years ago

I think the factor conversion for running pods (tasks) can be handled through annotations on the pod and node, such as

But if the factor changes, the virtual device IDs that kubelet has already allocated to running pods are not freed; they are released only when those pods are deleted.

shinytang6 commented 2 years ago

LGTM overall. One nit: let's use 100 as the default factor and document it. Do you have time to make that change? @bigclouds
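A default of 100 could be wired in as a command-line flag along these lines. This is a sketch only. The `--gpu-factor` flag name, the `parseFactor` helper, and the MB wording in the usage string are hypothetical. The one firm constraint is that the factor must be at least 1, because the device count is divided by it.

```go
package main

import (
	"flag"
	"fmt"
)

// defaultFactor follows the review suggestion of 100 as the default unit.
const defaultFactor = 100

// parseFactor reads a hypothetical --gpu-factor flag and rejects values
// below 1, since the factor is used as a divisor.
func parseFactor(args []string) (int, error) {
	fs := flag.NewFlagSet("device-plugin", flag.ContinueOnError)
	factor := fs.Int("gpu-factor", defaultFactor,
		"MB of GPU memory represented by one reported device")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	if *factor < 1 {
		return 0, fmt.Errorf("gpu-factor must be >= 1, got %d", *factor)
	}
	return *factor, nil
}

func main() {
	f, err := parseFactor(nil) // no flags given, so the default applies
	fmt.Println(f, err)        // 100 <nil>
}
```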