Closed bigclouds closed 2 years ago
/cc @william-wang @tizhou86 Please help take a review. Thanks!
Yes. The factor is the smallest allocation unit. Before this patch the unit was implicitly 1M. A pod's request is the number of such units, and the pod learns the unit through the ENV GPUFactor.
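To make the unit arithmetic concrete, here is a minimal sketch, not the code from this patch. The function names and the 16384M card size are hypothetical, and "M" is assumed to mean MiB.

```go
package main

import (
	"errors"
	"fmt"
)

// deviceCount returns how many virtual devices a device plugin would
// advertise for a card with totalMiB of memory when one unit is factorMiB.
// Rounding down is an assumption: memory that does not fill a whole unit
// is simply not advertised.
func deviceCount(totalMiB, factorMiB uint64) (uint64, error) {
	if factorMiB == 0 {
		return 0, errors.New("factor must be greater than zero")
	}
	return totalMiB / factorMiB, nil
}

// memoryForUnits converts a pod's request (a number of units) back to MiB.
func memoryForUnits(units, factorMiB uint64) uint64 {
	return units * factorMiB
}

func main() {
	const totalMiB = 16384 // hypothetical 16 GiB card

	for _, factor := range []uint64{1, 256} {
		n, err := deviceCount(totalMiB, factor)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("factor=%dM -> %d virtual devices\n", factor, n)
	}

	// A pod asking for 8 units under a 256M factor gets 2048M.
	fmt.Println(memoryForUnits(8, 256))
}
```

With a factor of 1 the card is reported as 16384 devices; with a factor of 256 it is reported as 64, which is what shrinks the ListAndWatch payload.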
I think the factor of a running pod (task) can be converted through annotations on the pod and the node, such as:
pod.annotations.volcano.sh/gpu-factor: default is 1M
node.annotations.volcano.sh/gpu-factor: set by the device plugin, such as 256M
The converted request is then:
volcano.sh/gpu-memory * (pod.annotations.volcano.sh/gpu-factor / node.annotations.volcano.sh/gpu-factor)
But if the factor changes, the kubelet virtual device IDs already allocated to running pods are not freed; they are released only when those pods are deleted.
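The conversion proposed in this comment could be sketched as follows. This is an illustration only: parseFactor and convertRequest are hypothetical helpers, the "M" suffix handling and the round-up behaviour are assumptions, and only the annotation key comes from the comment itself.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Annotation key taken from the comment above.
const factorAnnotation = "volcano.sh/gpu-factor"

// parseFactor reads a factor such as "256M" from an annotation map.
// A missing or empty annotation falls back to def (1M per the comment).
func parseFactor(annotations map[string]string, def uint64) (uint64, error) {
	raw, ok := annotations[factorAnnotation]
	if !ok || raw == "" {
		return def, nil
	}
	v, err := strconv.ParseUint(strings.TrimSuffix(raw, "M"), 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid %s value %q: %w", factorAnnotation, raw, err)
	}
	if v == 0 {
		return 0, fmt.Errorf("%s must be greater than zero", factorAnnotation)
	}
	return v, nil
}

// convertRequest rescales a request expressed in pod-factor units into
// node-factor units: request * podFactor / nodeFactor. It multiplies first
// to avoid integer truncation and rounds up (an assumption) so the pod
// never receives less memory than it asked for.
func convertRequest(request, podFactor, nodeFactor uint64) uint64 {
	return (request*podFactor + nodeFactor - 1) / nodeFactor
}

func main() {
	pod := map[string]string{}                           // no annotation: default 1M
	node := map[string]string{factorAnnotation: "256M"} // set by the device plugin

	podFactor, err := parseFactor(pod, 1)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	nodeFactor, err := parseFactor(node, 1)
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	// A request of 1024 units at 1M becomes 4 units at 256M.
	fmt.Println(convertRequest(1024, podFactor, nodeFactor))
}
```

The sketch does not address the caveat raised above: device IDs already handed out by the kubelet stay bound to running pods until those pods are deleted.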
LGTM overall.
One nit: let's take 100 as the default factor and add a doc for that. Do you have time to modify that? @bigclouds
Kubelet reports "grpc: received message larger than max" because the data returned from ListAndWatch is bigger than 4M. In any case, device information needs to be reported in a more strategic way. This patch adds a division factor (divisor) by which the device amount is divided.
Signed-off-by: longguang.yue <yuelg@chinaunicom.cn>