Open Trainbow opened 1 year ago
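Hello, I'm trying out Volcano's gpu-number scheduling. After installing it by following the steps in the Volcano tutorial, every GPU node correctly reports how many GPUs it has. However, when I create a pod, the container does not have the volcano-gpu-number environment variable, and running nvidia-smi inside it shows all of the node's GPUs. Do I need to change the YAML file?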
Hey, which version are you using?
volcano-1.6.0
/cc @wangyang0616 Can you help take a look?
ok, let me take a look
@Trainbow Could you post the YAML file you used to create the test task? By the way, can the pod be scheduled successfully with the default k8s scheduler?
I used the sample YAML from the volcano-gpu-number README:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod1
  namespace: model
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda:9.0-devel
      command: ["sleep"]
      args: ["100000"]
      resources:
        limits:
          volcano.sh/gpu-number: 1 # requesting 1 gpu cards
          # nvidia.com/gpu: 1
I also installed NVIDIA's k8s-device-plugin for testing. When the limits field uses nvidia.com/gpu, the pod's container works well and has exactly one GPU device. When I use volcano.sh/gpu-number, the container's environment does not contain the variable VOLCANO_GPU_ALLOCATED, and NVIDIA_VISIBLE_DEVICES is all.
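For anyone reproducing this, a minimal sketch of a check that can be run inside the container to report the two variables mentioned above. The helper name diagnose_gpu_env is hypothetical, and the wording of each message is only my reading of the symptoms described in this thread, not documented Volcano behavior.

```python
import os


def diagnose_gpu_env(env):
    """Summarize the GPU-related variables discussed in this thread.

    `env` is any mapping of environment variable names to values.
    The interpretation in each message is an assumption, not Volcano's spec.
    """
    allocated = env.get("VOLCANO_GPU_ALLOCATED")
    visible = env.get("NVIDIA_VISIBLE_DEVICES")

    if allocated is not None:
        # The variable the reporter expected to find is present.
        return f"VOLCANO_GPU_ALLOCATED={allocated}: an allocation was injected"
    if visible == "all":
        # Matches the symptom reported above: nvidia-smi lists every GPU.
        return ("no VOLCANO_GPU_ALLOCATED and NVIDIA_VISIBLE_DEVICES=all: "
                "every GPU on the node is visible")
    return f"no VOLCANO_GPU_ALLOCATED; NVIDIA_VISIBLE_DEVICES={visible!r}"


if __name__ == "__main__":
    print(diagnose_gpu_env(os.environ))
```

Passing a mapping rather than reading os.environ directly keeps the check easy to try against sample values outside a pod.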
I tried GPU sharing with Volcano by following the official tutorial, and I can find the corresponding environment variables in the pod.
The Volcano device plugin's GPU strategy defaults to share mode, that is, the mode in which you can use volcano.sh/gpu-memory.
If you want to use volcano.sh/gpu-number, you need to set the GPU strategy to number. For details, see: config-the-volcano-device-plugin-binary
Hope the above information is helpful to you.