Closed: lizhiboo closed this 4 days ago.
Motivation: Arena uses NVIDIA GPUs by default and does not yet support accelerators from other chip vendors such as AMD, Ascend, and Hygon.
Design: add a --device parameter that sets the vendor device request (for example hygon.com/dcu) in the Pod's resources, as below:
resources:
  limits:
    cpu: "10"
    memory: 32Gi
    hygon.com/dcu: 1
  requests:
    cpu: "10"
    memory: 32Gi
    hygon.com/dcu: 1
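A minimal sketch of the mapping, assuming the flag value always has the form <resource-name>=<count> and splitting on the last "=". The helper name device_to_resources is hypothetical and is not Arena's implementation. It emits only the device entry, not cpu or memory. Kubernetes requires extended resources to be requested in whole units, and when both are given, requests must equal limits. That is why the YAML above repeats hygon.com/dcu: 1 under both keys, and why the sketch writes the same count to both.

```shell
#!/bin/sh
# Hypothetical helper (not Arena's code): turn a --device value such as
# hygon.com/dcu=1 into a resources block with equal limits and requests.
device_to_resources() {
  spec="$1"
  case "$spec" in
    *=*) ;;
    *) echo "invalid device spec: $spec (want name=count)" >&2; return 1 ;;
  esac
  name="${spec%=*}"    # text before the last '=', e.g. hygon.com/dcu
  count="${spec##*=}"  # text after the last '=', e.g. 1
  # Extended resources must be whole numbers.
  case "$count" in
    ''|*[!0-9]*) echo "invalid count: $count (want an integer)" >&2; return 1 ;;
  esac
  # Vendor resource names are domain-prefixed, e.g. vendor.com/device.
  case "$name" in
    */*) ;;
    *) echo "invalid resource name: $name (want domain/name)" >&2; return 1 ;;
  esac
  # Same count under limits and requests, as Kubernetes requires.
  printf 'resources:\n  limits:\n    %s: %s\n  requests:\n    %s: %s\n' \
    "$name" "$count" "$name" "$count"
}
```

For example, device_to_resources huawei.com/Ascend910=1 prints a block with huawei.com/Ascend910: 1 under both limits and requests, while a value such as hygon.com/dcu=abc is rejected.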
Usage:
arena submit tfjob \
  --name=tfjobtest \
  --working-dir=/root \
  --ps-gpus=1 \
  --ps=1 \
  --workers=1 \
  --device=hygon.com/dcu=1 \
  --data-dir=/usr/local/hg-lib:/usr/local/hg-lib \
  --image=xxx:ascend_tensorflow_test \
  'sh -c train.sh'

arena serve custom \
  --name=cstest \
  --replicas=1 \
  --port=80 \
  --device=huawei.com/Ascend910=1 \
  --data-dir=/usr/local/ascend910-driver:/usr/local/ascend910-driver \
  --image=xxx:ascend-test \
  --command="sh train.sh"