Register a model version when submitting a training job:
$ arena submit pytorchjob \
    --name=bloom-sft-2 \
    --gpus=1 \
    --image=registry.cn-hangzhou.aliyuncs.com/acs/deepspeed:v0.9.0-chat \
    --label=xxx=yyy \
    --data=training-data:/model \
    --model-name=my-model \
    --model-source=pvc://default/training-data/bloom-560m-sft \
    "cd /model/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_supervised_finetuning && bash training_scripts/other_language/run_chinese.sh /model/bloom-560m-sft"
pytorchjob.kubeflow.org/bloom-sft-2 created
INFO[0001] The Job bloom-sft-2 has been submitted successfully
INFO[0001] You can run `arena get bloom-sft-2 --type pytorchjob -n default` to check the job status
INFO[0001] registered model "my-model" created
INFO[0001] model version 1 for "my-model" created
The output shows that version 1 of the model named my-model was created. However, when the job is retrieved, the model name is reported as bloom-sft-2 rather than my-model: