$ sh run_sft.sh
I did not change the seed value, but the following error occurs:
run_clm_sft_with_peft.py: error: argument --seed: expected one argument
E0514 15:53:30.806000 140485778084800 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 2) local_rank: 0 (pid: 24115) of binary: /home/user_1/fengkaixuan/.conda/envs/llama3/bin/python
Traceback (most recent call last):
File "/home/user_1/fengkaixuan/.conda/envs/llama3/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/user_1/fengkaixuan/.conda/envs/llama3/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
return f(*args, **kwargs)
File "/home/user_1/fengkaixuan/.conda/envs/llama3/lib/python3.10/site-packages/torch/distributed/run.py", line 879, in main
run(args)
File "/home/user_1/fengkaixuan/.conda/envs/llama3/lib/python3.10/site-packages/torch/distributed/run.py", line 870, in run
elastic_launch(
File "/home/user_1/fengkaixuan/.conda/envs/llama3/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/user_1/fengkaixuan/.conda/envs/llama3/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
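The message `argument --seed: expected one argument` comes from argparse, and it appears when `--seed` reaches the script with no value after it. The contents of `run_sft.sh` are not shown here, so the following is only a guess at the cause: if the script passes the seed through a shell variable such as `$RANDOM`, that variable is a bash feature and is empty under a POSIX `sh` such as dash, so `--seed $RANDOM` collapses to a bare `--seed`. The sketch below reproduces the argparse behavior in isolation. The parser, the default value, and the helper names `parse_seed` and `exit_code_for` are hypothetical and are not taken from `run_clm_sft_with_peft.py`.

```python
import argparse


def parse_seed(argv):
    """Parse --seed the way an argparse-based training script might (hypothetical parser)."""
    parser = argparse.ArgumentParser(prog="run_clm_sft_with_peft.py")
    # Assumed definition: an integer seed with an arbitrary default.
    parser.add_argument("--seed", type=int, default=42)
    return parser.parse_args(argv).seed


def exit_code_for(argv):
    """Return the exit code argparse would produce for argv, or 0 on success."""
    try:
        parse_seed(argv)
    except SystemExit as exc:  # argparse exits instead of raising a normal exception
        return exc.code
    return 0


if __name__ == "__main__":
    # An unquoted empty shell variable disappears, leaving only the flag.
    empty_expansion = "--seed " + ""
    print(exit_code_for(empty_expansion.split()))  # argparse reports the missing value
    print(parse_seed(["--seed", "123"]))           # a real value parses normally
```

If this is what is happening, running the script with `bash run_sft.sh` instead of `sh run_sft.sh`, or replacing the variable with a fixed integer, would be worth trying. The exit code 2 that argparse uses for usage errors matches the `exitcode: 2` reported by torchrun in the log.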
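The following items must be checked before submitting
Issue type
Model training and fine-tuning
Base model
Llama-3-8B-Instruct
Operating system
Linux
Detailed description of the problem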
$ sh run_sft.sh
I did not change the seed value, but the following error occurs:
run_clm_sft_with_peft.py: error: argument --seed: expected one argument
E0514 15:53:30.806000 140485778084800 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 2) local_rank: 0 (pid: 24115) of binary: /home/user_1/fengkaixuan/.conda/envs/llama3/bin/python
Traceback (most recent call last):
File "/home/user_1/fengkaixuan/.conda/envs/llama3/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/user_1/fengkaixuan/.conda/envs/llama3/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
return f(*args, **kwargs)
File "/home/user_1/fengkaixuan/.conda/envs/llama3/lib/python3.10/site-packages/torch/distributed/run.py", line 879, in main
run(args)
File "/home/user_1/fengkaixuan/.conda/envs/llama3/lib/python3.10/site-packages/torch/distributed/run.py", line 870, in run
elastic_launch(
File "/home/user_1/fengkaixuan/.conda/envs/llama3/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/user_1/fengkaixuan/.conda/envs/llama3/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
run_clm_sft_with_peft.py FAILED
Failures: