Open xHansonx opened 1 year ago
Could you share your environment settings, such as your machine type and your torch and Python versions?
------------ Environment ------------
Colossal-AI version: 0.2.8
PyTorch version: 1.12.1
System CUDA version: 11.3
CUDA version required by PyTorch: 11.3
Sorry for getting to your questions late. May I know why you are setting nproc_per_node=1 when you have multiple nodes on the machine and set the strategy to be ddp?
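For comparison, here is a rough sketch of a launch that starts more than one worker process. It assumes a single machine with two GPUs (the value 2 is a placeholder, not something reported in this issue) and leaves out the remaining arguments, which would stay as in the original command. The snippet only prints the command so it can be checked before running it.

```shell
# Assumption: 2 GPUs on one machine; set NPROC to the actual GPU count.
NPROC=2

# Build the launch command. Only the process count differs from the reported
# command; the other arguments from the report are omitted here for brevity.
CMD="torchrun --standalone --nproc_per_node=${NPROC} train_reward_model.py --strategy ddp"

# Dry run: print the command instead of executing it.
echo "$CMD"
```

With `--nproc_per_node=1`, torchrun starts a single worker, so the ddp strategy runs with a world size of 1.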
Thanks for reporting. #4023 contains this now.
🐛 Describe the bug
Code:
torchrun --standalone --nproc_per_node=1 train_reward_model.py --dataset Dahoas/rm-static --subset ../../../datasets/Dahoas_rm-static --max_len 512 --model gpt2 --pretrain ../../../gpt2/gpt2-small --lora_rank 0 --max_epochs 1 --batch_size 1 --loss_fn log_sig --test True --need_optim_ckpt True --strategy ddp --save_path rm_ckpt.pt
Error:
Environment
No response