Bill-Orz opened this issue 1 year ago
I compared this with the original deepspeed-chat code, and the two look consistent.
I did not hit this problem when I ran it myself. There is some discussion of this error that you can refer to: https://discuss.pytorch.org/t/runtimeerror-element-0-of-variables-does-not-require-grad-and-does-not-have-a-grad-fn/11074/46
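The linked thread boils down to calling `backward()` on a tensor whose history contains nothing that requires gradients. A minimal sketch of that failure mode in plain PyTorch (not the DeepSpeed-Chat code itself):

```python
import torch

# Minimal reproduction of the error discussed in the linked thread:
# backward() on a tensor with no grad_fn, because nothing in its
# computation history requires gradients.
x = torch.randn(3)   # requires_grad defaults to False
loss = x.sum()       # loss.grad_fn is None

try:
    loss.backward()
except RuntimeError as err:
    # "element 0 of tensors does not require grad and does not have a grad_fn"
    print(err)
```

In the RLHF script this means `actor_loss` reached `backward()` without a graph attached, i.e. every tensor feeding it was detached or frozen.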
Thanks. Did you run step3 with four 7B LLaMA models? On my side, with an 80G A100 and gradient checkpointing disabled, I keep hitting OOM.
I used 4 GPUs.
I'm hitting the same problem. Has it been resolved?
My launch script is as follows:

CUDA_VISIBLE_DEVICES=0,1,2,3 deepspeed /data/bill.bi/alpaca-rlhf/alpaca_rlhf/deepspeed_chat/training/step3_rlhf_finetuning/main.py \
  --data_path /data/bill.bi/RLHFDataset \
  --data_output_path /data/bill.bi/tmp/ \
  --actor_model_name_or_path decapoda-research/llama-7b-hf \
  --tokenizer_name_or_path /data/bill.bi/tmp/rlhf/critic \
  --critic_model_name_or_path /data/bill.bi/tmp/rlhf/critic \
  --num_padding_at_beginning 0 \
  --per_device_train_batch_size 4 \
  --actor_learning_rate 9.85e-6 \
  --critic_learning_rate 5e-6 \
  --ppo_epochs 1 \
  --gradient_accumulation_steps 1 \
  --num_warmup_steps 0 \
  --actor_zero_stage 2 \
  --critic_zero_stage 2 \
  --deepspeed \
  --critic_gradient_checkpointing \
  --actor_gradient_checkpointing \
  --output_dir /data/bill.bi/tmp/rlhf/final \
  --actor_lora_dim 8 \
  --actor_lora_module_name q_proj,k_proj,gate_proj,up_proj \
  --critic_lora_dim 8 \
  --critic_lora_module_name q_proj,k_proj,gate_proj,up_proj \
  --only_optimize_lora \
  --max_prompt_seq_len 1024 \
  1>train_step3.log 2>&1
While running step3, I hit this error. The full stack trace is:
Traceback (most recent call last):
  File "/data/bill.bi/alpaca-rlhf/alpaca_rlhf/deepspeed_chat/training/step3_rlhf_finetuning/main.py", line 563, in <module>
    main()
  File "/data/bill.bi/alpaca-rlhf/alpaca_rlhf/deepspeed_chat/training/step3_rlhf_finetuning/main.py", line 476, in main
    actor_loss, critic_loss = trainer.train_rlhf(exp_data)
  File "/data/bill.bi/alpaca-rlhf/alpaca_rlhf/deepspeed_chat/training/step3_rlhf_finetuning/ppo_trainer.py", line 187, in train_rlhf
    self.actor_model.backward(actor_loss)
  File "/data/bill.bi/miniconda3/envs/deepspeed/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/data/bill.bi/miniconda3/envs/deepspeed/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1862, in backward
    self.optimizer.backward(loss, retain_graph=retain_graph)
  File "/data/bill.bi/miniconda3/envs/deepspeed/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1901, in backward
    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
  File "/data/bill.bi/miniconda3/envs/deepspeed/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "/data/bill.bi/miniconda3/envs/deepspeed/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/data/bill.bi/miniconda3/envs/deepspeed/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn