l294265421 / alpaca-rlhf

Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat

element 0 of tensors does not require grad and does not have a grad_fn #11

Open · Bill-Orz opened this issue 1 year ago

Bill-Orz commented 1 year ago

My launch command is as follows:

CUDA_VISIBLE_DEVICES=0,1,2,3 deepspeed /data/bill.bi/alpaca-rlhf/alpaca_rlhf/deepspeed_chat/training/step3_rlhf_finetuning/main.py --data_path /data/bill.bi/RLHFDataset --data_output_path /data/bill.bi/tmp/ --actor_model_name_or_path decapoda-research/llama-7b-hf --tokenizer_name_or_path /data/bill.bi/tmp/rlhf/critic --critic_model_name_or_path /data/bill.bi/tmp/rlhf/critic --num_padding_at_beginning 0 --per_device_train_batch_size 4 --actor_learning_rate 9.85e-6 --critic_learning_rate 5e-6 --ppo_epochs 1 --gradient_accumulation_steps 1 --num_warmup_steps 0 --actor_zero_stage 2 --critic_zero_stage 2 --deepspeed --critic_gradient_checkpointing --actor_gradient_checkpointing --output_dir /data/bill.bi/tmp/rlhf/final --actor_lora_dim 8 --actor_lora_module_name q_proj,k_proj,gate_proj,up_proj --critic_lora_dim 8 --critic_lora_module_name q_proj,k_proj,gate_proj,up_proj --only_optimize_lora --max_prompt_seq_len 1024 1>train_step3.log 2>&1

When running step 3, I hit this error. The full stack trace is:

Traceback (most recent call last):
  File "/data/bill.bi/alpaca-rlhf/alpaca_rlhf/deepspeed_chat/training/step3_rlhf_finetuning/main.py", line 563, in <module>
    main()
  File "/data/bill.bi/alpaca-rlhf/alpaca_rlhf/deepspeed_chat/training/step3_rlhf_finetuning/main.py", line 476, in main
    actor_loss, critic_loss = trainer.train_rlhf(exp_data)
  File "/data/bill.bi/alpaca-rlhf/alpaca_rlhf/deepspeed_chat/training/step3_rlhf_finetuning/ppo_trainer.py", line 187, in train_rlhf
    self.actor_model.backward(actor_loss)
  File "/data/bill.bi/miniconda3/envs/deepspeed/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/data/bill.bi/miniconda3/envs/deepspeed/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1862, in backward
    self.optimizer.backward(loss, retain_graph=retain_graph)
  File "/data/bill.bi/miniconda3/envs/deepspeed/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1901, in backward
    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
  File "/data/bill.bi/miniconda3/envs/deepspeed/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "/data/bill.bi/miniconda3/envs/deepspeed/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/data/bill.bi/miniconda3/envs/deepspeed/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
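A quick way to narrow this error down is to check, right before the failing `self.actor_model.backward(actor_loss)` call, whether the loss is still attached to an autograd graph and whether any actor parameters are trainable at all. The snippet below is a hypothetical debugging aid (it reuses the `actor_model` and `actor_loss` names from the trace above); it is not part of the repo.

```python
import torch

def debug_grad_state(actor_model: torch.nn.Module, actor_loss: torch.Tensor) -> None:
    """Hypothetical debugging aid: call just before actor_model.backward(actor_loss)."""
    # If requires_grad is False / grad_fn is None, the graph was detached
    # (e.g. the forward ran under torch.no_grad, or every input and weight was frozen).
    print("actor_loss.requires_grad:", actor_loss.requires_grad)
    print("actor_loss.grad_fn:", actor_loss.grad_fn)

    # With --only_optimize_lora, only the LoRA matrices should show up here.
    trainable = [name for name, p in actor_model.named_parameters() if p.requires_grad]
    print(f"trainable parameters: {len(trainable)}, e.g. {trainable[:3]}")
```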

Bill-Orz commented 1 year ago

I compared this against the original DeepSpeed-Chat code and it looks identical.

l294265421 commented 1 year ago

I did not run into this problem when I ran it. There is some discussion of this error that may be worth a look: https://discuss.pytorch.org/t/runtimeerror-element-0-of-variables-does-not-require-grad-and-does-not-have-a-grad-fn/11074/46
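One cause that comes up repeatedly in that discussion matches the flags used here: with `--only_optimize_lora` every base weight is frozen, and once `--actor_gradient_checkpointing` is enabled the inputs to the checkpointed blocks do not require grad either, so the recomputed forward is fully detached and the loss ends up with no `grad_fn`. A common workaround is to force the embedding output to require grad. The sketch below assumes the actor is a Hugging Face `transformers` model and is applied before DeepSpeed wraps it; it is not code from this repo.

```python
def make_inputs_require_grad(actor_model):
    """Sketch of the usual LoRA + gradient-checkpointing workaround (assumed, not from this repo)."""
    if hasattr(actor_model, "enable_input_require_grads"):
        # Recent transformers versions expose this helper on PreTrainedModel.
        actor_model.enable_input_require_grads()
    else:
        # Fallback: hook the input embeddings so the checkpointed blocks see
        # at least one tensor that requires grad.
        def hook(module, args, output):
            output.requires_grad_(True)

        actor_model.get_input_embeddings().register_forward_hook(hook)
```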

Bill-Orz commented 1 year ago

Thanks. Did you run step 3 with four 7B LLaMA models? On my side, with 80 GB A100s and gradient checkpointing disabled, I keep hitting OOM.
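For context on the OOM: under ZeRO stage 2 only gradients and optimizer states are partitioned, not the parameters, and step 3 keeps four 7B models resident on each GPU (actor, reference, critic, reward). A rough back-of-the-envelope estimate (all figures are assumptions, not measurements) shows how little headroom is left for activations once the fp16 weights are loaded, which is why dropping gradient checkpointing at batch size 4 and a 1024-token prompt length tends to OOM on an 80 GB card:

```python
# Rough per-GPU memory estimate for step 3 under ZeRO stage 2
# (parameters are replicated; only gradients/optimizer states are sharded).
# All figures are approximate assumptions, not measurements.
params_per_model = 7e9           # 7B parameters each
bytes_per_param_fp16 = 2
num_models = 4                   # actor, reference, critic, reward

weights_gb = num_models * params_per_model * bytes_per_param_fp16 / 1e9
print(f"fp16 weights resident per GPU: ~{weights_gb:.0f} GB")        # ~56 GB
print(f"headroom left on an 80 GB A100: ~{80 - weights_gb:.0f} GB")  # ~24 GB
# That remaining ~24 GB has to cover activations, the generation KV cache,
# LoRA gradients/optimizer states, and allocator overhead.
```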

Bill-Orz commented 1 year ago

I used 4 GPUs.

chailt commented 12 months ago

Same problem here. Has this been resolved?