Log from 2023-07-09 22:14:41:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
    main()
  File "train_sft.py", line 335, in main
    model.backward(loss)
  File "/home/luban/.local/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/luban/.local/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1873, in backward
    self.optimizer.backward(loss, retain_graph=retain_graph)
  File "/home/luban/.local/lib/python3.8/site-packages/deepspeed/runtime/fp16/fused_optimizer.py", line 353, in backward
    scaled_loss.backward(create_graph=create_graph, retain_graph=retain_graph)
  File "/home/luban/.local/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/luban/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
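For context, this error is raised when backward() is called on a tensor that is not connected to any parameter with requires_grad=True. A minimal, self-contained sketch (not taken from train_sft.py) that triggers the same message:

    import torch

    # All weights frozen, so no autograd graph is built and the resulting
    # "loss" tensor has no grad_fn.
    layer = torch.nn.Linear(4, 4)
    layer.requires_grad_(False)
    x = torch.randn(2, 4)
    loss = layer(x).sum()
    loss.backward()  # RuntimeError: element 0 of tensors does not require grad
                     # and does not have a grad_fn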
The base model is chatglm2, and the training script is launched as follows:
deepspeed train_sft.py \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --max_source_length 1024 \
    --max_target_length 512 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --num_train_epochs 3 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --num_warmup_steps 0 \
    --seed 1234 \
    --zero_stage 0 \
    --lora_dim 16 \
    --lora_module_name query_key_value \
    --only_optimize_lora \
    --deepspeed \
    --output_dir
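In case it helps with debugging, this is the kind of check I would add just before model.backward(loss). It is only a sketch: the names model and loss are assumptions about what train_sft.py uses around line 335, not the actual code.

    # Sanity-check sketch: with --only_optimize_lora everything except the
    # LoRA weights should stay frozen. If no parameter is trainable at all
    # (for example because --lora_module_name query_key_value does not match
    # the module names in the chatglm2 checkpoint), the loss has no grad_fn
    # and model.backward(loss) fails exactly as in the trace above.
    trainable = [name for name, p in model.named_parameters() if p.requires_grad]
    print(f"trainable parameters: {len(trainable)}", trainable[:5])
    print("loss.requires_grad =", loss.requires_grad, "grad_fn =", loss.grad_fn)

If gradient checkpointing is enabled while all base weights are frozen, the checkpointed activations can also end up detached; for Hugging Face models that is usually addressed with model.enable_input_require_grads(), but whether that applies here depends on how train_sft.py loads chatglm2.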