microsoft / DeepSpeedExamples

Example models using DeepSpeed

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn #625

Open · boundles opened 1 year ago

boundles commented 1 year ago

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
  main()
  File "train_sft.py", line 335, in main
    model.backward(loss)
  File "/home/luban/.local/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/luban/.local/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1873, in backward
    self.optimizer.backward(loss, retain_graph=retain_graph)
  File "/home/luban/.local/lib/python3.8/site-packages/deepspeed/runtime/fp16/fused_optimizer.py", line 353, in backward
    scaled_loss.backward(create_graph=create_graph, retain_graph=retain_graph)
  File "/home/luban/.local/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/luban/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
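For context, autograd raises this error whenever .backward() is called on a tensor that is not connected to any computation graph. A minimal standalone repro, independent of DeepSpeed or the training script:

    import torch

    x = torch.ones(3)  # requires_grad defaults to False
    loss = x.sum()     # loss therefore has no grad_fn
    loss.backward()    # RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn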

Base model is chatglm2; the train script is as below:

deepspeed train_sft.py \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --max_source_length 1024 \
    --max_target_length 512 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --num_train_epochs 3 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --num_warmup_steps 0 \
    --seed 1234 \
    --zero_stage 0 \
    --lora_dim 16 \
    --lora_module_name query_key_value \
    --only_optimize_lora \
    --deepspeed \
    --output_dir
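Since --only_optimize_lora freezes everything except the injected LoRA weights, one plausible cause is that no parameter in the engine actually requires grad (for example, if --lora_module_name matches none of chatglm2's module names), in which case the loss is detached from the graph. A minimal diagnostic sketch that could be placed just before model.backward(loss) in train_sft.py (the names model and loss are taken from the traceback; the check itself is not part of the original script):

    # Hypothetical sanity check: confirm something is trainable and the loss has a graph.
    trainable = [n for n, p in model.named_parameters() if p.requires_grad]
    print(f"trainable tensors: {len(trainable)}")  # 0 here would explain the error
    assert loss.requires_grad, "loss is detached from the autograd graph"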

Rothsword commented 1 year ago

Same problem here, any solutions?

YooSungHyun commented 1 year ago

I get the same error too.

manateeniu commented 1 year ago

Same problem too

SabrinaZhuangxx commented 1 year ago

Same problem.

telunyang commented 1 year ago

The error asks us to set requires_grad=True. How should I set that in the step 1 command?
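If the goal is to keep the loss attached to the graph while all base-model weights stay frozen, transformers provides PreTrainedModel.enable_input_require_grads(), which hooks the input embeddings so their output requires grad. A sketch of where such a call could go in the step 1 script (the exact call site is an assumption; it would belong right after the base model is created, before deepspeed.initialize()):

    from transformers import AutoModel

    # Hypothetical load matching the issue; chatglm2 ships custom modeling code.
    model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)

    # Registers a forward hook on the input embeddings so their output has
    # requires_grad=True, keeping the loss connected to the autograd graph
    # even when every base-model parameter is frozen (as with --only_optimize_lora).
    model.enable_input_require_grads()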