Closed hhh12hhh closed 1 month ago
I'm hitting the same problem! When I switch the fine-tuning method to p-tuning or others, the problem does not occur. Is there anything wrong with peft.PrefixTuningConfig?
@JunoLiusj if you are using it with FSDP, unfortunately it's not supported: https://github.com/meta-llama/llama-recipes/pull/482
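For anyone who wants to fail fast instead of hitting the backward error mid-training, here is a minimal sketch of a pre-flight check. It assumes only what the comment above says (prefix tuning combined with FSDP is not supported). `RunConfig`, its fields `peft_method` and `enable_fsdp`, and `check_peft_fsdp` are hypothetical names for illustration, not a confirmed llama-recipes API.

```python
from dataclasses import dataclass


# Hypothetical config holder: the field names are assumptions for this sketch,
# not a confirmed llama-recipes API.
@dataclass
class RunConfig:
    peft_method: str = "lora"
    enable_fsdp: bool = False


# PEFT methods this sketch treats as incompatible with FSDP,
# based only on the comment in this thread.
FSDP_UNSUPPORTED = {"prefix"}


def check_peft_fsdp(cfg: RunConfig) -> None:
    """Raise early if the PEFT method is assumed not to work under FSDP."""
    if cfg.enable_fsdp and cfg.peft_method.lower() in FSDP_UNSUPPORTED:
        raise ValueError(
            f"peft_method={cfg.peft_method!r} is not supported together with FSDP; "
            "disable FSDP or pick another PEFT method."
        )


if __name__ == "__main__":
    check_peft_fsdp(RunConfig(peft_method="lora", enable_fsdp=True))  # passes silently
    try:
        check_peft_fsdp(RunConfig(peft_method="prefix", enable_fsdp=True))
    except ValueError as err:
        print(err)
```

Calling a check like this before building the model would surface the incompatibility as a clear message instead of an autograd error during the first backward pass.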
System Info
PyTorch version: 2.0.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.27.0
Libc version: glibc-2.31
Python version: 3.9.17 (main, Jul 5 2023, 20:41:20) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-71-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 3090
Nvidia driver version: 495.29.05
Versions of relevant libraries:
mypy-extensions==1.0.0
numpy==1.23.5
torch==2.0.1
torchdata==0.6.1
torchtext==0.15.2
torchvision==0.15.2
Information
🐛 Describe the bug
I encountered this error while fine-tuning the model with prefix tuning. Here is my fine-tuning script:
Error logs
Traceback (most recent call last):
File "/home/zxy/llama2/llama2-lora-fine-tuning/llama-recipes-main/examples/finetuning.py", line 8, in <module>
fire.Fire(main)
File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/llama_recipes/finetuning.py", line 237, in main
results = train(
File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/llama_recipes/utils/train_utils.py", line 84, in train
scaler.scale(loss).backward()
File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/torch/autograd/__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/torch/autograd/function.py", line 274, in apply
return user_fn(self, *args)
File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 157, in backward
torch.autograd.backward(outputs_with_grad, args_with_grad)
File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/torch/autograd/__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
Expected behavior
I would like to know whether I did something wrong or whether there is another cause, and how to solve it.