hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0
32.24k stars · 3.95k forks

FSDP QDoRA #3550

Open · etemiz opened 5 months ago

etemiz commented 5 months ago

Reproduction

Is LLaMA-Factory capable of the FSDP QDoRA approach described here: https://www.answer.ai/posts/2024-04-26-fsdp-qdora-llama3.html? It seems promising, even beating full fine-tuning! I would love to continue using LLaMA-Factory and not have to change my scripts.

Expected behavior

Could LLaMA-Factory support FSDP QDoRA?

System Info

No response

Others

No response

hiyouga commented 5 months ago

Simply add --use_dora True to this script https://github.com/hiyouga/LLaMA-Factory/blob/main/examples/extras/fsdp_qlora/sft.sh
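
For reference, this flag just flips use_dora on the peft LoraConfig that LLaMA-Factory builds (DoRA needs peft >= 0.9.0). A minimal standalone sketch of the same thing; the model name and target modules below are illustrative assumptions, not exactly what the script passes:

# Sketch only: enabling DoRA on a LoRA adapter via peft (>= 0.9.0).
# The model name and target modules are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    use_dora=True,  # DoRA: decompose each weight update into magnitude and direction
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the adapter weights are trainable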

shangzyu commented 5 months ago

Simply add --use_dora True to this script https://github.com/hiyouga/LLaMA-Factory/blob/main/examples/extras/fsdp_qlora/sft.sh

Hi, can QDoRA work with fsdp_offload_params? When trying FSDP + QDoRA + offload with Qwen 1.5 72B, I get: ValueError: Expected a cuda device, but got: cpu. Thanks.

etemiz commented 5 months ago

I got the same error when I tried --use_dora True.

shangzyu commented 5 months ago

I got the same error when I tried --use_dora True.

https://github.com/huggingface/peft/pull/1724

etemiz commented 4 months ago

I added use_dora: true to the YAML config. It failed with "Cannot flatten integer dtype tensors":

[screenshot: traceback ending in "Cannot flatten integer dtype tensors"]

LLaMA-Factory version 0.8.0

Thanks!

hiyouga commented 4 months ago

@etemiz what is your bitsandbytes version?

etemiz commented 4 months ago

bitsandbytes 0.43.1

hiyouga commented 4 months ago

@etemiz try

pip uninstall peft
pip install git+https://github.com/huggingface/peft.git

https://github.com/huggingface/peft/pull/1806

fdalvi commented 3 months ago

Hello, I was just trying this out as well. Using the latest peft as suggested gets rid of the "cannot flatten integer dtype tensors" error; however, a new error now shows up when training starts:

[rank0]:   File "LLaMA-Factory/src/train.py", line 28, in <module>
[rank0]:     main()
[rank0]:   File "LLaMA-Factory/src/train.py", line 19, in main
[rank0]:     run_exp()
[rank0]:   File "LLaMA-Factory/src/llamafactory/train/tuner.py", line 45, in run_exp
[rank0]:     run_pt(model_args, data_args, training_args, finetuning_args, callbacks)
[rank0]:   File "LLaMA-Factory/src/llamafactory/train/pt/workflow.py", line 62, in run_pt
[rank0]:     train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
[rank0]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/transformers/trainer.py", line 1885, in train
[rank0]:     return inner_training_loop(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/transformers/trainer.py", line 2216, in _inner_training_loop
[rank0]:     tr_loss_step = self.training_step(model, inputs)
[rank0]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/transformers/trainer.py", line 3238, in training_step
[rank0]:     loss = self.compute_loss(model, inputs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/transformers/trainer.py", line 3264, in compute_loss
[rank0]:     outputs = model(**inputs)
[rank0]:               ^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/accelerate/utils/operations.py", line 822, in forward
[rank0]:     return model_forward(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/accelerate/utils/operations.py", line 810, in __call__
[rank0]:     return convert_to_fp32(self.model_forward(*args, **kwargs))
[rank0]:                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 857, in forward
[rank0]:     output = self._fsdp_wrapped_module(*args, **kwargs)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/accelerate/utils/operations.py", line 822, in forward
[rank0]:     return model_forward(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/accelerate/utils/operations.py", line 810, in __call__
[rank0]:     return convert_to_fp32(self.model_forward(*args, **kwargs))
[rank0]:                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/peft/peft_model.py", line 1501, in forward
[rank0]:     return self.base_model(
[rank0]:            ^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/peft/tuners/tuners_utils.py", line 179, in forward
[rank0]:     return self.model.forward(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 1164, in forward
[rank0]:     outputs = self.model(
[rank0]:               ^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 957, in forward
[rank0]:     layer_outputs = self._gradient_checkpointing_func(
[rank0]:                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "LLaMA-Factory/src/llamafactory/model/model_utils/checkpointing.py", line 65, in custom_gradient_checkpointing_func
[rank0]:     return gradient_checkpointing_func(func, *args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/_compile.py", line 24, in inner
[rank0]:     return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
[rank0]:     return fn(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/_dynamo/external_utils.py", line 36, in inner
[rank0]:     return fn(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/utils/checkpoint.py", line 487, in checkpoint
[rank0]:     return CheckpointFunction.apply(function, preserve, *args)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/autograd/function.py", line 598, in apply
[rank0]:     return super().apply(*args, **kwargs)  # type: ignore[misc]
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/utils/checkpoint.py", line 262, in forward
[rank0]:     outputs = run_function(*args)
[rank0]:               ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 857, in forward
[rank0]:     output = self._fsdp_wrapped_module(*args, **kwargs)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 713, in forward
[rank0]:     hidden_states, self_attn_weights, present_key_value = self.self_attn(
[rank0]:                                                           ^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 416, in forward
[rank0]:     query_states = self.q_proj(hidden_states)
[rank0]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/peft/tuners/lora/bnb.py", line 492, in forward
[rank0]:     output = self.lora_magnitude_vector[active_adapter](
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 857, in forward
[rank0]:     output = self._fsdp_wrapped_module(*args, **kwargs)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "envs/llama-factory/lib/python3.11/site-packages/peft/tuners/lora/dora.py", line 72, in forward
[rank0]:     x_eye = torch.eye(lora_A.weight.shape[1], device=lora_A.weight.device, dtype=x.dtype)
[rank0]:                       ~~~~~~~~~~~~~~~~~~~^^^
[rank0]: IndexError: tuple index out of range

Any suggestions? It looks like the DoRA code assumes the LoRA weights arrive with a certain (2-D) shape, but they are not being passed as expected.
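
If it helps narrow it down, my guess (an assumption from the traceback, not a verified diagnosis) is that FSDP has flattened the wrapped lora_A parameter into a 1-D shard, so the shape[1] indexing in dora.py has nothing to index. That failure reproduces without any of the training stack:

# Sketch of the suspected failure mode: FSDP stores wrapped parameters
# as flattened 1-D shards, so code expecting a 2-D weight matrix breaks.
import torch

weight_2d = torch.randn(16, 4096)  # the 2-D lora_A weight dora.py expects
flat_shard = weight_2d.flatten()   # roughly what an FSDP flat-param looks like

print(weight_2d.shape[1])          # 4096, fine
try:
    flat_shard.shape[1]            # same indexing as dora.py line 72
except IndexError as err:
    print(f"IndexError: {err}")    # "tuple index out of range"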

etemiz commented 3 months ago

I tried these:

pip uninstall peft
pip install git+https://github.com/huggingface/peft.git

and I am getting the same error:

[rank1]:   File "...../LLaMA-Factory/v/lib/python3.11/site-packages/peft/tuners/lora/dora.py", line 74, in forward
[rank1]:     x_eye = torch.eye(lora_A.weight.shape[1], device=lora_A.weight.device, dtype=x.dtype)
[rank1]:                       ~~~~~~~~~~~~~~~~~~~^^^
[rank1]: IndexError: tuple index out of range

peft 0.11.2.dev0 (latest on GitHub), bitsandbytes 0.43.1, LLaMA-Factory latest on GitHub, model Llama3-70B

I had been using fsdp_qlora for a while and it works well; thanks for this amazing software. Now I tried QDoRA and it didn't work.

lmc8133 commented 3 weeks ago

Hello, I was just trying this out as well. Using the latest peft as suggested gets rid of the "cannot flatten integer dtype tensors" error; however, a new error now shows up when training starts, ending in the same IndexError: tuple index out of range at peft/tuners/lora/dora.py, line 72. Any suggestions? It looks like the DoRA code assumes the LoRA weights arrive with a certain (2-D) shape, but they are not being passed as expected.

Same problem. Have you solved it?

lmc8133 commented 3 weeks ago

I tried these:

pip uninstall peft
pip install git+https://github.com/huggingface/peft.git

and I am getting the same error: IndexError: tuple index out of range at peft/tuners/lora/dora.py (peft 0.11.2.dev0, bitsandbytes 0.43.1, LLaMA-Factory latest on GitHub, model Llama3-70B).

Hi there, did you ever solve it? I have the same problem.

jeffchy commented 2 days ago

Same with peft 0.12.0.