huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
https://huggingface.co/docs/peft
Apache License 2.0

load adalora weights error in resize_modules_by_rank_pattern;r=0 #1539

Closed AEProgrammer closed 7 months ago

AEProgrammer commented 7 months ago

System Info

Who can help?

No response

Information

Tasks

Reproduction

        # load each adapter in turn and merge it into the base model
        for adapter in adapter_to_merge:
            model: "LoraModel" = PeftModel.from_pretrained(model, adapter)
            model = model.merge_and_unload()

Expected behavior

When I load AdaLoRA weights, I get this error:

Traceback (most recent call last):
  File "/code/liuhui67/LLaMA-Factory/scripts/../src/train_bash.py", line 14, in <module>
    main()
  File "/code/liuhui67/LLaMA-Factory/scripts/../src/train_bash.py", line 5, in main
    run_exp()
  File "/code/liuhui67/LLaMA-Factory/src/llmtuner/train/tuner.py", line 31, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/code/liuhui67/LLaMA-Factory/src/llmtuner/train/sft/workflow.py", line 33, in run_sft
    model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
  File "/code/liuhui67/LLaMA-Factory/src/llmtuner/model/loader.py", line 94, in load_model
    model = init_adapter(model, model_args, finetuning_args, is_trainable)
  File "/code/liuhui67/LLaMA-Factory/src/llmtuner/model/adapter.py", line 109, in init_adapter
    model: "LoraModel" = PeftModel.from_pretrained(model, adapter)
  File "/root/miniconda3/envs/llama_factory/lib/python3.10/site-packages/peft/peft_model.py", line 353, in from_pretrained
    model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
  File "/root/miniconda3/envs/llama_factory/lib/python3.10/site-packages/peft/peft_model.py", line 697, in load_adapter
    load_result = set_peft_model_state_dict(self, adapters_weights, adapter_name=adapter_name)
  File "/root/miniconda3/envs/llama_factory/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 243, in set_peft_model_state_dict
    model.resize_modules_by_rank_pattern(rank_pattern, adapter_name)
  File "/root/miniconda3/envs/llama_factory/lib/python3.10/site-packages/peft/tuners/adalora/model.py", line 277, in resize_modules_by_rank_pattern
    target.update_layer(
  File "/root/miniconda3/envs/llama_factory/lib/python3.10/site-packages/peft/tuners/adalora/layer.py", line 41, in update_layer
    raise ValueError(f"`r` should be a positive integer value but the value passed is {r}")
ValueError: `r` should be a positive integer value but the value passed is 0

This is my rank_pattern (screenshot attached). Could some layers being entirely `false` in rank_pattern lead to this error? Hope someone can help me!

BenjaminBossan commented 7 months ago

Indeed, the issue is that AdaLoRA has determined that this layer has such low importance that its rank is reduced to 0. That probably means some hyper-parameters should be changed to avoid getting into this situation.
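For reference, AdaLoRA's pruning schedule is controlled by several knobs on `AdaLoraConfig`; raising `target_r` or shortening the pruning window can make it less likely that a layer's rank is driven all the way to 0. A sketch (the values and `target_modules` names are illustrative, not recommendations):

```python
from peft import AdaLoraConfig

# Illustrative values only; tune for your own model and dataset.
config = AdaLoraConfig(
    init_r=12,        # initial rank before budget allocation
    target_r=8,       # average target rank after pruning
    tinit=200,        # warmup steps before pruning begins
    tfinal=1000,      # steps over which the rank budget shrinks
    deltaT=10,        # interval (in steps) between budget reallocations
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed module names
)
```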

Now that you have the trained model, I wonder, however, if we could theoretically allow rank 0, or if something else would break later on. Could you please try something: comment out the check in lines 41-42 of /root/miniconda3/envs/llama_factory/lib/python3.10/site-packages/peft/tuners/adalora/layer.py and see if the model works correctly?
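For context, the failing check (per the traceback) rejects any non-positive `r`. A minimal stand-in, with an `allow_zero` switch approximating the relaxation being tested here (the function name and signature are mine, not PEFT's):

```python
def check_rank(r: int, allow_zero: bool = False) -> int:
    """Stand-in for the `r` validation in AdaLoRA's update_layer."""
    if r < 0 or (r == 0 and not allow_zero):
        # Same message as the error in the traceback above.
        raise ValueError(f"`r` should be a positive integer value but the value passed is {r}")
    return r
```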

AEProgrammer commented 7 months ago

> Indeed, the issue is that AdaLoRA has determined that this layer has such low importance that its rank is reduced to 0. That probably means some hyper-parameters should be changed to avoid getting into this situation.
>
> Now that you have the trained model, I wonder, however, if we could theoretically allow rank 0, or if something else would break later on. Could you please try something: comment out the check in lines 41-42 of /root/miniconda3/envs/llama_factory/lib/python3.10/site-packages/peft/tuners/adalora/layer.py and see if the model works correctly?

I'm very glad to get your reply. As you said, the modification is effective: the model works correctly and I get the right inference results. But on my own dataset, AdaLoRA's test metrics were much lower than LoRA's; I wonder if something is wrong with my training parameters or something else. I use the Hugging Face Trainer and added "model.base_model.update_and_allocate(self.state.global_step)" after line 2022 to train the model with AdaLoRA. I don't know if it's right.

BenjaminBossan commented 7 months ago

> I'm very glad to get your reply. As you said, the modification is effective: the model works correctly and I get the right inference results.

Thanks for testing this. I created a PR to adjust the check so that rank 0 is allowed.

> But on my own dataset, AdaLoRA's test metrics were much lower than LoRA's; I wonder if something is wrong with my training parameters or something else.

Hard to say, as there are many hyper-parameters associated with AdaLoRA that could be tweaked. I don't have any practical experience with it.

> I use the Hugging Face Trainer and added "model.base_model.update_and_allocate(self.state.global_step)" after line 2022 to train the model with AdaLoRA. I don't know if it's right.

Not sure what line you're referring to, but the idea is to call this at the end of each training step, as in this example.
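In a manual training loop, that pattern looks roughly like this (a sketch with assumed names; the only AdaLoRA-specific line is the `update_and_allocate` call, made once per optimizer step with the running global step):

```python
def train_adalora(model, dataloader, optimizer, num_epochs=1):
    """Minimal AdaLoRA training loop sketch (names are illustrative)."""
    global_step = 0
    for _ in range(num_epochs):
        for batch in dataloader:
            loss = model(**batch).loss
            loss.backward()
            optimizer.step()
            # AdaLoRA: reallocate the rank budget after every optimizer step.
            model.base_model.update_and_allocate(global_step)
            optimizer.zero_grad()
            global_step += 1
    return global_step
```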

BenjaminBossan commented 7 months ago

PR #1540 is merged, so this issue should be resolved. You can wait for the next PEFT release or install PEFT from source to benefit from the fix.