johnsmith0031 / alpaca_lora_4bit

Targeting all layers and biases #141

Closed · grimulkan closed this issue 1 year ago

grimulkan commented 1 year ago

What would I have to do to target all relevant layers, including biases, with this repo for LoRA training? For instance, would the following change alone work?

lora_config = LoraConfig(
    r=ft_config.lora_r,
    lora_alpha=ft_config.lora_alpha,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=ft_config.lora_dropout,
    bias="all", #or "lora_only"
    task_type="CAUSAL_LM",
)

I am not sure if this actually includes the biases or if they just get zeroed out elsewhere. If there are other places to modify, I would appreciate any suggestions.
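One rough way I thought of checking (just a sketch using the stock PEFT `get_peft_model` path rather than this repo's 4-bit monkey patch; the checkpoint path and hyperparameter values below are placeholders) is to wrap the base model and print which parameters actually end up trainable:

```python
from peft import LoraConfig, get_peft_model
from transformers import LlamaForCausalLM

# Placeholder path; substitute whatever base checkpoint you are loading.
model = LlamaForCausalLM.from_pretrained("path/to/llama-7b-hf")

lora_config = LoraConfig(
    r=8,                      # placeholder values for illustration
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="all",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# List everything that will receive gradients; with bias="all" any *.bias
# parameters the model actually has should show up here as trainable.
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name, tuple(param.shape))
```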

Also, I have not tried it before, but is there any value in including the biases for LoRA training? Similarly, is there any value in training layers other than the four listed above?

I am aware of the result that targeting the above four modules was more effective than this repo's default (only q_proj and v_proj) and allowed a lower LoRA rank, but I am not sure whether there is any further benefit from the other layers or the biases. There is some relevant discussion in https://github.com/johnsmith0031/alpaca_lora_4bit/issues/129 as well (with comments from @kaiokendev). I would appreciate any thoughts, folklore or otherwise.

kaiokendev commented 1 year ago

LLaMA does not use biases (you can verify in modeling_llama.py: bias is set to False and the bias tensors are all zero). As for the modules, you also need to add gate_proj, up_proj, and down_proj -- the MLP modules. lm_head is not necessary.
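Something like this should cover them (sketch only, reusing the ft_config values from the snippet above; module names follow the Hugging Face LLaMA implementation):

```python
from peft import LoraConfig

# Attention projections plus the MLP (gate/up/down) projections.
# bias="none" since LLaMA's Linear layers are created with bias=False,
# so there are no bias parameters to train anyway.
lora_config = LoraConfig(
    r=ft_config.lora_r,
    lora_alpha=ft_config.lora_alpha,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention
        "gate_proj", "up_proj", "down_proj",      # MLP
    ],
    lora_dropout=ft_config.lora_dropout,
    bias="none",
    task_type="CAUSAL_LM",
)
```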

grimulkan commented 1 year ago

You’re right about the biases! Much appreciated.