microsoft / LoRA

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
https://arxiv.org/abs/2106.09685
MIT License

Finetuning 176B BLOOM with LoRA #43

Open drxmy opened 1 year ago

drxmy commented 1 year ago

The paper says that it only needs 350GB of VRAM to train the 175B GPT-3 with rank = 4. Can you elaborate on how this is done? For example, do you use Megatron-DeepSpeed?

In my experiment with bloom-3b, finetuning all parameters needs 29GB. Using LoRA with different experimental settings, the number of trainable parameters varies from 10M down to 0.8M, but all of these runs still need around 20GB of VRAM. I find this a little weird.

edwardjhu commented 1 year ago

Hi! We had a proprietary setup. Are you using Adam, and have you made sure not to pass the non-trainable parameters to the optimizer?
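
For reference, a minimal sketch of that pattern with this repo's loralib (the toy model and hyperparameters are illustrative):

```python
import torch
import loralib as lora

# Toy model whose linear layer is loralib's LoRA-augmented variant (rank 4).
model = torch.nn.Sequential(lora.Linear(512, 512, r=4))

# Freeze everything except the LoRA matrices A and B (loralib helper).
lora.mark_only_lora_as_trainable(model)

# Pass ONLY trainable parameters to the optimizer. Adam/AdamW allocates two
# extra state tensors per parameter it is given, so handing it the frozen
# base weights as well would erase most of LoRA's memory savings.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)
```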

drxmy commented 1 year ago

I used AdamW with the transformers Trainer class (Hugging Face). It printed a trainable parameter count, and the number was much smaller with LoRA.
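
For what it's worth, the count the Trainer logs can be reproduced directly; a sketch, reusing the toy LoRA `model` from above:

```python
# Reproduce the trainable-parameter count the Trainer reports. Note this only
# measures parameters: gradients and optimizer state for frozen weights are
# not stored, but the frozen weights themselves and the activations still
# occupy VRAM, which is why total memory does not shrink in proportion to
# this number.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
```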

aegisgpt commented 1 year ago

> The paper says that it only needs 350GB of VRAM to train the 175B GPT-3 with rank = 4. Can you elaborate on how this is done? For example, do you use Megatron-DeepSpeed?
>
> In my experiment with bloom-3b, finetuning all parameters needs 29GB. Using LoRA with different experimental settings, the number of trainable parameters varies from 10M down to 0.8M, but all of these runs still need around 20GB of VRAM. I find this a little weird.

Hello, can I check with you how to use LoRA to finetune Bloom-3B? I encountered the issue of Bloom-3B having no v_proj or q_proj in the base model. Thanks a lot!

zsc commented 1 year ago

@aegisgpt

> having no v_proj or q_proj in the base model

Per https://huggingface.co/smangrul/twitter_complaints_bigscience_bloomz-7b1_LORA_CAUSAL_LM/blob/main/adapter_config.json, the target modules need to be changed to `query_key_value` for BLOOM models. Let me know if that solves your problem.
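
For the record, a minimal sketch of that with huggingface/peft (the hyperparameter values are illustrative; the point is `target_modules`):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-3b")

# BLOOM fuses the Q, K, and V projections into one `query_key_value` module,
# so that is the name LoRA must target; there is no q_proj or v_proj.
config = LoraConfig(
    r=4,
    lora_alpha=16,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```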

aegisgpt commented 1 year ago

> @aegisgpt
>
> > having no v_proj or q_proj in the base model
>
> Per https://huggingface.co/smangrul/twitter_complaints_bigscience_bloomz-7b1_LORA_CAUSAL_LM/blob/main/adapter_config.json, the target modules need to be changed to `query_key_value` for BLOOM models. Let me know if that solves your problem.

Hey @zsc, many thanks! I tried it and it worked! Do you mind sharing where I can find more detailed documentation for LoRA online, especially regarding configurations for the various types of GPT models?

zsc commented 1 year ago

This may be useful: https://github.com/huggingface/peft/blob/main/src/peft/mapping.py
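
If a model family isn't covered by that mapping, the candidate target module names can also be read straight off the model; a sketch:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-3b")

# List attention-related submodules; whichever names appear here are the
# candidates for LoraConfig.target_modules on this architecture.
for name, module in model.named_modules():
    if "query" in name or "proj" in name:
        print(name, type(module).__name__)
```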

aegisgpt commented 1 year ago

> This may be useful: https://github.com/huggingface/peft/blob/main/src/peft/mapping.py

Thank you! That helps!