drxmy opened this issue 1 year ago
Hi! We had a proprietary setup. Are you using Adam, and have you made sure not to pass the non-trainable parameters to the optimizer?
I used AdamW with transformers' Trainer class (Hugging Face). It printed a trainable parameter count; the number was much smaller with LoRA.
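For reference, a minimal sketch of both points -- handing only the trainable parameters to AdamW, and reproducing the kind of count the Trainer prints. The `model` argument is assumed to be a LoRA/PEFT-wrapped model with the base weights frozen; learning rate is a placeholder.

```python
import torch


def build_optimizer(model, lr=1e-4):
    """Pass only the parameters that still require gradients to AdamW."""
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=lr)


def count_parameters(model):
    """Same kind of count the Trainer / PEFT print out."""
    n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    n_total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {n_trainable:,} / total: {n_total:,} "
          f"({100 * n_trainable / n_total:.4f}%)")
```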
The paper says it only needs 350GB of VRAM to train the 175B GPT-3 with rank = 4. Can you elaborate on how this is done? For example, do you use Megatron-DeepSpeed?
In my experiment with bloom-3b, fine-tuning all parameters needs 29GB. After using LoRA with different experimental settings, the number of trainable parameters ranges from 10M down to 0.8M, but they all need around 20GB of VRAM. I find this a little bit weird.
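For what it's worth, a rough back-of-the-envelope estimate (my own assumptions: fp16 weights and gradients, fp32 Adam states, activation memory excluded) shows where both numbers plausibly come from. The 350GB figure for 175B GPT-3 is essentially the frozen fp16 weights alone (175e9 × 2 bytes), and under LoRA the frozen weights stay resident, so only gradient and optimizer-state memory shrinks:

```python
GB = 1e9


def estimate(n_params, n_trainable, bytes_weight=2, bytes_grad=2, bytes_adam=8):
    """Very rough VRAM estimate: weights + grads + Adam states, no activations."""
    weights = n_params * bytes_weight      # all weights stay in memory
    grads = n_trainable * bytes_grad       # gradients only for trainable params
    adam = n_trainable * bytes_adam        # fp32 m and v states
    return (weights + grads + adam) / GB


# Trainable count for GPT-3 r=4 is illustrative (a few million parameters).
print(f"GPT-3 175B, LoRA r=4:  ~{estimate(175e9, 5e6):.0f} GB")
print(f"bloom-3b, full FT:     ~{estimate(3e9, 3e9):.0f} GB")
print(f"bloom-3b, LoRA ~10M:   ~{estimate(3e9, 10e6):.0f} GB")
```

Activation memory scales with batch size and sequence length rather than with the number of trainable parameters, which would explain why your ~20GB barely moves across the different LoRA settings.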
Hello, can I check with you how to use LoRA to fine-tune Bloom-3B? I encountered the issue of Bloom-3B having no v_proj and q_proj in the base model. Thanks a lot!
@aegisgpt
having no v_proj and q_proj in the base model
Per https://huggingface.co/smangrul/twitter_complaints_bigscience_bloomz-7b1_LORA_CAUSAL_LM/blob/main/adapter_config.json , the target modules need to be changed to query_key_value for BLOOM models. Let me know if that solves your problem.
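In case it helps, a minimal sketch of what that looks like with PEFT; the r, lora_alpha, and lora_dropout values here are only illustrative:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-3b")

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    # BLOOM fuses the attention projections into one linear layer,
    # so this replaces the q_proj/v_proj you would target on GPT-style models.
    target_modules=["query_key_value"],
)

model = get_peft_model(model, config)
model.print_trainable_parameters()
```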
Hey @zsc, many thanks! I tried it and it worked! Do you mind sharing where I can find more detailed documentation for LoRA online, especially with regard to configurations for various types of GPT models?
This may be useful: https://github.com/huggingface/peft/blob/main/src/peft/mapping.py
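Beyond that mapping file, a generic way to find usable target_modules for an unfamiliar architecture is to list the model's module names and look for the attention projections; for BLOOM this surfaces query_key_value rather than the q_proj/v_proj seen in GPT-style models:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-3b")

# Print each distinct leaf module name once and scan for the attention layers.
seen = set()
for name, _ in model.named_modules():
    leaf = name.split(".")[-1]
    if leaf and leaf not in seen:
        seen.add(leaf)
        print(leaf)
```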
Thank you! That helps!