This post summarizes the performance of large pre-trained models fine-tuned with the parameter-efficient techniques from https://github.com/huggingface/peft. Results are documented for four PEFT techniques. In every case, applying the parameter-efficient adaptation to the original model leaves only a small fraction of its parameters trainable (roughly 3-8%, see the per-model figures below).
Task Description:
Dataset: GNAD10 (https://huggingface.co/datasets/gnad10), Sequence Classification
Epochs: 10
Learning Rate: 3e-4
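For reference, here is a minimal sketch of how such a run could be set up with the datasets and transformers libraries. The choice of gpt2-xl as the backbone, the padding handling, the 512-token truncation, and the output directory are illustrative assumptions rather than details from the original runs:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

# GNAD10: German news articles with topic labels
dataset = load_dataset("gnad10")
num_labels = dataset["train"].features["label"].num_classes

model_name = "gpt2-xl"  # one of the backbones listed below, chosen here for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)
model.config.pad_token_id = tokenizer.pad_token_id

# Hyperparameters from the description above: 10 epochs, learning rate 3e-4
training_args = TrainingArguments(
    output_dir="peft-gnad10",  # assumed output path
    num_train_epochs=10,
    learning_rate=3e-4,
)

# In the PEFT runs, `model` would first be wrapped with get_peft_model(...)
# using one of the per-technique configs sketched below.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=DataCollatorWithPadding(tokenizer),
)
# trainer.train()
```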
LoRA params:
bias=none
r=8
lora alpha=0.16
lora dropout=0.0
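A minimal sketch of how these settings map onto peft's LoraConfig. The sequence-classification task type and the `model` object from the setup sketch above are illustrative assumptions; the alpha value of 0.16 is taken verbatim from the list (an integer such as 16 is the more common choice):

```python
from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # sequence classification head on GNAD10
    r=8,
    lora_alpha=0.16,   # value taken verbatim from the list above
    lora_dropout=0.0,
    bias="none",
)

# `model` is the base AutoModelForSequenceClassification from the setup sketch
peft_model = get_peft_model(model, lora_config)
```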
P-Tuning params:
virtual tokens=20
encoder hidden size=128
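The corresponding peft config would be PromptEncoderConfig (a sketch; the library parameter is called encoder_hidden_size, which is what "encoder hidden size" above is assumed to mean):

```python
from peft import PromptEncoderConfig, TaskType, get_peft_model

p_tuning_config = PromptEncoderConfig(
    task_type=TaskType.SEQ_CLS,
    num_virtual_tokens=20,    # virtual tokens=20
    encoder_hidden_size=128,  # hidden size of the prompt encoder
)
peft_model = get_peft_model(model, p_tuning_config)  # `model` as in the setup sketch
```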
Prefix Tuning params:
virtual tokens=20
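Likewise, a sketch of the prefix-tuning setup with peft's PrefixTuningConfig, under the same assumptions:

```python
from peft import PrefixTuningConfig, TaskType, get_peft_model

prefix_config = PrefixTuningConfig(
    task_type=TaskType.SEQ_CLS,
    num_virtual_tokens=20,  # trainable prefix prepended to every attention layer
)
peft_model = get_peft_model(model, prefix_config)
```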
Prompt Tuning params:
virtual tokens=10
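And the prompt-tuning counterpart with PromptTuningConfig (again a sketch; the initialization strategy is left at the library default):

```python
from peft import PromptTuningConfig, TaskType, get_peft_model

prompt_config = PromptTuningConfig(
    task_type=TaskType.SEQ_CLS,
    num_virtual_tokens=10,  # 10 trainable soft-prompt tokens in the embedding layer
)
peft_model = get_peft_model(model, prompt_config)
```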
Approx. percentage of trainable params when using PEFT techniques:
gpt2-xl: 7.2
bloom-1b5: 7.5
bloom-6b4: 3.2
The Full setting trains all of the model's parameters. The frozen-layer setting leaves between 3-7% of the parameters trainable, so its results are comparable to the PEFT runs.
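These percentages can be checked with peft's built-in helper once a model has been wrapped (a sketch, assuming a `peft_model` produced by any of the configs above):

```python
# Prints trainable vs. total parameter counts and the trainable percentage
peft_model.print_trainable_parameters()

# Equivalent manual computation
trainable = sum(p.numel() for p in peft_model.parameters() if p.requires_grad)
total = sum(p.numel() for p in peft_model.parameters())
print(f"trainable params: {100 * trainable / total:.1f}%")
```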
Task: GNAD10