malteos / finetune-evaluation-harness


Performance of Parameter Efficient Fine Tuning Method #8

Open akash418 opened 1 year ago

akash418 commented 1 year ago

This post summarizes the performance of large pre-trained models fine-tuned with the parameter-efficient approaches from https://github.com/huggingface/peft. Results are documented for four PEFT techniques. For all of them, after applying the parameter-efficient method to the original model, the percentage of trainable parameters left was around 5-6%.
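For reference, a minimal sketch (not the exact harness code) of how such an adapter is attached with the PEFT library and how the trainable-parameter fraction is checked. The model name and number of labels are assumptions for the GNAD10 setup; the LoRA hyperparameters are the ones listed further below.

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

# Base model for GNAD10 sequence classification (9 topic classes assumed).
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-german-cased", num_labels=9
)

# Attach a LoRA adapter (hyperparameters as listed in this issue;
# lora_alpha=0.16 is taken verbatim, although an integer like 16 is more common).
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS, r=8, lora_alpha=0.16, lora_dropout=0.0, bias="none"
)
model = get_peft_model(model, peft_config)

# Reports trainable vs. total parameters, i.e. the single-digit
# percentages quoted in this issue.
model.print_trainable_parameters()
```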

Task description: GNAD10 (https://huggingface.co/datasets/gnad10), sequence classification
Epochs: 10
Learning rate: 3e-4

LoRA params: bias=none, r=8, lora_alpha=0.16, lora_dropout=0.0

P-Tuning params: virtual tokens=20, encoder hidden size=128

Prefix Tuning params: virtual tokens=20

Prompt Tuning params: virtual tokens=10

A sketch mapping these settings to PEFT config objects follows below.
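The hyperparameters above roughly correspond to the following PEFT config objects. This is a sketch assuming `TaskType.SEQ_CLS` and current PEFT argument names, not necessarily the exact configuration used by the harness.

```python
from peft import (
    LoraConfig,
    PrefixTuningConfig,
    PromptEncoderConfig,
    PromptTuningConfig,
    TaskType,
)

task = TaskType.SEQ_CLS  # sequence classification for GNAD10 / GermEval18

# LoRA: bias=none, r=8, lora_alpha=0.16, lora_dropout=0.0
lora = LoraConfig(task_type=task, r=8, lora_alpha=0.16, lora_dropout=0.0, bias="none")

# P-Tuning: 20 virtual tokens, prompt-encoder hidden size 128
p_tuning = PromptEncoderConfig(
    task_type=task, num_virtual_tokens=20, encoder_hidden_size=128
)

# Prefix Tuning: 20 virtual tokens
prefix_tuning = PrefixTuningConfig(task_type=task, num_virtual_tokens=20)

# Prompt Tuning: 10 virtual tokens
prompt_tuning = PromptTuningConfig(task_type=task, num_virtual_tokens=10)
```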

Approx. percentage of trainable params when using the PEFT techniques: gpt2-xl: 7.2%, bloom-1b5: 7.5%, bloom-6b4: 3.2%

The Full setting uses all parameters during training. The frozen-layers mode keeps between 3 and 7 percent of the parameters trainable, so its results are comparable to the PEFT runs.
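The issue does not spell out which layers the frozen-layers baseline keeps trainable. A minimal sketch of such a baseline, assuming the last two encoder layers plus the classification head remain trainable, could look like this:

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-german-cased", num_labels=9
)

# Freeze everything except the last two encoder layers and the classifier head
# (an assumption; the exact split used in the harness may differ).
trainable_keys = ("encoder.layer.10", "encoder.layer.11", "classifier")
for name, param in model.named_parameters():
    param.requires_grad = any(key in name for key in trainable_keys)

# Compute the trainable-parameter percentage for comparison with the PEFT runs.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.1f}%)")
```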

Task: GNAD10

| Model | LoRA | P-Tuning | Prefix Tuning | Prompt Tuning | Full | Frozen Layers |
|---|---|---|---|---|---|---|
| bert-base-german-cased | 0.89 | 0.79 | 0.83 | 0.80 | 0.80 | 0.77 |
| gpt2-xl-german | 0.89 | 0.79 | 0.86 | 0.77 | 0.84 | 0.77 |
| bloom-1b5-clp | 0.89 | 0.78 | 0.85 | 0.76 | 0.80 | 0.81 |
| bloom-6b4-clp | 0.85 | 0.79 | 0.80 | 0.79 | - | 0.81 |
Task: GERMEVAL18

| Model | LoRA | P-Tuning | Prefix Tuning | Prompt Tuning | Full | Frozen Layers |
|---|---|---|---|---|---|---|
| bert-base-german-cased | 0.89 | 0.87 | 0.87 | 0.81 | 0.87 | 0.86 |
| gpt2-xl-german | 0.89 | 0.87 | 0.87 | 0.87 | 0.88 | 0.87 |
| bloom-1b5-clp | 0.89 | 0.89 | 0.87 | 0.89 | 0.85 | 0.83 |
| bloom-6b4-clp | 0.86 | 0.85 | 0.85 | 0.85 | - | 0.84 |