malteos / finetune-evaluation-harness


Performance of Parameter Efficient Fine Tuning Method #8

Open akash418 opened 1 year ago

akash418 commented 1 year ago

This post summarizes the performance of large pre-trained models fine-tuned with the parameter-efficient approaches from https://github.com/huggingface/peft. Results are documented for four PEFT techniques. For all of them, after applying the parameter-efficient method to the original model, the percentage of trainable parameters left was around 5-6%.
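For reference, a minimal sketch (not the exact harness code) of how such an adapter is attached with the PEFT library and how the trainable-parameter fraction is checked. The model name and number of labels are assumptions for the GNAD10 setup; the LoRA hyperparameters are the ones listed further below.

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

# Base model for GNAD10 sequence classification (9 topic classes assumed).
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-german-cased", num_labels=9
)

# Attach a LoRA adapter (hyperparameters as listed in this issue;
# lora_alpha=0.16 is taken verbatim, although an integer like 16 is more common).
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS, r=8, lora_alpha=0.16, lora_dropout=0.0, bias="none"
)
model = get_peft_model(model, peft_config)

# Reports trainable vs. total parameters, i.e. the single-digit
# percentages quoted in this issue.
model.print_trainable_parameters()
```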

Task description: GNAD10 (https://huggingface.co/datasets/gnad10), sequence classification
Epochs: 10
Learning rate: 3e-4

LoRA params: bias=none, r=8, lora_alpha=0.16, lora_dropout=0.0

P-Tuning params: virtual tokens=20, encoder hidden size=128

Prefix Tuning params: virtual tokens=20

Prompt Tuning params: virtual tokens=10

A sketch mapping these settings to PEFT config objects follows below.
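The hyperparameters above roughly correspond to the following PEFT config objects. This is a sketch assuming `TaskType.SEQ_CLS` and current PEFT argument names, not necessarily the exact configuration used by the harness.

```python
from peft import (
    LoraConfig,
    PrefixTuningConfig,
    PromptEncoderConfig,
    PromptTuningConfig,
    TaskType,
)

task = TaskType.SEQ_CLS  # sequence classification for GNAD10 / GermEval18

# LoRA: bias=none, r=8, lora_alpha=0.16, lora_dropout=0.0
lora = LoraConfig(task_type=task, r=8, lora_alpha=0.16, lora_dropout=0.0, bias="none")

# P-Tuning: 20 virtual tokens, prompt-encoder hidden size 128
p_tuning = PromptEncoderConfig(
    task_type=task, num_virtual_tokens=20, encoder_hidden_size=128
)

# Prefix Tuning: 20 virtual tokens
prefix_tuning = PrefixTuningConfig(task_type=task, num_virtual_tokens=20)

# Prompt Tuning: 10 virtual tokens
prompt_tuning = PromptTuningConfig(task_type=task, num_virtual_tokens=10)
```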

Approx. percentage of trainable params when using the PEFT techniques: gpt2-xl: 7.2%, bloom-1b5: 7.5%, bloom-6b4: 3.2%

The Full setting uses all parameters during training. The frozen-layers mode keeps between 3 and 7 percent of the parameters trainable, so its results are comparable to the PEFT runs.
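The issue does not spell out which layers the frozen-layers baseline keeps trainable. A minimal sketch of such a baseline, assuming the last two encoder layers plus the classification head remain trainable, could look like this:

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-german-cased", num_labels=9
)

# Freeze everything except the last two encoder layers and the classifier head
# (an assumption; the exact split used in the harness may differ).
trainable_keys = ("encoder.layer.10", "encoder.layer.11", "classifier")
for name, param in model.named_parameters():
    param.requires_grad = any(key in name for key in trainable_keys)

# Compute the trainable-parameter percentage for comparison with the PEFT runs.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.1f}%)")
```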

Task: GNAD10

| Model | LoRA | P-Tuning | Prefix Tuning | Prompt Tuning | Full | Frozen Layers |
|---|---|---|---|---|---|---|
| bert-base-german-cased | 0.89 | 0.79 | 0.83 | 0.80 | 0.80 | 0.77 |
| gpt2-xl-german | 0.89 | 0.79 | 0.86 | 0.77 | 0.84 | 0.77 |
| bloom-1b5-clp | 0.89 | 0.78 | 0.85 | 0.76 | 0.80 | 0.81 |
| bloom-6b4-clp | 0.85 | 0.79 | 0.80 | 0.79 | - | 0.81 |
Task: GERMEVAL18

| Model | LoRA | P-Tuning | Prefix Tuning | Prompt Tuning | Full | Frozen Layers |
|---|---|---|---|---|---|---|
| bert-base-german-cased | 0.89 | 0.87 | 0.87 | 0.81 | 0.87 | 0.86 |
| gpt2-xl-german | 0.89 | 0.87 | 0.87 | 0.87 | 0.88 | 0.87 |
| bloom-1b5-clp | 0.89 | 0.89 | 0.87 | 0.89 | 0.85 | 0.83 |
| bloom-6b4-clp | 0.86 | 0.85 | 0.85 | 0.85 | - | 0.84 |