r-three / t-few

Code for T-Few from "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning"
MIT License

Clarification about IA^3 #5

Closed: sordonia closed this issue 2 years ago

sordonia commented 2 years ago

Hi :)

I was reading your interesting paper https://arxiv.org/pdf/2205.05638.pdf.

In Section 3.3, you specify that IA^3 adds a total of d_k + d_v + d_ff parameters.
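
To make that count concrete, here is a minimal sketch of the arithmetic, assuming one learned scaling vector each for the keys, the values, and the feed-forward hidden activations. The function name and the dimensions are hypothetical and chosen only for illustration; they are not taken from the paper or the repository.

```python
def ia3_params_per_block(d_k: int, d_v: int, d_ff: int) -> int:
    """Count IA^3 parameters when keys, values, and the feed-forward
    hidden activations each get one learned scaling vector."""
    return d_k + d_v + d_ff


# Hypothetical dimensions, chosen only for illustration.
count = ia3_params_per_block(d_k=64, d_v=64, d_ff=256)
print(count)
```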

However, the line linked below seems to allocate two vectors for each linear layer (multi_lora_a and multi_lora_b), multiplying the input by multi_lora_a and the transformed input by multi_lora_b.

https://github.com/r-three/t-few/blob/9dbc9cc429888a0c27fc22188b4e9549e0e83f40/src/models/lora.py#L43
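
The behaviour described above can be sketched in plain Python. This is an illustration of the idea only, not the repository's code: `scaled_linear`, `scale_in`, and `scale_out` are hypothetical names standing in for a linear layer with an input-side vector (the role of multi_lora_a) and an output-side vector (the role of multi_lora_b).

```python
from typing import List, Sequence


def scaled_linear(
    x: Sequence[float],
    weight: Sequence[Sequence[float]],
    scale_in: Sequence[float],
    scale_out: Sequence[float],
) -> List[float]:
    """Linear map with elementwise scaling before and after the matmul.

    weight has shape (out_features, in_features); there is no bias here.
    """
    # Scale the input elementwise (role of the input-side vector).
    scaled_x = [xi * a for xi, a in zip(x, scale_in)]
    # Apply the frozen weight matrix.
    hidden = [sum(w * xi for w, xi in zip(row, scaled_x)) for row in weight]
    # Scale the transformed input elementwise (role of the output-side vector).
    return [h * b for h, b in zip(hidden, scale_out)]


weight = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 outputs, 2 inputs
x = [1.0, 1.0]
ones_in = [1.0, 1.0]
ones_out = [1.0, 1.0, 1.0]

# With both vectors set to ones, the layer is an ordinary linear map.
plain = scaled_linear(x, weight, ones_in, ones_out)
# Changing only the output-side vector rescales each output unit.
rescaled = scaled_linear(x, weight, ones_in, [2.0, 1.0, 0.5])

allocated = len(ones_in) + len(ones_out)  # both vectors exist
trained_if_output_only = len(ones_out)    # if only the output side is trained
```

Allocating both vectors gives in_features + out_features values per layer, but if only the output-side vector is trained, the trainable count per layer drops to out_features.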

Am I missing something?

Thank you for your clarification :-)

sordonia commented 2 years ago

Sorry, I just realized that your config file restricts the trainable parameters, so all good. Thank you!

https://github.com/r-three/t-few/blob/9dbc9cc429888a0c27fc22188b4e9549e0e83f40/configs/ia3.json#L7
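
One common way to restrict trainable parameters from a config is to match parameter names against a pattern. The sketch below assumes that approach; the helper `select_trainable`, the parameter names, and the pattern are all hypothetical, and the exact key and pattern in ia3.json are not reproduced here.

```python
import re
from typing import Iterable, List


def select_trainable(param_names: Iterable[str], pattern: str) -> List[str]:
    """Return the parameter names that fully match the given regex."""
    regex = re.compile(pattern)
    return [name for name in param_names if regex.fullmatch(name)]


# Hypothetical parameter names for illustration.
names = [
    "block.0.k.weight",
    "block.0.k.multi_lora_a",
    "block.0.k.multi_lora_b",
    "block.0.v.multi_lora_b",
]

# Hypothetical pattern: train only the output-side scaling vectors.
trainable = select_trainable(names, r".*multi_lora_b")
print(trainable)
```

Under this assumption, the input-side vectors are still allocated but stay frozen at their initial values, so they do not add to the trainable parameter count.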

HaokunLiu commented 2 years ago

Hey, you found the hidden story. IA3 is actually morphed from LoRA.