6 files ±0  6 suites ±0  14m 21s :stopwatch: +4s
12 tests ±0  9 :heavy_check_mark: ±0  3 :zzz: ±0  0 :x: ±0
60 runs ±0  42 :heavy_check_mark: ±0  18 :zzz: ±0  0 :x: ±0
Results for commit 9bacdc47. ± Comparison against base commit 6fb795d5.
@alexsherstinsky Yes, that is exactly right :)
Paper: https://arxiv.org/pdf/2205.05638.pdf
Adds support for a new PEFT strategy called IA3, which adds two learned vectors that rescale the K and V projections in the attention heads, as well as a learned vector that rescales the hidden activations of the feed-forward network. The idea is that these learned vectors can cheaply rescale the attention and feed-forward activations for a downstream task. l_k, l_v, and l_ff are all initialized to ones so that the overall function computed by the model is unchanged when they are first added.
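To make the mechanism concrete, here is a minimal PyTorch sketch of a toy attention-plus-FFN block with IA3-style rescaling. This is illustrative only (module and parameter names are made up, and it is not the implementation added by this PR):

```python
import torch
import torch.nn as nn


class ToyIA3Block(nn.Module):
    """Toy attention + feed-forward block with IA3-style rescaling vectors."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        # Base (frozen) projections.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.ff_in = nn.Linear(d_model, d_ff)
        self.ff_out = nn.Linear(d_ff, d_model)
        # IA3 vectors: the only trainable parameters, initialized to ones
        # so the block initially computes the same function as the base model.
        self.l_k = nn.Parameter(torch.ones(d_model))
        self.l_v = nn.Parameter(torch.ones(d_model))
        self.l_ff = nn.Parameter(torch.ones(d_ff))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.q_proj(x)
        k = self.k_proj(x) * self.l_k   # rescale keys
        v = self.v_proj(x) * self.l_v   # rescale values
        attn = torch.softmax(q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5), dim=-1)
        h = attn @ v
        # Rescale the hidden activations of the feed-forward network.
        return self.ff_out(torch.relu(self.ff_in(h)) * self.l_ff)


# Usage: x has shape (batch, sequence, d_model).
block = ToyIA3Block(d_model=16, d_ff=64)
out = block(torch.randn(2, 5, 16))
```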
IA3 makes mixed-task batches possible because each sequence of activations in the batch can be separately and cheaply multiplied by its associated learned task vector (in some ways similar to training a different low-rank decomposition with LoRA for each task). If a model will only be used for a single task, the modifications introduced by IA3 can instead be merged into the weight matrices permanently, so that no element-wise multiplication is required and the model's architecture remains unchanged. This is possible because the element-wise multiplications performed in IA3 always co-occur with a matrix multiplication, so the learned vector can be folded into the adjacent weight matrix and there is no additional computational cost compared to the original model.
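As a quick check of the merging argument, the sketch below (illustrative shapes, not code from this PR) shows that scaling an output element-wise by a learned vector l is equivalent to folding l into the weight matrix, i.e. l ⊙ (W x) == (diag(l) W) x:

```python
import torch

d_in, d_out = 16, 8
W = torch.randn(d_out, d_in)   # frozen base weight
l = torch.randn(d_out)         # learned IA3 vector (after training, no longer all ones)
x = torch.randn(d_in)

# Merge the learned vector into the weight matrix once, offline.
W_merged = l.unsqueeze(1) * W

out_elementwise = l * (W @ x)   # what IA3 computes at runtime
out_merged = W_merged @ x       # what the merged model computes

assert torch.allclose(out_elementwise, out_merged, atol=1e-6)
```

After merging, inference uses only the single matrix multiplication the original model already performed, which is why the single-task case incurs no extra cost.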