huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
https://huggingface.co/docs/peft
Apache License 2.0

IA3 with decoder-only LLMs containing "query_key_value" parameters #1483

Closed: ospanbatyr closed this issue 5 months ago

ospanbatyr commented 6 months ago

Hello everyone,

My question is regarding using IA3 with decoder-only LLMs containing "query_key_value" parameters, such as GPTNeoX and Falcon families (which I'll refer to as "query_key_value LLMs").

Background:

My goal:

I'm interested in using the default IA3 configuration with Pythia models (also GPT-NeoX type) for my research. However, the aforementioned incompatibility presents a challenge, potentially requiring base model code modifications.

Seeking suggestions:

I'm eager to hear your thoughts and suggestions on how to proceed. Here are some potential approaches:

Thank you for your time and consideration. I appreciate your contributions to this valuable library!

BenjaminBossan commented 6 months ago

Yes, I think you're right: if q, k, and v are computed through a single parameter, there is currently no way to apply IA³ selectively to only k and v.

As for potential solutions, I'm not aware of other IA³ implementations that would fix that, but I'm also not actively following alternatives. Using a different model architecture doesn't sound very practical. I'd seriously consider whether you could accept that q is also modified; that depends on your use case.
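For reference, if modifying q as well is acceptable, a configuration along these lines should apply IA³ to the whole fused projection. This is a sketch, not a recommendation: the module names are assumptions based on GPT-NeoX / Pythia layer naming and should be checked against `model.named_modules()` for the actual checkpoint.

```python
from peft import IA3Config, TaskType

# Assumed module names for GPT-NeoX / Pythia style models:
#   "query_key_value" is the fused q/k/v projection,
#   "dense_4h_to_h" is the second feed-forward projection.
# IA3 will scale all of the fused output here, including the q part.
config = IA3Config(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["query_key_value", "dense_4h_to_h"],
    # feedforward_modules must be a subset of target_modules
    feedforward_modules=["dense_4h_to_h"],
)

# Typical usage (not executed here; base_model is a loaded causal LM):
# from peft import get_peft_model
# model = get_peft_model(base_model, config)
```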

If I wanted to change this in PEFT directly, I'd start on these lines:

https://github.com/huggingface/peft/blob/65513e5db4d23935f9fc793eafd70bd0b945da90/src/peft/tuners/ia3/layer.py#L168
https://github.com/huggingface/peft/blob/65513e5db4d23935f9fc793eafd70bd0b945da90/src/peft/tuners/ia3/layer.py#L113

Here, the IA³ vector is retrieved for the forward call and for merging, respectively. You could set the elements that correspond to q to 1, so that the q values are unaffected.
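A minimal sketch of that masking idea, written in plain Python so it runs without PEFT. It assumes the GPT-NeoX layout, where the fused output is viewed as (num_heads, 3 * head_size) and each head's block is ordered [q | k | v]. Other models may order the fused dimension differently; the `interleaved=False` branch covers only the simple "all q, then all k, then all v" case. The helper names `q_positions` and `neutralize_q` are made up for illustration and are not part of PEFT.

```python
def q_positions(num_heads, head_size, interleaved=True):
    """Return a boolean mask over the fused qkv output dim; True marks a q element."""
    if interleaved:
        # Assumed GPT-NeoX style: per-head blocks of [q | k | v], each head_size wide.
        per_head = [True] * head_size + [False] * (2 * head_size)
        return per_head * num_heads
    # Assumed contiguous style: all q, then all k, then all v.
    hidden = num_heads * head_size
    return [True] * hidden + [False] * (2 * hidden)


def neutralize_q(ia3_vector, is_q):
    """Replace the scaling factors at q positions with 1.0, leaving k and v untouched."""
    if len(ia3_vector) != len(is_q):
        raise ValueError("IA3 vector and mask must have the same length")
    return [1.0 if q else scale for scale, q in zip(ia3_vector, is_q)]


# Toy example: 2 heads of size 2, so the fused output has 2 * 3 * 2 = 12 elements.
mask = q_positions(num_heads=2, head_size=2)
learned = [0.5] * 12  # stand-in for a learned IA3 vector
masked = neutralize_q(learned, mask)
```

In PEFT itself, the same mask would have to be applied to the retrieved IA³ vector as a tensor (for example with `torch.where`) at both the forward and the merge site, so that merged and unmerged models behave identically.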

github-actions[bot] commented 5 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.