For models that stack the query, key, and value projections in the attention layer, this PR splits the fused `qkv_proj` into `q_proj`, `k_proj`, and `v_proj` so that each layer can be patched individually with LoRA.
This removes the need for `phi_2_align_heads` and replaces it with `split_qkv` in the `LoRAConfig` class.
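The sketch below illustrates the idea of splitting a fused projection so that LoRA adapters can target the query, key, and value projections independently. It is not the PR's implementation; the function name, the `hidden_size` parameter, and the assumption that the fused weight is laid out as `[q; k; v]` along the output dimension are all illustrative (real models may interleave heads differently).

```python
import torch
import torch.nn as nn


def split_qkv_proj(qkv_proj: nn.Linear, hidden_size: int):
    """Split a fused qkv projection into separate q/k/v projections.

    Illustrative sketch only: assumes the fused weight is stacked as
    [q; k; v] along the output dimension, each block of size hidden_size.
    """
    has_bias = qkv_proj.bias is not None
    projs = []
    for i in range(3):
        proj = nn.Linear(qkv_proj.in_features, hidden_size, bias=has_bias)
        start, end = i * hidden_size, (i + 1) * hidden_size
        with torch.no_grad():
            # Copy the corresponding slice of the fused weight (and bias).
            proj.weight.copy_(qkv_proj.weight[start:end])
            if has_bias:
                proj.bias.copy_(qkv_proj.bias[start:end])
        projs.append(proj)
    q_proj, k_proj, v_proj = projs
    return q_proj, k_proj, v_proj
```

With the projections separated like this, a LoRA adapter can be attached to `q_proj`, `k_proj`, or `v_proj` individually rather than having to wrap the single fused `qkv_proj`.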