microsoft / mttl

Building modular LMs with parameter-efficient fine-tuning.
MIT License

[Draft] Split `qkv_proj` into `q_proj`, `k_proj` and `v_proj` #90

Closed pclucas14 closed 2 months ago

pclucas14 commented 2 months ago
  1. For models that stack the query, key, and value projections in the attention layer, this PR splits `qkv_proj` into `q_proj`, `k_proj`, and `v_proj` to allow each layer to be patched individually with LoRA.

  2. This removes the need for `phi_2_align_heads` and replaces it with `split_qkv` in the `LoRAConfig` class.
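
The split described above can be sketched roughly as follows. This is a hypothetical illustration, not mttl's actual implementation: the function name `split_qkv` here is illustrative, and it assumes the fused weight is laid out as `[q; k; v]` stacked along the output dimension with equal-sized heads.

```python
# Hypothetical sketch: split a fused qkv projection into three separate
# linear layers so each can be patched individually (e.g. with LoRA).
# Assumes the fused weight stacks q, k, v along the output dimension.
import torch
import torch.nn as nn


def split_qkv(qkv_proj: nn.Linear):
    """Split a fused [3*h, h] projection into three [h, h] projections."""
    hidden = qkv_proj.in_features
    assert qkv_proj.out_features == 3 * hidden, "expected a fused qkv layer"
    has_bias = qkv_proj.bias is not None
    projs = []
    for i in range(3):
        proj = nn.Linear(hidden, hidden, bias=has_bias)
        with torch.no_grad():
            # Copy the corresponding slice of the fused weight (and bias).
            proj.weight.copy_(qkv_proj.weight[i * hidden:(i + 1) * hidden])
            if has_bias:
                proj.bias.copy_(qkv_proj.bias[i * hidden:(i + 1) * hidden])
        projs.append(proj)
    return tuple(projs)


# Sanity check: concatenating the split outputs reproduces the fused output.
fused = nn.Linear(8, 24)
q_proj, k_proj, v_proj = split_qkv(fused)
x = torch.randn(2, 8)
assert torch.allclose(
    torch.cat([q_proj(x), k_proj(x), v_proj(x)], dim=-1), fused(x), atol=1e-6
)
```

With separate `q_proj`, `k_proj`, and `v_proj` modules, a LoRA adapter can then target any subset of the three projections rather than the whole fused layer.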

pclucas14 commented 2 months ago

closing for now