For models that stack the query, key, and value projections in the attention layer, this PR splits the fused `qkv_proj` into `q_proj`, `k_proj`, and `v_proj` so that each layer can be patched individually with LoRA.
This removes the need for `phi_2_align_heads` and replaces it with `split_qkv` in the `LoRAConfig` class.
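The sketch below illustrates the idea of splitting a fused projection so that LoRA adapters can target the query, key, and value projections independently. It is not the PR's implementation; the function name, the `hidden_size` parameter, and the assumption that the fused weight is laid out as `[q; k; v]` along the output dimension are all illustrative (real models may interleave heads differently).

```python
import torch
import torch.nn as nn


def split_qkv_proj(qkv_proj: nn.Linear, hidden_size: int):
    """Split a fused qkv projection into separate q/k/v projections.

    Illustrative sketch only: assumes the fused weight is stacked as
    [q; k; v] along the output dimension, each block of size hidden_size.
    """
    has_bias = qkv_proj.bias is not None
    projs = []
    for i in range(3):
        proj = nn.Linear(qkv_proj.in_features, hidden_size, bias=has_bias)
        start, end = i * hidden_size, (i + 1) * hidden_size
        with torch.no_grad():
            # Copy the corresponding slice of the fused weight (and bias).
            proj.weight.copy_(qkv_proj.weight[start:end])
            if has_bias:
                proj.bias.copy_(qkv_proj.bias[start:end])
        projs.append(proj)
    q_proj, k_proj, v_proj = projs
    return q_proj, k_proj, v_proj
```

With the projections separated like this, a LoRA adapter can be attached to `q_proj`, `k_proj`, or `v_proj` individually rather than having to wrap the single fused `qkv_proj`.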