Open fahadh4ilyas opened 1 year ago
Hi, I think I found something inconsistent about QA-LoRA. In the algorithm from your paper, you shrink the input size from D_in to L using AvgPool1D but then multiply by D_in // L, which makes it a summation rather than an average. However, your script doesn't do that in peft_utils.py. Is it intentional? Your script has

```python
adapter_result = (lora_B(lora_A(lora_dropout(self.qa_pool(x)))) * scale).type_as(result)
```

when it should be

```python
adapter_result = (lora_B(lora_A(lora_dropout(self.qa_pool(x) * group_size))) * scale).type_as(result)
```
Hi, both implementations are equivalent. When the pooling operation is used, the merged adapters are divided by the group size (D_in // L) during merging (please refer to the merge.py file), because the pooling operation already averages internally. If a sum operation is used instead, there is no need to divide by D_in // L during the merge. But thanks for pointing it out, I will add a note in the README.
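For anyone skimming the thread, here is a minimal sketch of the equivalence described above. The shapes and names (D_in, L, group_size, lora_A, lora_B) are illustrative assumptions, not the repo's actual modules: average pooling in the forward pass with the trained adapter produces exactly the same output as the paper's sum pooling once the adapter weights are divided by the group size, which is what the rescaling in merge.py accounts for.

```python
# Illustrative sketch only; names and shapes are assumptions, not the repo's code.
import torch
import torch.nn as nn

torch.manual_seed(0)
D_in, L, r, D_out = 16, 4, 2, 8
group_size = D_in // L

x = torch.randn(1, D_in)
pool = nn.AvgPool1d(group_size)        # shrinks D_in -> L by averaging each group

lora_A = nn.Linear(L, r, bias=False)   # adapter trained against avg-pooled inputs
lora_B = nn.Linear(r, D_out, bias=False)

# Forward pass as in peft_utils.py: average pooling, no extra factor.
avg_out = lora_B(lora_A(pool(x)))

# Paper's formulation: sum pooling (avg * group_size), with the adapter
# divided by group_size, mirroring the rescaling applied when merging.
scaled_B = lora_B.weight / group_size
sum_out = lora_A(pool(x) * group_size) @ scaled_B.t()

print(torch.allclose(avg_out, sum_out))  # True: the two formulations match
```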