yuhuixu1993 / qa-lora

Official PyTorch implementation of QA-LoRA

The paper is using Sum Pooling but the script is using Average Pooling #5

Open fahadh4ilyas opened 1 year ago

fahadh4ilyas commented 1 year ago

Hi, I think I found an inconsistency in QA-LoRA. In the algorithm from your paper, you shrink the input size from D_in to L using AvgPool1d and then multiply by D_in // L, which makes it a sum rather than an average. But your script does not do that in peft_utils.py. Is this intentional? Your script has

adapter_result = (lora_B(lora_A(lora_dropout(self.qa_pool(x)))) * scale).type_as(result)

when it should be

adapter_result = (lora_B(lora_A(lora_dropout(self.qa_pool(x) * group_size))) * scale).type_as(result)
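
A minimal sketch of what I mean (made-up dimensions, not the actual script; I am assuming self.qa_pool is nn.AvgPool1d(group_size) with group_size = D_in // L):

import torch
import torch.nn as nn

D_in, L = 8, 4
group_size = D_in // L                 # = 2
x = torch.randn(1, D_in)

avg_pool = nn.AvgPool1d(group_size)    # what I assume self.qa_pool is
avg = avg_pool(x)                      # shape (1, L): mean of each group
# Multiplying the average by group_size recovers the per-group sum from the paper
assert torch.allclose(avg * group_size, x.view(1, L, group_size).sum(dim=-1))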
yuhuixu1993 commented 1 year ago

Hi, both implementations are equivalent. When the pooling operation is used, the merged adapters are divided by the group size (D_in // L) (please refer to the merge.py file), since the pooling operation already contains an average. With a sum operation, there is no need to divide by D_in // L during the merge. But thanks for pointing this out; I will add a note to the README.
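
For example, a rough numerical sketch of the equivalence (assumed shapes and grouping, not the exact code in merge.py, and with the scale factor omitted):

import torch

torch.manual_seed(0)
D_in, L, D_out, r = 8, 4, 6, 2
group_size = D_in // L
x = torch.randn(1, D_in)
A = torch.randn(r, L)        # stand-in for the lora_A weight (input size L)
B = torch.randn(D_out, r)    # stand-in for the lora_B weight

# Adapter forward pass with average pooling, as in peft_utils.py
pooled = x.view(1, L, group_size).mean(dim=-1)
adapter_out = pooled @ A.t() @ B.t()

# Merging: expand B @ A across each group and divide by group_size,
# which compensates for the average inside the pooling
merged = (B @ A / group_size).repeat_interleave(group_size, dim=1)   # (D_out, D_in)
assert torch.allclose(adapter_out, x @ merged.t(), atol=1e-6)

# With sum pooling (average * group_size) the division is simply dropped
merged_sum = (B @ A).repeat_interleave(group_size, dim=1)
assert torch.allclose((pooled * group_size) @ A.t() @ B.t(), x @ merged_sum.t(), atol=1e-6)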