yuhuixu1993 / qa-lora

Official PyTorch implementation of QA-LoRA

The paper is using Sum Pooling but the script is using Average Pooling #5

Open fahadh4ilyas opened 1 year ago

fahadh4ilyas commented 1 year ago

Hi, I think I found an inconsistency in QA-LoRA. In the algorithm from your paper, you shrink the input size from D_in to L using AvgPool1d and then multiply by D_in // L, which makes it a sum rather than an average. But your script does not do that in peft_utils.py. Is this intentional? Your script has

adapter_result = (lora_B(lora_A(lora_dropout(self.qa_pool(x)))) * scale).type_as(result)

when it should be

adapter_result = (lora_B(lora_A(lora_dropout(self.qa_pool(x) * group_size))) * scale).type_as(result)
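
A minimal sketch of what I mean (made-up dimensions, not the actual script; I am assuming self.qa_pool is nn.AvgPool1d(group_size) with group_size = D_in // L):

import torch
import torch.nn as nn

D_in, L = 8, 4
group_size = D_in // L                 # = 2
x = torch.randn(1, D_in)

avg_pool = nn.AvgPool1d(group_size)    # what I assume self.qa_pool is
avg = avg_pool(x)                      # shape (1, L): mean of each group
# Multiplying the average by group_size recovers the per-group sum from the paper
assert torch.allclose(avg * group_size, x.view(1, L, group_size).sum(dim=-1))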
yuhuixu1993 commented 1 year ago

Hi, both implementations are equivalent. When the pooling operation is used, the merged adapters are divided by the group size (D_in // L) (please refer to the merge.py file), since the pooling operation already contains an average. With a sum operation, there is no need to divide by D_in // L during the merge. But thanks for pointing this out; I will add a note to the README.
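
For example, a rough numerical sketch of the equivalence (assumed shapes and grouping, not the exact code in merge.py, and with the scale factor omitted):

import torch

torch.manual_seed(0)
D_in, L, D_out, r = 8, 4, 6, 2
group_size = D_in // L
x = torch.randn(1, D_in)
A = torch.randn(r, L)        # stand-in for the lora_A weight (input size L)
B = torch.randn(D_out, r)    # stand-in for the lora_B weight

# Adapter forward pass with average pooling, as in peft_utils.py
pooled = x.view(1, L, group_size).mean(dim=-1)
adapter_out = pooled @ A.t() @ B.t()

# Merging: expand B @ A across each group and divide by group_size,
# which compensates for the average inside the pooling
merged = (B @ A / group_size).repeat_interleave(group_size, dim=1)   # (D_out, D_in)
assert torch.allclose(adapter_out, x @ merged.t(), atol=1e-6)

# With sum pooling (average * group_size) the division is simply dropped
merged_sum = (B @ A).repeat_interleave(group_size, dim=1)
assert torch.allclose((pooled * group_size) @ A.t() @ B.t(), x @ merged_sum.t(), atol=1e-6)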