kyegomez / LongNet

Implementation of plug-and-play Attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens"
https://discord.gg/qUtxnK2NMf
Apache License 2.0
688 stars, 64 forks

Incorrect argument type passed into utils.sparsifyIndices() #13

Closed (MiuMiuMiue closed 1 year ago)

MiuMiuMiue commented 1 year ago

Hi all,

I have just finished the paper and am trying to understand the code. I think there might be an issue in LongNet/attention.py: self.head_offsets is initialized as a matrix, but that matrix is passed to utils.sparsifyIndices() as head_idx, which should be an integer.

I also suspect that head_idx is meant to generate a different sparsified pattern for each head, but head_idx stays the same inside the function, which might cause all heads to share the same pattern.

Just a little bit confused here.
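
To make the concern concrete, here is a minimal sketch of per-head sparsification in plain Python. It is not the repository's code: `sparsify_indices` and its parameters are hypothetical names, and the offset rule `head_idx % dilation` is assumed from the paper's description of shifting each head's selection. It only illustrates why `head_idx` needs to be a single integer that changes from head to head.

```python
from typing import List


def sparsify_indices(seq_len: int, segment_len: int, dilation: int,
                     head_idx: int) -> List[List[int]]:
    """Return the kept token positions per segment for one attention head.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if not isinstance(head_idx, int):
        # A matrix of offsets cannot act as a single head index.
        raise TypeError("head_idx must be one integer; call once per head")
    # Assumed offset rule: shift the dilated selection by head_idx mod dilation.
    offset = head_idx % dilation
    return [
        list(range(start + offset, min(start + segment_len, seq_len), dilation))
        for start in range(0, seq_len, segment_len)
    ]


# One call per head, each with its own integer index.
for h in range(4):
    print(h, sparsify_indices(seq_len=8, segment_len=4, dilation=2, head_idx=h))
```

With a dilation of 2, even-numbered heads keep positions 0, 2, 4, 6 and odd-numbered heads keep 1, 3, 5, 7. If the same `head_idx` were used for every head, all heads would select identical positions, which is the behaviour suspected above.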

kyegomez commented 1 year ago

@MiuMiuMiue Hey, thank you so much for the observation. I've fixed it now. We're still working heavily on this and trying to perfect it.

Is there any other error that catches your eye?

MiuMiuMiue commented 1 year ago

Yeah, I think it's good now. The index dim in the gather() function has also been fixed.