Closed MiuMiuMiue closed 1 year ago
@MiuMiuMiue Hey, thank you so much for the observation, I've fixed it now. We're still working heavily on this and trying to perfect it.
Is there any other error that catches your eye?
Yeah, I think it's good now. The index dim in the gather() function has also been fixed.
Hi all,
I have just finished the paper and am trying to understand the code, and I think there might be an issue in LongNet/attention.py. You initialize self.head_offsets as a matrix, but you pass this matrix to utils.sparsifyIndices() as head_idx, which should be an integer.
I suspect head_idx is used to generate a different sparsified pattern for each head, but since it stays the same inside the function, all heads may end up sharing the same pattern.
Just a little bit confused here.
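For anyone landing here later, here is a minimal sketch of the behaviour being described. The `sparsify_indices` function, `seq_len`, and `dilation` names below are illustrative, not the repo's actual `utils.sparsifyIndices` signature; the point is only that the per-head offset is what decorrelates the dilated patterns, so passing the same head_idx for every head collapses them all onto one pattern:

```python
# Hypothetical simplification of a dilated-attention index pattern.
# Each head is meant to start at a different offset (head_idx % dilation)
# so that, taken together, the heads cover the whole sequence.

def sparsify_indices(seq_len, dilation, head_idx):
    """Return every `dilation`-th position, shifted by the head index."""
    offset = head_idx % dilation
    return list(range(offset, seq_len, dilation))

# Distinct head indices -> distinct, complementary patterns:
patterns = [sparsify_indices(8, dilation=4, head_idx=h) for h in range(4)]
# patterns == [[0, 4], [1, 5], [2, 6], [3, 7]]

# The reported bug: if head_idx is effectively the same value for every
# head, all heads attend to the identical sparse positions:
collapsed = [sparsify_indices(8, dilation=4, head_idx=0) for _ in range(4)]
# collapsed == [[0, 4], [0, 4], [0, 4], [0, 4]]
```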