Hi! Thanks for your excellent work! The paper describes subtracting a relative position embedding from the attention matrix, but in the code self.alibi is added to the attention matrix. What is the reason for this?
I found the answer here: https://nn.labml.ai/transformers/alibi/index.html. The stored alibi tensor already contains non-positive values (slope times the signed offset j - i), so adding it to the attention scores is equivalent to subtracting a positive distance penalty, exactly as described in the paper.
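For reference, here is a minimal sketch of how such a bias tensor is typically constructed. This assumes the standard ALiBi formulation from the paper (causal attention, power-of-two head count); the function name alibi_bias and the usage lines are illustrative, not this repository's actual code:

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Build an ALiBi bias tensor of shape (num_heads, seq_len, seq_len)."""
    # Head-specific slopes: the geometric sequence 2^(-8/n), 2^(-16/n), ...
    # used in the ALiBi paper when num_heads is a power of two.
    slopes = torch.tensor(
        [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]
    )
    # Signed offset j - i between key position j and query position i.
    # For causal positions (j <= i) this is zero or negative.
    positions = torch.arange(seq_len)
    relative = positions[None, :] - positions[:, None]  # (seq_len, seq_len)
    # slope * (j - i) is therefore non-positive, so ADDING this bias to the
    # attention scores equals SUBTRACTING slope * |i - j| from them.
    return slopes[:, None, None] * relative[None, :, :].float()

# Usage (shapes only; q, k, d_head are illustrative):
# scores = q @ k.transpose(-2, -1) / math.sqrt(d_head)   # (B, H, L, L)
# scores = scores + alibi_bias(num_heads, seq_len)       # add == subtract penalty
```

Adding this tensor to the scores before the softmax (together with the usual causal mask) matches the paper's description of subtracting a penalty proportional to the query-key distance.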