ofirpress / attention_with_linear_biases

Code for the ALiBi method for transformer language models (ICLR 2022)
MIT License

implementation detail about alibi_mask #18

Open bugm opened 11 months ago

bugm commented 11 months ago

Hello, I am reading the code that generates the alibi_mask at https://github.com/ofirpress/attention_with_linear_biases/blob/master/fairseq/models/transformer.py

Regarding the code at lines 760 and 761:

```python
self.alibi = self.slopes.unsqueeze(1).unsqueeze(1) * torch.arange(maxpos).unsqueeze(0).unsqueeze(0).expand(attn_heads, -1, -1)  # line 760
self.alibi = self.alibi.view(attn_heads, 1, maxpos)  # line 761
```

I believe we have already obtained a tensor of shape (attn_heads, 1, maxpos) at line 760, since self.slopes.unsqueeze(1).unsqueeze(1) is an (attn_heads, 1, 1) tensor, torch.arange(maxpos).unsqueeze(0).unsqueeze(0).expand(attn_heads, -1, -1) is an (attn_heads, 1, maxpos) tensor, and their product broadcasts to (attn_heads, 1, maxpos). So what is the purpose of viewing it as (attn_heads, 1, maxpos) again?
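To make the shape argument concrete, here is a minimal runnable sketch of the two lines in isolation. The values of attn_heads, maxpos, and the slopes tensor are illustrative placeholders, not what the repository actually computes; the point is only that the broadcasted product at line 760 already has shape (attn_heads, 1, maxpos), so the .view at line 761 changes neither the shape nor the contents:

```python
import torch

attn_heads, maxpos = 8, 16  # illustrative sizes, not taken from the repo

# Hypothetical stand-in for self.slopes: one bias slope per head, shape (attn_heads,).
slopes = torch.tensor([2.0 ** -(i + 1) for i in range(attn_heads)])

# Line 760: (attn_heads, 1, 1) * (attn_heads, 1, maxpos) broadcasts to (attn_heads, 1, maxpos).
alibi = slopes.unsqueeze(1).unsqueeze(1) * torch.arange(maxpos).unsqueeze(0).unsqueeze(0).expand(attn_heads, -1, -1)
print(alibi.shape)  # torch.Size([8, 1, 16])

# Line 761: viewing an (attn_heads, 1, maxpos) tensor as (attn_heads, 1, maxpos) is a no-op.
alibi_viewed = alibi.view(attn_heads, 1, maxpos)
print(alibi_viewed.shape)                # torch.Size([8, 1, 16])
print(torch.equal(alibi, alibi_viewed))  # True
```

If this reasoning holds, the .view call is redundant as far as the shape is concerned, and at most serves to make the intended (attn_heads, 1, maxpos) layout explicit.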