Hello, I am reading the code for generating alibi_mask at https://github.com/ofirpress/attention_with_linear_biases/blob/master/fairseq/models/transformer.py, specifically lines 760 and 761:

self.alibi = self.slopes.unsqueeze(1).unsqueeze(1) * torch.arange(maxpos).unsqueeze(0).unsqueeze(0).expand(attn_heads, -1, -1)  # line 760
self.alibi = self.alibi.view(attn_heads, 1, maxpos)  # line 761

I believe line 760 already produces a tensor of shape (attn_heads, 1, maxpos): self.slopes.unsqueeze(1).unsqueeze(1) has shape (attn_heads, 1, 1), and torch.arange(maxpos).unsqueeze(0).unsqueeze(0).expand(attn_heads, -1, -1) has shape (attn_heads, 1, maxpos), so their broadcast product is already (attn_heads, 1, maxpos).

So what is the purpose of viewing it as (attn_heads, 1, maxpos) again in line 761?
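For what it's worth, here is a minimal standalone check of the shapes (with made-up values for attn_heads, maxpos, and self.slopes, since those come from the model config in the real code):

import torch

attn_heads, maxpos = 8, 16       # made-up sizes, just for illustration
slopes = torch.rand(attn_heads)  # stand-in for self.slopes, shape (attn_heads,)

# line 760: (attn_heads, 1, 1) * (attn_heads, 1, maxpos) broadcasts to (attn_heads, 1, maxpos)
alibi = slopes.unsqueeze(1).unsqueeze(1) * torch.arange(maxpos).unsqueeze(0).unsqueeze(0).expand(attn_heads, -1, -1)
print(alibi.shape)  # torch.Size([8, 1, 16])

# line 761: view to the very same shape
alibi = alibi.view(attn_heads, 1, maxpos)
print(alibi.shape)  # torch.Size([8, 1, 16])

Shape-wise the view appears to be a no-op, which is why I am confused.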