Yuanbo2021 opened this issue 3 years ago (status: Open)
I have the same question about the `src_len + 1` / `tgt_len + 1`, and about `self.pos_emb(torch.LongTensor([[1,2,3,4,0]]))` / `self.pos_emb(torch.LongTensor([[5,1,2,3,4]]))`.
In `class Encoder`:

```python
self.pos_emb = nn.Embedding.from_pretrained(get_sinusoid_encoding_table(src_len + 1, d_model), freeze=True)
enc_outputs = self.src_emb(enc_inputs) + self.pos_emb(torch.LongTensor([[1, 2, 3, 4, 0]]))
```

In `class Decoder`:

```python
self.pos_emb = nn.Embedding.from_pretrained(get_sinusoid_encoding_table(tgt_len + 1, d_model), freeze=True)
dec_outputs = self.tgt_emb(dec_inputs) + self.pos_emb(torch.LongTensor([[5, 1, 2, 3, 4]]))  # [batch_size, tgt_len, d_model]
```
The positional encoding table should have shape (max_len, d_model), so why add 1?
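To make the shapes concrete, here is a minimal sketch of how I read the lookup. The `get_sinusoid_encoding_table` helper below is my own reconstruction of the tutorial's function, not copied from the repo, and the example sizes (`src_len = 5`, `d_model = 512`) are just for illustration. With `src_len + 1` rows the table has indices 0..5, so `[[1, 2, 3, 4, 0]]` stays in range, and row 0 appears to be a spare slot (maybe for the padding position?) — but that is only my guess, hence the question.

```python
import numpy as np
import torch
import torch.nn as nn

def get_sinusoid_encoding_table(n_position, d_model):
    # My reconstruction of the sinusoid table: one row per position index 0..n_position-1.
    def angle(pos, i):
        return pos / np.power(10000, 2 * (i // 2) / d_model)
    table = np.array([[angle(pos, i) for i in range(d_model)] for pos in range(n_position)])
    table[:, 0::2] = np.sin(table[:, 0::2])  # even dimensions -> sin
    table[:, 1::2] = np.cos(table[:, 1::2])  # odd dimensions  -> cos
    return torch.FloatTensor(table)

src_len, d_model = 5, 512  # illustrative values, not the tutorial's actual config
pos_emb = nn.Embedding.from_pretrained(get_sinusoid_encoding_table(src_len + 1, d_model), freeze=True)
print(pos_emb.weight.shape)   # torch.Size([6, 512]) -> src_len + 1 rows, so index 0..5 are all valid
out = pos_emb(torch.LongTensor([[1, 2, 3, 4, 0]]))
print(out.shape)              # torch.Size([1, 5, 512]) -> one position vector per token
```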