graykode / nlp-tutorial

Natural Language Processing Tutorial for Deep Learning Researchers

Why is it src_len+1 in the Transformer demo? #66

Yuanbo2021 opened this issue 3 years ago

Yuanbo2021 commented 3 years ago

```python
self.pos_emb = nn.Embedding.from_pretrained(
    get_sinusoid_encoding_table(src_len + 1, d_model), freeze=True)
```

The positional encoding table should have shape (max_len, d_model), so why add 1?
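For reference, the first argument of `get_sinusoid_encoding_table` just sets the number of rows in the table. Here is a minimal sketch of the usual sin/cos construction (the exact helper in the repo may differ slightly); it shows that passing `src_len + 1` simply yields one row more than the sentence length:

```python
import numpy as np
import torch

def get_sinusoid_encoding_table(n_position, d_model):
    # angle(pos, i) = pos / 10000^(2 * (i // 2) / d_model)
    def cal_angle(position, hid_idx):
        return position / np.power(10000, 2 * (hid_idx // 2) / d_model)

    table = np.array([[cal_angle(pos, i) for i in range(d_model)]
                      for pos in range(n_position)])
    table[:, 0::2] = np.sin(table[:, 0::2])  # even dimensions: sine
    table[:, 1::2] = np.cos(table[:, 1::2])  # odd dimensions: cosine
    return torch.FloatTensor(table)         # shape: (n_position, d_model)

src_len, d_model = 5, 512  # example sizes
print(get_sinusoid_encoding_table(src_len + 1, d_model).shape)  # torch.Size([6, 512])
```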

HC-2016 commented 3 years ago

I have the same question about `src_len + 1` / `tgt_len + 1`, and about `self.pos_emb(torch.LongTensor([[1,2,3,4,0]]))` / `self.pos_emb(torch.LongTensor([[5,1,2,3,4]]))`.

In `class Encoder`:

```python
# in __init__
self.pos_emb = nn.Embedding.from_pretrained(
    get_sinusoid_encoding_table(src_len + 1, d_model), freeze=True)
# in forward
enc_outputs = self.src_emb(enc_inputs) + self.pos_emb(torch.LongTensor([[1, 2, 3, 4, 0]]))
```

In `class Decoder`:

```python
# in __init__
self.pos_emb = nn.Embedding.from_pretrained(
    get_sinusoid_encoding_table(tgt_len + 1, d_model), freeze=True)
# in forward
dec_outputs = self.tgt_emb(dec_inputs) + self.pos_emb(torch.LongTensor([[5, 1, 2, 3, 4]]))  # [batch_size, tgt_len, d_model]
```
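As a quick, self-contained sanity check of the shapes (a toy snippet with example sizes, not code from the repo): the decoder lookup above uses position index 5, so with `tgt_len = 5` the table needs at least `tgt_len + 1` rows for that lookup to stay in range, while the encoder lookup uses index 0 for the last slot (presumably the padded token):

```python
import torch
import torch.nn as nn

src_len, tgt_len, d_model = 5, 5, 512  # example sizes matching the hard-coded index tensors

# Tables sized like in the snippets above: (len + 1) rows of dimension d_model.
enc_pos_emb = nn.Embedding(src_len + 1, d_model)
dec_pos_emb = nn.Embedding(tgt_len + 1, d_model)

# Decoder positions go up to index 5, so a table with only tgt_len (= 5) rows
# would raise an IndexError on the lookup; tgt_len + 1 rows make index 5 valid.
# The encoder positions use index 0 for the final slot (presumably the padding position).
enc_out = enc_pos_emb(torch.LongTensor([[1, 2, 3, 4, 0]]))
dec_out = dec_pos_emb(torch.LongTensor([[5, 1, 2, 3, 4]]))
print(enc_out.shape, dec_out.shape)  # torch.Size([1, 5, 512]) torch.Size([1, 5, 512])
```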