mit-han-lab / lite-transformer

[ICLR 2020] Lite Transformer with Long-Short Range Attention
https://arxiv.org/abs/2004.11886

about padding!!! #39

Closed sanwei111 closed 2 years ago

sanwei111 commented 3 years ago

Hello, as I see in the code, the encoder layer sets the convolution padding to kernel_size // 2, while the decoder layer sets it to kernel_size - 1. Why are they different? Is there a reason they don't stay the same?

Peter-1213 commented 2 years ago

You mean the padding is not the same in the encoder and decoder? That's because the decoder has to avoid leaking information from future tokens. By padding the sequence only on the left (with kernel_size - 1 positions), the convolution at each step never has access to future tokens.
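For anyone reading later, here is a minimal sketch of the idea using plain `nn.Conv1d` (this is not the repo's actual lightweight/dynamic convolution modules, and the shapes and channel counts below are made up for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

kernel_size = 3
x = torch.randn(1, 8, 16)  # (batch, channels, time); hypothetical sizes

# Encoder-style convolution: symmetric padding of kernel_size // 2 keeps the
# output length equal to the input length; each output position may see both
# earlier and later tokens, which is fine on the encoder side.
enc_conv = nn.Conv1d(8, 8, kernel_size, padding=kernel_size // 2)
enc_out = enc_conv(x)                            # shape: (1, 8, 16)

# Decoder-style (causal) convolution: pad kernel_size - 1 zeros on the left
# only, so the output at time t depends only on inputs at times <= t and
# never on future tokens.
dec_conv = nn.Conv1d(8, 8, kernel_size)          # no built-in padding
x_left_padded = F.pad(x, (kernel_size - 1, 0))   # pad the left side only
dec_out = dec_conv(x_left_padded)                # shape: (1, 8, 16)

print(enc_out.shape, dec_out.shape)
```

With kernel_size = 3, the encoder pads 1 position on each side, while the decoder pads 2 positions on the left and none on the right, which is where the kernel_size // 2 vs. kernel_size - 1 difference comes from.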

Michaelvll commented 2 years ago

Thanks for replying, @Peter-1213. I will close this stale issue.