Closed sanwei111 closed 2 years ago
You mean the padding is not the same in the encoder and decoder? That's because the decoder has to avoid leaking information from future tokens. By putting all the padding tokens on the left (`kernel_size - 1` of them), the conv op never has access to future tokens, whereas the encoder's symmetric `kernel_size // 2` padding lets it look at both sides of each position.
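To make the difference concrete, here is a minimal sketch (not the repo's actual code) with a naive 1-D convolution, showing that symmetric `kernel_size // 2` padding mixes in tokens from both directions, while `kernel_size - 1` left padding makes the conv causal:

```python
import numpy as np

def conv1d(x, w, pad_left, pad_right):
    """Naive 1-D convolution (cross-correlation) with explicit zero padding."""
    k = len(w)
    xp = np.concatenate([np.zeros(pad_left), x, np.zeros(pad_right)])
    return np.array([xp[i:i + k] @ w for i in range(len(xp) - k + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.ones(3)  # kernel_size = 3, all-ones kernel so outputs are easy to read

# Encoder-style: symmetric padding of kernel_size // 2 = 1 on each side.
# Output at position t sees x[t-1], x[t], x[t+1] -- including a "future" token.
enc = conv1d(x, w, pad_left=1, pad_right=1)   # [3., 6., 9., 7.]

# Decoder-style: kernel_size - 1 = 2 zeros, all on the left.
# Output at position t sees only x[t-2], x[t-1], x[t] -- no future tokens.
dec = conv1d(x, w, pad_left=2, pad_right=0)   # [1., 3., 6., 9.]
```

Note that `dec[0]` depends only on `x[0]`, which is exactly the causality the decoder needs during autoregressive generation; both paddings keep the output the same length as the input.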
Thanks for replying @Peter-1213. I will close this stale issue.
Hello, as I see in the code, in the encoder layer the padding is set to `kernel_size // 2`, while in the decoder it is set to `kernel_size - 1`. I wonder why. What's the reason they aren't the same?