Open wangjs9 opened 4 years ago
In the paper "Attention Is All You Need", the kernel size of the convolutions is set to 1. In this implementation, however, the value is 3. What is the reason for the difference?
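To make the difference concrete, here is a minimal sketch in plain Python. It is not this repository's code: `conv1d`, `linear`, and the toy weights are hypothetical, and zero padding is an assumption. It shows that a convolution with kernel size 1 is the same as applying one linear layer independently at every position, while kernel size 3 also mixes in the neighbouring positions.

```python
def conv1d(x, weights, bias):
    """1D convolution over a sequence, zero-padded so the length is kept.

    x:       list of positions, each a list of in_dim floats
    weights: weights[k][o][i], one (out_dim x in_dim) matrix per kernel tap
    bias:    list of out_dim floats
    """
    kernel_size = len(weights)
    pad = kernel_size // 2
    out = []
    for t in range(len(x)):
        y = list(bias)
        for k in range(kernel_size):
            src = t + k - pad  # input position seen by kernel tap k
            if 0 <= src < len(x):  # positions outside the sequence count as zeros
                for o in range(len(bias)):
                    for i in range(len(x[src])):
                        y[o] += weights[k][o][i] * x[src][i]
        out.append(y)
    return out


def linear(x_t, weight, bias):
    """Apply a linear map to a single position."""
    return [b + sum(w * v for w, v in zip(row, x_t))
            for row, b in zip(weight, bias)]


# Toy sequence of 3 positions with 2 features each (hypothetical values).
seq = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
W = [[1.0, 0.0], [0.5, -1.0]]
b = [0.0, 1.0]

# Kernel size 1: every output position depends only on the same input position.
k1 = conv1d(seq, [W], b)
per_position = [linear(x_t, W, b) for x_t in seq]

# Kernel size 3: every output position also sees its left and right neighbours.
k3 = conv1d(seq, [W, W, W], b)

print(k1 == per_position)  # kernel size 1 matches the per-position linear layer
print(k3 == k1)            # kernel size 3 gives a different result
```

With kernel size 1 the layer is strictly position-wise, whereas with kernel size 3 it has a local receptive field over adjacent tokens, so the two settings are not interchangeable.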