Emrys365 opened 1 year ago
Hi, I am curious about the importance of the proposed improved Transformer layer compared to the standard one (w/o the positional encoding). But I couldn't find the related information in the paper.
I think I have the answer now. I tried replacing the RNN with a feedforward layer, and it seems to converge very slowly.
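For anyone comparing the two variants, here is a minimal sketch of what the swap looks like. This is an illustrative re-implementation, not the repo's actual code: the class name, dimensions, and the choice of a bidirectional GRU are assumptions. The key difference is that the RNN sub-layer encodes token order through its recurrence, while the position-wise feedforward sub-layer is order-agnostic, which is why the standard layer normally relies on positional encoding.

```python
import torch
import torch.nn as nn

class ImprovedTransformerLayer(nn.Module):
    """Sketch of an improved Transformer layer where the position-wise
    FFN is replaced by an RNN (DPTNet-style). With use_rnn=False it
    falls back to a standard FFN sub-layer. All names/sizes are
    illustrative, not the repository's API."""

    def __init__(self, d_model=64, nhead=4, d_hidden=128, use_rnn=True):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.use_rnn = use_rnn
        if use_rnn:
            # RNN sub-layer: recurrence supplies order information,
            # so no positional encoding is needed
            self.rnn = nn.GRU(d_model, d_hidden,
                              batch_first=True, bidirectional=True)
            self.proj = nn.Linear(2 * d_hidden, d_model)
        else:
            # Standard position-wise FFN: order-agnostic on its own
            self.ffn = nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_model),
            )

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        a, _ = self.attn(x, x, x)
        x = self.norm1(x + a)
        if self.use_rnn:
            h, _ = self.rnn(x)
            f = self.proj(h)
        else:
            f = self.ffn(x)
        return self.norm2(x + f)

x = torch.randn(2, 100, 64)
out = ImprovedTransformerLayer(use_rnn=True)(x)
print(out.shape)  # torch.Size([2, 100, 64])
```

Swapping `use_rnn=True` for `use_rnn=False` without adding positional encoding reproduces the ablation described above, and the slow convergence is consistent with the FFN variant having no access to position information.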