Emrys365 opened 1 year ago
Hi, I am curious about the importance of the proposed improved Transformer layer compared to the standard one (w/o the positional encoding). But I couldn't find the related information in the paper.
I think I have the answer now. I tried replacing the RNN with a feedforward layer, and it seems to converge very slowly.
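For anyone comparing the two variants, here is a minimal sketch of what the swap looks like. This is an illustrative re-implementation, not the repo's actual code: the class name, dimensions, and the choice of a bidirectional GRU are assumptions. The key difference is that the RNN sub-layer encodes token order through its recurrence, while the position-wise feedforward sub-layer is order-agnostic, which is why the standard layer normally relies on positional encoding.

```python
import torch
import torch.nn as nn

class ImprovedTransformerLayer(nn.Module):
    """Sketch of an improved Transformer layer where the position-wise
    FFN is replaced by an RNN (DPTNet-style). With use_rnn=False it
    falls back to a standard FFN sub-layer. All names/sizes are
    illustrative, not the repository's API."""

    def __init__(self, d_model=64, nhead=4, d_hidden=128, use_rnn=True):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.use_rnn = use_rnn
        if use_rnn:
            # RNN sub-layer: recurrence supplies order information,
            # so no positional encoding is needed
            self.rnn = nn.GRU(d_model, d_hidden,
                              batch_first=True, bidirectional=True)
            self.proj = nn.Linear(2 * d_hidden, d_model)
        else:
            # Standard position-wise FFN: order-agnostic on its own
            self.ffn = nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_model),
            )

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        a, _ = self.attn(x, x, x)
        x = self.norm1(x + a)
        if self.use_rnn:
            h, _ = self.rnn(x)
            f = self.proj(h)
        else:
            f = self.ffn(x)
        return self.norm2(x + f)

x = torch.randn(2, 100, 64)
out = ImprovedTransformerLayer(use_rnn=True)(x)
print(out.shape)  # torch.Size([2, 100, 64])
```

Swapping `use_rnn=True` for `use_rnn=False` without adding positional encoding reproduces the ablation described above, and the slow convergence is consistent with the FFN variant having no access to position information.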