laiguokun / Funnel-Transformer


Question related to pretraining with residual connection #10

Open hxu38691 opened 3 years ago

hxu38691 commented 3 years ago

Hi,

I wonder: since the proposed one-step decoder also takes the full-length encoded representation as input, is it possible that the decoder relies mostly on the unpooled representation, which has more representational power than the pooled one, and therefore depends less on the pooling operation to achieve good pretraining results (correct me if I'm wrong)? If so, how can we ensure the pooled encoder representation remains effective when it is used in downstream tasks?
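For reference, here is a minimal sketch (in PyTorch, based on my reading of the paper rather than this repo's actual code) of the decoder input construction I am referring to: the pooled encoder output is up-sampled back to full length and added, via a residual connection, to the full-length hidden states from the first encoder block. The class name, layer counts, and the `repeat_interleave` up-sampling here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FunnelDecoderSketch(nn.Module):
    """Sketch of the one-step decoder: up-sample the pooled encoder output,
    add a residual connection from the full-length first-block hidden states,
    then run a few Transformer layers for token-level MLM pretraining.
    Names and hyperparameters are illustrative, not the repo's code."""

    def __init__(self, d_model, n_layers=2, n_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.layers = nn.TransformerEncoder(layer, n_layers)

    def forward(self, pooled_hidden, first_block_hidden):
        # pooled_hidden:      [batch, seq_len // stride, d_model] (compressed)
        # first_block_hidden: [batch, seq_len, d_model] (full length, unpooled)
        stride = first_block_hidden.size(1) // pooled_hidden.size(1)
        # Up-sample by repeating each compressed position `stride` times.
        upsampled = pooled_hidden.repeat_interleave(stride, dim=1)
        # Residual connection: this is where the decoder "sees" the
        # unpooled representation my question is about.
        decoder_input = upsampled + first_block_hidden
        return self.layers(decoder_input)
```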

Thanks