richliao / textClassifier

Text classifier for Hierarchical Attention Networks for Document Classification
Apache License 2.0

Why TimeDistributed layer before attention layer? #3

Closed hdchao1989 closed 7 years ago

hdchao1989 commented 7 years ago

In the HATT model, a TimeDistributed layer is applied before both the word-level and sentence-level attention layers. I'm confused about what the TimeDistributed layer does here. Also, in the RNN model there is no TimeDistributed layer before the attention layer. What's the difference? Thank you!

richliao commented 7 years ago

Read the paper. There's a dense layer that sits on top of the RNN, which tries to combine information across another dimension of the RNN output. Honestly, I don't know if it's useful. I bet you won't see a performance decrease by removing this step.
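For anyone else wondering: `TimeDistributed(Dense(...))` simply applies one shared dense layer to every timestep of the RNN output independently. A minimal NumPy sketch of that computation (illustrative shapes and weights, not the repo's actual code):

```python
import numpy as np

# Sketch: TimeDistributed(Dense) applies the SAME dense weights
# to every timestep of an RNN output, independently per timestep.
rng = np.random.default_rng(0)

timesteps, rnn_units, dense_units = 5, 8, 8   # illustrative sizes
h = rng.standard_normal((timesteps, rnn_units))    # RNN outputs, one row per timestep
W = rng.standard_normal((rnn_units, dense_units))  # shared dense weights
b = np.zeros(dense_units)

# Equivalent of TimeDistributed(Dense(dense_units, activation='tanh'))(h):
u = np.tanh(h @ W + b)                        # shape (timesteps, dense_units)

# A per-timestep loop with the same shared weights gives the identical result,
# which is all TimeDistributed adds here.
u_loop = np.stack([np.tanh(h[t] @ W + b) for t in range(timesteps)])
assert np.allclose(u, u_loop)
print(u.shape)
```

So it is just a learned per-timestep transformation of the RNN hidden states before the attention weights are computed over them.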

hdchao1989 commented 7 years ago

I see, thx~