Open PANXiao1994 opened 5 years ago
Hi @PANXiao1994, we tried putting the projection after the highway network. However, we found that the model overfit severely and performance decreased. If anyone finds results different from what we observed, please let me know.
Hi, I noticed that you put the input projection before the Highway Network. However, the paper states that the input to the Embedding Encoding Layer is a vector of dimension p1+p2=500 for each word, which implies that the projection is placed after the Highway Network.
Have you already tried this?
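
For a rough sense of scale between the two orderings discussed in this thread, here is a back-of-the-envelope parameter count. It is only a sketch: the hidden size of 128, the two highway layers, and the helper names `linear_params` and `highway_params` are assumptions for illustration and are not taken from this repository. Only the 500-dimensional input (p1+p2) comes from the thread. A larger highway network may be one reason for the overfitting reported here, but that is a guess, not a measured result.

```python
# Rough parameter count for "projection before highway" vs "projection after highway".
# Assumptions (not from this thread): hidden size 128, two highway layers,
# and each highway layer made of one transform linear map plus one gate
# linear map, both square (d x d) with a bias.

def linear_params(d_in, d_out):
    """Weights plus bias of a single linear map d_in -> d_out."""
    return d_in * d_out + d_out

def highway_params(d, n_layers=2):
    """Each highway layer holds a transform and a gate, both d -> d."""
    return n_layers * 2 * linear_params(d, d)

EMB_DIM = 500  # p1 + p2, as quoted in the thread
HIDDEN = 128   # assumed model dimension

projection = linear_params(EMB_DIM, HIDDEN)

# Ordering 1: project 500 -> 128 first, then run the highway at 128 dims.
projection_before = projection + highway_params(HIDDEN)

# Ordering 2: run the highway at 500 dims, then project 500 -> 128.
projection_after = highway_params(EMB_DIM) + projection

print("projection before highway:", projection_before)
print("projection after highway: ", projection_after)
print("ratio:", round(projection_after / projection_before, 2))
```

Under these assumptions, running the highway network at the full 500 dimensions costs several times more parameters in this part of the model than running it at the projected size. The exact numbers depend on the real layer definitions.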