Closed: agave233 closed this issue 4 years ago
Well, thanks for your attention to our work, and you're right. W_T, W_C, and W_F can be either three different weight matrices or a single weight matrix shared across the three channels, and we chose the latter in our implementation. You may have read an old version of our paper; we have already changed this in the new version attached in our GitHub repository (here).
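For readers following along, here is a minimal sketch of what a shared-weight version of the Equation 8 attention could look like. This is not the authors' exact code; the module and variable names (ChannelAttention, z_t, z_c, z_f, hidden_dim) are illustrative assumptions, and only the shapes are meant to match the discussion above.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Eq. 8-style attention where a single weight matrix W and a single
    attention vector q are shared by all three channel embeddings
    (i.e., W_T = W_C = W_F)."""

    def __init__(self, in_dim, hidden_dim=16):
        super().__init__()
        self.W = nn.Linear(in_dim, hidden_dim)          # shared transformation
        self.q = nn.Linear(hidden_dim, 1, bias=False)   # shared attention vector

    def forward(self, z_t, z_c, z_f):
        # Stack the three channel embeddings: (N, 3, in_dim)
        z = torch.stack([z_t, z_c, z_f], dim=1)
        # Per-node, per-channel attention scores: (N, 3, 1)
        scores = self.q(torch.tanh(self.W(z)))
        # Normalize across the three channels
        alpha = torch.softmax(scores, dim=1)
        # Attention-weighted combination of the channels: (N, in_dim)
        return (alpha * z).sum(dim=1), alpha


# Usage sketch: emb1, emb2, Xcom would be the three channel embeddings
# mentioned in the thread, each of shape (N, in_dim).
# att = ChannelAttention(in_dim=64)
# Z, alpha = att(emb1, emb2, Xcom)
```

For the alternative discussed here (three separate matrices), one would simply instantiate three distinct nn.Linear transformations, one per channel, before computing the scores.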
Oh, I see. Thanks! By the way, does the latter (using a shared weight matrix) perform better?
I'm sorry, we didn't specifically compare the performance of these two mechanisms, so I'm not sure about that. Maybe you could try both when you design your attention mechanism. :)
Hi, thanks for your excellent work. I found that the implementation of attention in your code is slightly different from Equation 8. In the paper, Z_T, Z_C, and Z_F use three different transformation weight matrices, but it seems that Z_T, Z_C, and Z_F (e.g., emb1, emb2, Xcom) share the same transformation weight matrix in the code. I am confused about this.