pemywei / csanmt

This is a code repository for the ACL 2022 paper "Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation"

About $W_r$ in Eqn. 6 #4

Closed Spico197 closed 2 years ago

Spico197 commented 2 years ago

Hi there, and thanks for the excellent work. I have a question about $W_r$ in Eqn. 6. According to the paper, $W_r$ should be a 1-D vector, so what does $\textnormal{diag}(W_r^2)$ mean? Is $W_r$ first turned into a matrix (by squaring the vector?), with the diagonal elements of that matrix then taken as the standard deviation?

I checked Appendix E, but I couldn't match the code there with the original equations. If you could send a copy of the original code to me (tzhu1997@outlook.com), I would be very grateful; the detailed implementation would certainly answer my question. Thanks in advance for any help.

pemywei commented 2 years ago

Hi, thanks for your interest. The $diag()$ operation in Eqn. 6 is the same as $numpy.diag$. For example, if $X = [1,2,3,4,5]$, then $diag(X)=[[1,0,0,0,0],[0,2,0,0,0],[0,0,3,0,0],[0,0,0,4,0],[0,0,0,0,5]]$. We use exactly this diagonal matrix, $diag(W_r^2)$, as the standard deviation of the Gaussian distribution. I have sent you a copy of the code.
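For readers following along, here is a minimal NumPy sketch of the `diag` behaviour described above. It is not the repository's code. The values in `w_r` are hypothetical, and reading $W_r^2$ as an elementwise square is an assumption that this thread does not confirm.

```python
import numpy as np

# The example from the reply: a 1-D vector becomes a square matrix
# with the vector's entries on the main diagonal and zeros elsewhere.
X = np.array([1, 2, 3, 4, 5])
D = np.diag(X)

# Hypothetical values for W_r, chosen only for illustration.
w_r = np.array([0.5, -1.0, 2.0])

# Assumption: W_r^2 means the elementwise square of the vector.
scale = np.diag(w_r ** 2)

# Multiplying noise by a diagonal matrix is the same as scaling each
# component of the noise by the matching diagonal entry.
rng = np.random.default_rng(0)
eps = rng.standard_normal(w_r.shape)
scaled_by_matrix = scale @ eps
scaled_elementwise = (w_r ** 2) * eps
```

Under that reading, the diagonal matrix never has to be built explicitly when scaling noise, since `scaled_by_matrix` and `scaled_elementwise` agree.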

Spico197 commented 2 years ago

Thank you for your response and sharing!