I'm pretty sure there's a typo in equation 4.
@senarvi Thanks, I was thinking the same thing.
I believe Eq. 4 has a typo. Eq. 5 may have a typo as well, but it could also be a misinterpretation of Figure 4. I think you can check the code to figure it out.
Yes! There are small typos as well as a problem in Fig. 4 in the current arXiv version of the paper. We'll update it soon. In the meantime, you can check the slides here and, as always, a better way to understand what's going on exactly is digging into the code :)
@MostafaDehghani Very lucky to have the slides, thanks!
Hi @MostafaDehghani, thank you for the slides! They are really helpful. On a side note, may I ask whether UT and the Transformer both use the default En-De data generator provided in the tensor2tensor library? I noticed the version is the same, but I want to be certain.
Yes, we used problem=translate_ende_wmt32k for all the MT experiments, both with the Transformer and the Universal Transformer.
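For anyone wanting to reproduce the data setup, here is a minimal sketch of pulling that problem from the tensor2tensor registry. It assumes tensor2tensor is pip-installed, and the data_dir/tmp_dir paths are placeholders, not values from this thread:

```python
# Minimal sketch: load the WMT'14 En-De problem used for the MT experiments
# and generate its data (downloads the corpus and builds the 32k subword
# vocabulary). Paths below are placeholders.
from tensor2tensor import problems

ende_problem = problems.problem("translate_ende_wmt32k")
ende_problem.generate_data("/tmp/t2t_data", "/tmp/t2t_tmp")
```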
Thank you!
Thanks @MostafaDehghani and others.
Description
The detailed Figure 4 in the appendix does not seem to follow the iterative equations (4) and (5) in the paper. If I follow the figure, it should be H^t = LayerNorm(A^t + Transition(A^t)) and A^t = LayerNorm(H^(t-1) + P^t + MultiHeadSelfAttention(H^(t-1) + P^t)). It is very confusing. Could anyone help me figure this out? Thank you!
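For concreteness, here is a minimal NumPy sketch of one recurrent step in the order the figure reads, as written above. The self_attention and transition callables and the timestep embedding p_t are hypothetical placeholders standing in for the paper's sublayers, not the actual tensor2tensor code:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Standard layer normalization over the feature dimension.
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def ut_step(h_prev, p_t, self_attention, transition):
    # One Universal Transformer step as Figure 4 reads: the position/timestep
    # embedding p_t is added to the previous state *before* attention, and
    # LayerNorm is applied after each residual connection (post-LN).
    x = h_prev + p_t                            # H^(t-1) + P^t
    a_t = layer_norm(x + self_attention(x))     # A^t, per the figure
    h_t = layer_norm(a_t + transition(a_t))     # H^t, per the figure
    return h_t
```

Running this in a loop over t = 1..T (feeding h_t back in as h_prev) gives the full recurrence; the open question in this issue is whether Eqs. (4) and (5) in the arXiv version match this ordering.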