I'm pretty sure there's a typo in equation 4.
@senarvi Thanks, I was thinking the same thing.
I believe Eq. 4 has a typo. Eq. 5 may have a typo as well, but it could also be a misinterpretation of Figure 4. I think you can check the code to figure it out.
Yes! There are small typos as well as a problem in Fig. 4 in the current arXiv version of the paper. We'll update it soon. In the meantime, you can check the slides here and, as always, a better way to understand what's going on exactly is digging into the code :)
@MostafaDehghani Very lucky to have the slides, thanks!
Hi @MostafaDehghani, thank you for the slides! They are really helpful. On a side note, may I ask whether UT and the Transformer both use the default En-De data generator provided in the tensor2tensor library? I noticed the version is the same, but I want to be certain.
Yes, we used problem=translate_ende_wmt32k for all the MT experiments, both with the Transformer and the Universal Transformer.
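For anyone wanting to reproduce the data setup, here is a minimal sketch of pulling that problem from the tensor2tensor registry. It assumes tensor2tensor is pip-installed, and the data_dir/tmp_dir paths are placeholders, not values from this thread:

```python
# Minimal sketch: load the WMT'14 En-De problem used for the MT experiments
# and generate its data (downloads the corpus and builds the 32k subword
# vocabulary). Paths below are placeholders.
from tensor2tensor import problems

ende_problem = problems.problem("translate_ende_wmt32k")
ende_problem.generate_data("/tmp/t2t_data", "/tmp/t2t_tmp")
```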
Thank you!
Thanks @MostafaDehghani and others.
Description
The detailed Figure 4 in the appendix does not seem to follow the iterative equations (4) and (5) in the paper. If I follow the figure, it should be H^t = LayerNorm(A^t + Transition(A^t)) and A^t = LayerNorm(H^(t-1) + P^t + MultiHeadSelfAttention(H^(t-1) + P^t)). It is very confusing. Could anyone help me figure this out? Thank you!
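For concreteness, here is a minimal NumPy sketch of one recurrent step in the order the figure reads, as written above. The self_attention and transition callables and the timestep embedding p_t are hypothetical placeholders standing in for the paper's sublayers, not the actual tensor2tensor code:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Standard layer normalization over the feature dimension.
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def ut_step(h_prev, p_t, self_attention, transition):
    # One Universal Transformer step as Figure 4 reads: the position/timestep
    # embedding p_t is added to the previous state *before* attention, and
    # LayerNorm is applied after each residual connection (post-LN).
    x = h_prev + p_t                            # H^(t-1) + P^t
    a_t = layer_norm(x + self_attention(x))     # A^t, per the figure
    h_t = layer_norm(a_t + transition(a_t))     # H^t, per the figure
    return h_t
```

Running this in a loop over t = 1..T (feeding h_t back in as h_prev) gives the full recurrence; the open question in this issue is whether Eqs. (4) and (5) in the arXiv version match this ordering.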