Closed SUDA-HLT-ywfang closed 7 months ago
Hi, thanks for your interest! We are a bit different from Jacobi decoding. And the numbers in Figure 5 show relative positions (assuming the input is at position 0).
Thank you for your reply! I'm still a little bit confused.
Hi,
In Figure 5, position 6 in red actually attends to position 5 in green (the red arrow), instead of position 5 in red (the green arrow). Why is that, considering that position 5 in red is the latest iteration's result? Is it so that you can get a more accurate trajectory with this attention pattern?
Hi @FrankCast1e , my idea is that the red 6 is generated from the sequence: some 3, orange 4, green 5. This creates a strong local relation if these 3, 4, 5 tokens can form an n-gram phrase. In the next turn, we can use orange 4, green 5, and red 6 to generate the next token and form another meaningful n-gram. If you instead use red 5 as the previous token of red 6, I think it does not make much sense, as red 6 has no relationship with red 5, and it may not produce a meaningful n-gram. And if red 6 were conditioned on red 5, what token would red 5 itself be conditioned on? I think that would need to be carefully investigated and would form a different, alternative solution.
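To make the trajectory idea above concrete, here is a minimal sketch of how candidate n-grams could be collected along the diagonals of a lookahead window, where each row is one past iteration and each column a relative position. The window layout and the toy tokens are my own illustrative assumptions, not the actual implementation; only the diagonal chain (some 3 → orange 4 → green 5 → red 6) mirrors the explanation above.

```python
# Toy lookahead window: rows are past iterations (oldest first),
# window[i][j] is the token guessed at relative position j during
# iteration i. The tokens are placeholders, not real model output.
window = [
    ["a3", "a4", "a5", "a6"],  # oldest iteration
    ["b3", "b4", "b5", "b6"],  # middle iteration (orange)
    ["c3", "c4", "c5", "c6"],  # latest iteration (green/red)
]

def collect_ngrams(window):
    """Collect candidate n-grams along the window's diagonals.

    A token at (iteration i, position j) was generated conditioned on
    the token at (i - 1, j - 1), so a diagonal such as (a3, b4, c5)
    forms one candidate n-gram, matching the 3 -> orange 4 -> green 5
    chain described in the comment above.
    """
    n = len(window)           # n-gram length = number of iterations kept
    width = len(window[0])
    ngrams = []
    for start in range(width - n + 1):
        ngrams.append(tuple(window[i][start + i] for i in range(n)))
    return ngrams

print(collect_ngrams(window))
# Each tuple is a diagonal of the window, e.g. ('a3', 'b4', 'c5')
```

Attending to red 5 instead of green 5 would correspond to reading straight across the last row rather than along a diagonal, which breaks this generation chain.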
Thank you very much for your explanation! I totally get the idea right now.
Hi! In Figure 5 of the blog, it seems like tokens of the current iteration attend to tokens from previous iterations. For example, the token at position 6 in red attends to the token at position 5 in green. But in Jacobi decoding, isn't it supposed to attend to tokens from the current iteration? That is, the token at position 6 in red should attend to the token at position 5 in red.