Open YangYangTaoTao opened 2 years ago
Hi, thanks for raising the issue. The loop is only conceptual: it refers to passing the input through d
architecturally identical self-attention blocks (as implemented here).
I can see now how this is confusing. Each of the d
blocks is distinct, and there is no parameter sharing between them.
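To make the distinction concrete, here is a minimal sketch in plain Python. It is not the repository's code: `ToyBlock`, `build_stack`, and `forward` are hypothetical names, and the block body is a simple residual linear map standing in for real self-attention. It only illustrates that the "loop" is a chain of d separately parameterized blocks rather than one block applied d times.

```python
import random


class ToyBlock:
    """Hypothetical stand-in for one self-attention block (not the repository's code)."""

    def __init__(self, dim, rng):
        # Each block allocates its own weight matrix at construction time.
        self.weight = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]

    def __call__(self, x):
        # Residual update: x + W x. A real block would apply self-attention here.
        return [
            xi + sum(w * xj for w, xj in zip(row, x))
            for xi, row in zip(x, self.weight)
        ]


def build_stack(d, dim, seed=0):
    """Create d blocks with the same architecture but separate parameters."""
    rng = random.Random(seed)
    return [ToyBlock(dim, rng) for _ in range(d)]


def forward(blocks, x):
    # The "loop" in the diagram: feed the output of one block into the next.
    for block in blocks:
        x = block(x)
    return x


blocks = build_stack(d=3, dim=4)
output = forward(blocks, [1.0, 0.0, -1.0, 0.5])

# No parameter sharing: every block owns a different weight object.
distinct = len({id(b.weight) for b in blocks}) == len(blocks)
```

With parameter sharing, the list would instead hold the same block object d times; here each entry is constructed separately, so the weights are independent.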
This is amazing work! But I have a question: what does the loop in the diagram mean? I couldn't find the loop operation in either the paper or the code. Thanks!