Closed ZikangZhou closed 3 years ago
Hi @ZikangZhou, thanks for your query. In Eqn. 12 the computations are elementwise (unlike those in Eqn. 5). For a fuller answer, please refer to this issue, which includes pointers to the corresponding code implementations. Best regards.
Thanks for your reply. So you mean that in the edge transformer, all product operations are element-wise products rather than dot products? That sounds interesting.
As mentioned in the issue referenced above, the Graph Transformer Layer with edge features injects the available edge features feature-dimension-wise; hence the elementwise computations in Eqn. 12.
However, the values in Eqn. 11, which are then carried forward to Eqn. 9, are scalars, as described in detail in the referenced issue:
In Eqn. 11 the features of \hat{w}_{i, j} are summed across the d dimensions to obtain scalars.
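To make the distinction concrete, here is a minimal sketch in plain Python, not the repository's actual implementation. The function names and the toy vectors are hypothetical; `q_i`, `k_j`, and `e_ij` stand in for the already-projected Q h_i, K h_j, and E e_ij. It compares the elementwise reading described above with the "scalar dot product times vector" reading from the question.

```python
import math

def elementwise_score(q_i, k_j, e_ij):
    """Elementwise reading: multiply the three vectors dimension by dimension,
    scale by sqrt(d_k), then sum the d_k entries into one scalar."""
    scale = math.sqrt(len(q_i))
    # \hat{w}_{i,j}: one entry per feature dimension.
    w_hat = [(q * k / scale) * e for q, k, e in zip(q_i, k_j, e_ij)]
    return w_hat, sum(w_hat)

def dot_product_score(q_i, k_j, e_ij):
    """Reading from the question: a scalar dot product scales the edge vector,
    and the result is then summed. Shown only for comparison."""
    scale = math.sqrt(len(q_i))
    dot = sum(q * k for q, k in zip(q_i, k_j)) / scale
    w_hat = [dot * e for e in e_ij]
    return w_hat, sum(w_hat)

# Toy inputs with d_k = 4 (hypothetical values).
q_i = [1.0, 2.0, 0.5, -1.0]
k_j = [0.5, 1.0, 2.0, 1.0]
e_ij = [2.0, 0.0, 1.0, 1.0]

w_hat_elem, score_elem = elementwise_score(q_i, k_j, e_ij)
w_hat_dot, score_dot = dot_product_score(q_i, k_j, e_ij)

print(w_hat_elem, score_elem)  # [0.5, 0.0, 0.5, -0.5] 0.5
print(w_hat_dot, score_dot)    # [2.5, 0.0, 1.25, 1.25] 5.0
```

Both readings produce a d_k-dimensional vector followed by a scalar, but the values differ: in the elementwise version each edge feature only interacts with the matching dimension of the query and key, which is the feature-dimension-wise injection described above.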
Thanks for clarifying. I think I now understand the intuition behind it.
Hi,
Great work!
I want to confirm whether my understanding of equations 11~12 is correct.
I understand equation 12 in this way: (Q h_i * K h_j / sqrt(d_k)) is a scalar, and (E e_ij) is a d_k-dim vector, so multiplying the scalar by the vector gives a d_k-dim vector. In equation 11, this d_k-dim vector is reduced to a scalar by computing w_1 + w_2 + ... + w_dk. Is this correct?