graphdeeplearning / graphtransformer

Graph Transformer Architecture. Source code for "A Generalization of Transformer Networks to Graphs", DLG-AAAI'21.
https://arxiv.org/abs/2012.09699
MIT License

About Equations 11~12 #7

ZikangZhou closed this issue 3 years ago

ZikangZhou commented 3 years ago

Hi,

Great work!

I want to confirm whether my understanding of Equations 11 and 12 is correct.

I understand Equation 12 this way: the dot product (Q h_i · K h_j) / sqrt(d_k) is a scalar, and E e_ij is a d_k-dim vector, so multiplying the scalar by the vector gives a d_k-dim vector. In Equation 11, this d_k-dim vector is then reduced to a scalar by computing w_1 + w_2 + ... + w_{d_k}. Is that correct?
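In code, the reading I describe above would be something like this (a minimal sketch with hypothetical tensors q_hi = Q h_i, k_hj = K h_j, e_proj = E e_ij):

```python
import torch

d_k = 8
q_hi = torch.randn(d_k)    # Q h_i (hypothetical projected query for node i)
k_hj = torch.randn(d_k)    # K h_j (hypothetical projected key for node j)
e_proj = torch.randn(d_k)  # E e_ij (hypothetical projected edge feature)

# My reading of Eqn. 12: dot product -> scalar, then scalar * vector -> d_k-dim vector
w_hat = (torch.dot(q_hi, k_hj) / d_k ** 0.5) * e_proj

# My reading of Eqn. 11: reduce the vector to a scalar via w_1 + w_2 + ... + w_{d_k}
score = w_hat.sum()
```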

vijaydwivedi75 commented 3 years ago

Hi @ZikangZhou, thanks for your query. Actually, in Eqn. 12 the computations are element-wise (unlike those of Eqn. 5). For the rest of the answer, please refer to this issue, which includes pointers to the corresponding parts of the code. Best regards.

ZikangZhou commented 3 years ago

Thanks for your reply. So you mean that in the transformer with edge features, all product operations are element-wise rather than dot products? That sounds interesting.

vijaydwivedi75 commented 3 years ago

As mentioned in the issue referenced above, injecting the available edge features is done feature-dimension-wise in the Graph Transformer Layer with edge features; hence the element-wise computations in Eqn. 12.

However, from Eqn. 11 onward to Eqn. 9, the attention scores are scalar values, as explained in detail in the referenced issue:

In Eqn. 11, the features of \hat{w}_{ij} are summed across the d_k dimensions to obtain scalars.
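To make the distinction concrete, here is a minimal standalone sketch of how Eqns. 12 and 11 compose for one attention head (illustrative PyTorch, not the repo's DGL implementation; tensor names are hypothetical):

```python
import torch

d_k = 8
num_neighbors = 4

q_hi = torch.randn(d_k)                   # Q h_i for node i
k_hj = torch.randn(num_neighbors, d_k)    # K h_j for each neighbor j
e_proj = torch.randn(num_neighbors, d_k)  # E e_ij for each incident edge

# Eqn. 12: all products are element-wise along the feature dimension,
# so \hat{w}_{ij} is a d_k-dim vector per edge
w_hat = (q_hi * k_hj / d_k ** 0.5) * e_proj   # shape: (num_neighbors, d_k)

# Eqn. 11: sum across the d_k dimensions to get one scalar per edge, then
# softmax over the neighbors j to obtain the attention weights used in Eqn. 9
w = torch.softmax(w_hat.sum(dim=-1), dim=0)   # shape: (num_neighbors,)
```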

ZikangZhou commented 3 years ago

Thanks for clarifying; I think I now understand the intuition behind it.