Hi @DevinKreuzer,

The .sum() is done here in graph_transformer_layer.py:
https://github.com/graphdeeplearning/graphtransformer/blob/3c83b4ba5e45a2e25bbefde1b35d88a27ca3cfb2/layers/graph_transformer_layer.py#L18-L19
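For context, the linked lines compute the dot-product attention scores; a minimal sketch of that DGL edge UDF (names follow the repo, but treat the body as a paraphrase rather than the verbatim source):

```python
def src_dot_dst(src_field, dst_field, out_field):
    # DGL edge UDF: elementwise product of the source (K) and destination (Q)
    # projections, summed over the feature dimension -- so in this layer the
    # per-head attention score is already a scalar.
    def func(edges):
        return {out_field: (edges.src[src_field] * edges.dst[dst_field]).sum(-1, keepdim=True)}
    return func
```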
@DevinKreuzer:
> Shouldn't the attention weights/scores be scalars? From what I see, each head has an 8-dimensional score vector
In graph_transformer_edge_layer.py, the injection of the available edge features is done feature-dimension-wise: the implicit attention scores (per feature dimension) are multiplied with the available edge features (per feature dimension), as in Eqn. 12 of the paper, implemented here: https://github.com/graphdeeplearning/graphtransformer/blob/3c83b4ba5e45a2e25bbefde1b35d88a27ca3cfb2/layers/graph_transformer_edge_layer.py#L33-L34
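A sketch of that elementwise step (again paraphrasing the repo's DGL edge UDFs; the exact surrounding code is an assumption):

```python
def imp_exp_attn(implicit_attn, explicit_edge):
    # DGL edge UDF: modulate the implicit (Q.K) attention scores with the
    # explicit edge features, elementwise per feature dimension.
    # Note: no .sum() here, so the result stays d-dimensional.
    def func(edges):
        return {implicit_attn: (edges.data[implicit_attn] * edges.data[explicit_edge])}
    return func
```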
Eqn. 12 outputs a d-dimensional feature vector (say d is the feature dimension). This d-dimensional edge feature vector is critical since it's passed on to the edge feature pipeline (maintained at every layer), starting from Eqn. 10 and leading to Eqns. 16-18 in the paper. In Eqn. 11, the features of \hat{w}_{i,j} are summed across the d dimensions to obtain scalars, which is the .sum() that you mention in your query.
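Schematically, for one head (my own rendering of the two steps, with \odot denoting the elementwise product, not the paper's exact notation):

```latex
% Eqn. 12 (schematic): per-dimension implicit scores, modulated by edge features
\hat{w}_{i,j} = \frac{(Q h_i) \odot (K h_j)}{\sqrt{d_k}} \odot (E e_{i,j}) \in \mathbb{R}^{d}

% Eqn. 11 (schematic): the d dimensions are summed into a scalar before the softmax
w_{i,j} = \mathrm{softmax}_j \Big( \sum_{m=1}^{d} \hat{w}_{i,j,m} \Big)
```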
Hope this helps with understanding the implementation.

Vijay
Closing the issue for now. Feel free to reopen for any further clarification.
Great work!

I have a question concerning the implementation of the softmax in graph_transformer_edge_layer.py.
When you define the softmax, you use the following function:
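Roughly, it looks like this (paraphrasing the repo; the exact clamp constants may differ, so treat this as a sketch):

```python
import torch

def exp(field):
    def func(edges):
        # clamp before exponentiating, for softmax numerical stability;
        # the .sum(-1, ...) collapses each head's d-dim score to a scalar
        return {field: torch.exp((edges.data[field].sum(-1, keepdim=True)).clamp(-5, 5))}
    return func
```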
Shouldn't the attention weights/scores be scalars? From what I see, each head has an 8-dimensional score vector, on which you then call .sum(). The corresponding softmax in graph_transformer_layer.py does not have this .sum().
Would appreciate any clarification on this :)
Best,
Devin