Closed DevinKreuzer closed 3 years ago
Hi @DevinKreuzer, thanks for your question. We follow the DGL implementation with built-in functions, as described in detail here.
The 'z' is part of the softmax: it is the normalizing sum in the denominator, and the softmax is implemented in this fashion.
I hope the referenced article makes this clear and shows that there is no inconsistency with the equations in the paper.
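As a rough illustration of why dividing by 'z' amounts to a softmax, here is a minimal pure-Python sketch. It is not the repository's code and does not use DGL; the function names, the `wV` variable, and the toy numbers are assumptions made up for this example. It compares "sum the exponentiated scores times the values, then divide by the sum of the exponentiated scores" against an explicit softmax-weighted sum for a single node with three incoming edges.

```python
import math

def aggregate_then_normalize(scores, values):
    """Two-step form: accumulate an unnormalized weighted sum, then divide by z."""
    exp_scores = [math.exp(s) for s in scores]  # unnormalized attention per edge
    dim = len(values[0])
    # wV (hypothetical name): sum over edges of exp(score) * value
    wV = [sum(e * v[d] for e, v in zip(exp_scores, values)) for d in range(dim)]
    # z: sum over edges of exp(score), i.e. the softmax denominator
    z = sum(exp_scores)
    return [x / z for x in wV]

def softmax_weighted_sum(scores, values):
    """Textbook form: normalize the scores with softmax first, then weight the values."""
    z = sum(math.exp(s) for s in scores)
    weights = [math.exp(s) / z for s in scores]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# Toy data (assumed): three incoming edges, 2-dimensional value vectors.
scores = [0.5, -1.0, 2.0]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

two_step = aggregate_then_normalize(scores, values)
textbook = softmax_weighted_sum(scores, values)
print(all(math.isclose(a, b) for a, b in zip(two_step, textbook)))
```

Both functions return the same vector, so normalizing the aggregated head output by the sum of the (exponentiated) attention scores is the softmax written in two steps rather than an extra operation.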
Cheers, Vijay
Closing the issue! Feel free to reopen it if you have further questions.
Hi, thanks for the great paper :)
I was just curious as to what the 'z' variable is in line 59 of the graph_transformer_layer.py code? I cannot seem to find the equivalent in the paper. It seems you are normalizing the output heads by the sum of the attention weights?
Would appreciate a pointer :)
Thanks, Devin