graphdeeplearning / graphtransformer

Graph Transformer Architecture. Source code for "A Generalization of Transformer Networks to Graphs", DLG-AAAI'21.
https://arxiv.org/abs/2012.09699
MIT License

Technical question #1

Closed · DevinKreuzer closed this issue 3 years ago

DevinKreuzer commented 3 years ago

Hi, thanks for the great paper :)

I was just curious what the 'z' variable is on line 59 of graph_transformer_layer.py. I cannot seem to find its equivalent in the paper. It seems you are normalizing the output heads by the sum of the attention weights?

Would appreciate a little point :)

Thanks, Devin

vijaydwivedi75 commented 3 years ago

Hi @DevinKreuzer, thanks for your question. We follow the DGL implementation with built-in functions, as described in detail here.

The 'z' is part of the softmax, implemented in this fashion.
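To illustrate the idea (a minimal NumPy sketch, not the repo's actual DGL code; all names here are illustrative): in the message-passing formulation, the softmax over incoming edges is split into an unnormalized exp-weighted sum of values and a separate running normalizer, `z`, which is the sum of the exponentiated attention scores. Dividing the aggregated output by `z` completes the softmax, so the result matches applying softmax to the scores directly.

```python
import numpy as np

def edge_softmax_aggregate(scores, values):
    """Dense sketch of softmax-via-message-passing.

    scores: (N, N) raw attention scores, row i attends over senders j.
    values: (N, D) value vectors.
    Returns the (N, D) attended output.
    """
    # Shift by the row max for numerical stability (does not change the softmax).
    exp_s = np.exp(scores - scores.max(axis=1, keepdims=True))
    wV = exp_s @ values                    # unnormalized exp-weighted sum of values
    z = exp_s.sum(axis=1, keepdims=True)   # the 'z' normalizer: sum of exp-scores
    return wV / z                          # equivalent to softmax(scores) @ values
```

In the DGL version the same two quantities are accumulated per destination node over the graph's edges with built-in message/reduce functions, which is why `z` appears as an explicit node field in the layer code rather than being hidden inside a softmax call.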

Hope the referenced article makes this clear and shows there is no inconsistency with the equations in the paper.

Cheers, Vijay

vijaydwivedi75 commented 3 years ago

Closing the issue! Feel free to reopen in case of further questions.