graphdeeplearning / graphtransformer

Graph Transformer Architecture. Source code for "A Generalization of Transformer Networks to Graphs", DLG-AAAI'21.
https://arxiv.org/abs/2012.09699
MIT License

Why did you divide this term? #20

Open sperfu opened 2 years ago

sperfu commented 2 years ago

Hi there,

I was reading your graphtransformer code and am curious about the operation linked below. Why do you divide the wV term by the w (the so-called 'score') term? I don't see a corresponding term in Equation 4 or Equation 9 of the paper. Could you explain that? https://github.com/graphdeeplearning/graphtransformer/blob/c9cd49368eed4507f9ae92a137d90a7a9d7efc3a/layers/graph_transformer_edge_layer.py#L112

Thanks

vijaydwivedi75 commented 2 years ago

Hi @sperfu, it is part of the softmax term. Please refer to this issue for the pointers to the explanation.
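
To make the "part of the softmax term" point concrete, here is a minimal sketch in plain Python. It assumes the layer first accumulates unnormalized scores `exp(logit)` and score-weighted values per node, then divides the two sums. The function names, the toy inputs, and the small `eps` guard are illustrative assumptions, not the repository's actual code.

```python
import math

def softmax_aggregate(logits, values):
    """Reference: softmax over the logits, then a weighted sum of value vectors."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def two_step_aggregate(logits, values, eps=1e-6):
    """Hypothetical two-step form: sum unnormalized terms, then divide once."""
    scores = [math.exp(x) for x in logits]  # unnormalized 'score' per edge
    dim = len(values[0])
    # wV: sum over neighbors of score * V (the numerator of the softmax average)
    wV = [sum(s * v[d] for s, v in zip(scores, values)) for d in range(dim)]
    # z: sum over neighbors of score (the softmax denominator)
    z = sum(scores)
    # the division in question: numerator / denominator
    return [x / (z + eps) for x in wV]

# Toy example: one node with three incoming edges and 2-d value vectors.
logits = [0.5, -1.0, 2.0]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
a = softmax_aggregate(logits, values)
b = two_step_aggregate(logits, values)
print(a, b)
```

Up to the small `eps` term, both functions return the same vector, so under this assumption the division is just the softmax denominator applied after the sum rather than a separate term missing from the paper's equations.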