Why did you divide this term?

graphdeeplearning / graphtransformer

Graph Transformer Architecture. Source code for "A Generalization of Transformer Networks to Graphs", DLG-AAAI'21.

https://arxiv.org/abs/2012.09699

MIT License

Open sperfu opened 2 years ago

sperfu commented 2 years ago

Hi there,

I was reading your graphtransformer code and I'm curious about the operation at the line linked below. Why do you divide the wV term by w (the so-called 'score' term)? I don't see this division in equation 4 or equation 9 of the paper. Could you explain it? https://github.com/graphdeeplearning/graphtransformer/blob/c9cd49368eed4507f9ae92a137d90a7a9d7efc3a/layers/graph_transformer_edge_layer.py#L112

Thanks

vijaydwivedi75 commented 2 years ago

Hi @sperfu, the division is part of the softmax term. Please refer to this issue for pointers to the explanation.
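
For anyone landing on this thread, here is a minimal sketch of why a division like this can be "part of the softmax". It is not the repository's code: the function names, the toy scores, and the value vectors are all made up for illustration, and it uses plain Python instead of DGL/PyTorch. The idea it tries to show is that summing `exp(score) * V` over neighbors and then dividing by the sum of `exp(score)` gives the same result as applying softmax to the scores first and then taking the weighted sum.

```python
import math

def softmax(scores):
    """Standard softmax over a list of raw attention scores."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_softmax_first(scores, values):
    """Normalize the scores with softmax, then take the weighted sum of values."""
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def aggregate_divide_last(scores, values, eps=0.0):
    """Sum exp(score) * value, then divide by the sum of exp(score).

    `eps` is an optional small constant (an assumption here) that
    implementations often add to the denominator to avoid division by zero.
    """
    exps = [math.exp(s) for s in scores]
    dim = len(values[0])
    wV = [sum(e * v[d] for e, v in zip(exps, values)) for d in range(dim)]
    z = sum(exps)  # softmax denominator for this node
    return [x / (z + eps) for x in wV]

# Toy example: one node with three incoming edges (hypothetical numbers).
scores = [0.5, -1.0, 2.0]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

a = aggregate_softmax_first(scores, values)
b = aggregate_divide_last(scores, values)
print(all(abs(x - y) < 1e-12 for x, y in zip(a, b)))
```

With `eps=0.0` the two aggregations agree up to floating-point rounding; a small nonzero `eps` changes the result only slightly. Splitting the softmax this way is convenient in message-passing code, because the numerator and the denominator can each be computed as a simple sum over incoming edges.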