seongjunyun / Graph_Transformer_Networks

Graph Transformer Networks (Authors' PyTorch implementation for the NeurIPS 19 paper)

Difference between code implementation and paper description #22

Closed mmichaelzhang closed 3 years ago

mmichaelzhang commented 4 years ago

Hi,

I found it interesting that the paper says "It is used for node classification on top and two dense layers followed by a softmax layer are used" at the bottom of page 5.

However, the code implementation uses two linear layers with a ReLU nonlinearity in between, and the output of the second linear layer is compared directly with the labels via cross-entropy. No explicit softmax layer follows.

X_ = self.linear1(X_)            # first linear (dense) layer
X_ = F.relu(X_)                  # nonlinearity
y = self.linear2(X_[target_x])   # logits for the labeled nodes only
loss = self.loss(y, target)      # cross-entropy on the raw logits
return loss, y, Ws
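For context, the snippet above is part of the model's forward pass. A minimal, self-contained sketch of the same two-layer head is below; the class name, dimensions, and toy inputs are my own assumptions for illustration, not values from the repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassificationHead(nn.Module):
    """Hypothetical reconstruction of the two-layer head in the snippet."""
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.linear1 = nn.Linear(in_dim, hidden_dim)
        self.linear2 = nn.Linear(hidden_dim, num_classes)
        self.loss = nn.CrossEntropyLoss()  # expects raw logits, not probabilities

    def forward(self, X_, target_x, target):
        X_ = self.linear1(X_)            # first linear (dense) layer
        X_ = F.relu(X_)                  # nonlinearity
        y = self.linear2(X_[target_x])   # logits for the labeled nodes only
        loss = self.loss(y, target)      # cross-entropy on the raw logits
        return loss, y

# Toy usage with made-up sizes: 10 node embeddings of dimension 64, 3 classes.
head = ClassificationHead(in_dim=64, hidden_dim=64, num_classes=3)
X = torch.randn(10, 64)                  # stand-in for GT-layer node features
idx = torch.tensor([0, 3, 7])            # indices of the labeled nodes
labels = torch.tensor([0, 2, 1])
loss, logits = head(X, idx, labels)
```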

Which should I rely on: the paper description or the provided code implementation?

seongjunyun commented 3 years ago

Hi,

There is no difference between the paper description and the code implementation. "Dense layer" and "linear layer" mean the same thing, and the loss module nn.CrossEntropyLoss() already contains a softmax layer.
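To make the last point concrete, here is a quick check (my own illustration, not from the repository) that PyTorch's nn.CrossEntropyLoss applies the softmax internally: it is numerically identical to an explicit log-softmax followed by NLLLoss, and the predicted class from raw logits matches the one from softmax probabilities.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)            # raw scores: 4 samples, 3 classes
target = torch.tensor([0, 2, 1, 1])

# CrossEntropyLoss on raw logits ...
ce = nn.CrossEntropyLoss()(logits, target)
# ... equals NLLLoss applied after an explicit log-softmax.
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)
same_loss = torch.allclose(ce, nll)

# Softmax is monotonic, so argmax predictions are identical either way.
same_preds = bool(
    (logits.argmax(dim=1) == F.softmax(logits, dim=1).argmax(dim=1)).all()
)
```

So adding an explicit softmax before nn.CrossEntropyLoss would actually be a bug (the softmax would be applied twice).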