yuanx749 closed this issue 4 years ago.
The last layer typically does not contain an activation function since you do not want to restrict the output of your model, e.g., for regression.
For inference, we are only interested in the argmax of the output, and the argmax does not change under the log_softmax call, so this operation is not really needed here.
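A quick way to convince yourself of the argmax point (the tensor here is just random dummy data): log_softmax is monotonic within each row, so it preserves the position of the maximum.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)  # raw, unnormalized scores for 4 samples, 10 classes

# log_softmax rescales each row monotonically, so the argmax is unchanged.
assert torch.equal(
    logits.argmax(dim=-1),
    F.log_softmax(logits, dim=-1).argmax(dim=-1),
)
```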
Thank you, you're right. I was wondering whether, in unsupervised learning such as Deep Graph Infomax, an activation function should be used in the last layer to obtain the node representations. I will try both to see which gives better performance.
❓ Questions & Help
Hi, in examples/reddit.py (line 61), why is relu not applied to x in the last layer of the inference method? Shouldn't an activation function be used at each layer?
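For reference, the pattern I'm asking about looks roughly like this. This is a simplified full-graph sketch, not the exact neighbor-sampling code from examples/reddit.py; the class and argument names here are just placeholders.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv


class SAGE(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        self.convs = torch.nn.ModuleList([
            SAGEConv(in_channels, hidden_channels),
            SAGEConv(hidden_channels, out_channels),
        ])

    @torch.no_grad()
    def inference(self, x, edge_index):
        # relu is applied between layers, but not after the final layer:
        # the last layer's raw output is used directly (e.g., for argmax).
        for i, conv in enumerate(self.convs):
            x = conv(x, edge_index)
            if i < len(self.convs) - 1:
                x = F.relu(x)
        return x
```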