Hello!
I have a question about the learning rate. As stated in the appendix of the GraphSAGE paper, and as adopted by many other works, the learning rate is usually set to a value around 1e-2. Those works also usually normalize the input features.
However, in your work the learning rate is set to 0.7, which is surprisingly high, and the input features are not normalized either. When I reset the learning rate to a common value and train on normalized features, the model converges only to extremely poor performance.
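To make the setup concrete, here is a minimal sketch of what I mean by normalized features. It assumes column-wise standardization (zero mean, unit variance) computed from the training nodes only; `standardize_features`, `train_idx`, and the toy feature matrix are hypothetical stand-ins for illustration, not code from your repository.

```python
from statistics import fmean, pstdev


def standardize_features(features, train_idx):
    """Scale each feature column to zero mean and unit variance.

    The mean and standard deviation are computed from the training
    rows only and then applied to every row. Hypothetical helper.
    """
    n_cols = len(features[0])
    means, stds = [], []
    for j in range(n_cols):
        column = [features[i][j] for i in train_idx]
        means.append(fmean(column))
        std = pstdev(column)
        # Leave constant columns unscaled to avoid dividing by zero.
        stds.append(std if std > 0 else 1.0)
    return [
        [(row[j] - means[j]) / stds[j] for j in range(n_cols)]
        for row in features
    ]


# Toy data standing in for a node-feature matrix (4 nodes, 2 features).
features = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0], [10.0, 40.0]]
train_idx = [0, 1, 2]
normalized = standardize_features(features, train_idx)

# The "common" learning rate I switched to, in place of 0.7.
learning_rate = 1e-2
```

If you normalize differently in your own experiments (for example per-row scaling), that difference alone might explain part of the gap.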
This issue confuses me a lot. Could you help explain a bit?