Hi @weiyinwei, your paper says 'optimizing the model with stochastic gradient descent (SGD)', but in the code the optimizer is Adam. Which one did you actually use to get the best result?
Actually, for our model, SGD could achieve the best performance, but it costs too much effort to tune. In contrast, Adam reaches a comparable result in a much shorter time.
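For anyone reading along, the swap the answer describes is just a change of optimizer class. A minimal PyTorch sketch, assuming a toy model and a learning rate of 1e-3 (the actual model and hyper-parameters are not stated in this thread):

```python
import torch

# Hypothetical stand-in model; the real architecture is not shown here.
model = torch.nn.Linear(8, 1)

# What the paper's text describes: plain SGD, which tends to need
# more learning-rate tuning to reach its best result.
sgd = torch.optim.SGD(model.parameters(), lr=1e-3)

# What the released code uses: Adam, which the author says reaches
# a comparable result with far less tuning.
adam = torch.optim.Adam(model.parameters(), lr=1e-3)
```

Both optimizers share the same `step()` / `zero_grad()` interface, so the rest of the training loop is unchanged.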