Open HaiboShi opened 8 years ago
I'm not sure I understand your question correctly. Given the derivative of the loss with respect to the weights (diff), you can apply any optimizer (e.g., SGD, RMSProp). If you define an LSTM layer in prototxt and use a solver from Caffe, you don't have to worry about this: the solver updates the LSTM weights automatically as it runs forward/backward/update.
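If you do want to apply the update by hand, here is a minimal sketch in plain Python of what "apply an optimizer to the diff" could look like. It is not Caffe's API: the function names (`average_diffs`, `sgd_update`, `rmsprop_update`) and the flat-list representation of the weights are hypothetical, and averaging the diffs over the N batches is an assumption (summing them is equally common, depending on how your loss is normalized).

```python
import math


def average_diffs(batch_diffs):
    """Element-wise mean of the per-batch gradients (assumption: mean, not sum)."""
    n = len(batch_diffs)
    return [sum(vals) / n for vals in zip(*batch_diffs)]


def sgd_update(weights, diff, lr=0.01):
    """Plain SGD step: w <- w - lr * diff."""
    return [w - lr * g for w, g in zip(weights, diff)]


def rmsprop_update(weights, diff, cache, lr=0.01, decay=0.9, eps=1e-8):
    """RMSProp step: scale each gradient by a running RMS of past gradients."""
    new_cache = [decay * c + (1.0 - decay) * g * g for c, g in zip(cache, diff)]
    new_weights = [
        w - lr * g / (math.sqrt(c) + eps)
        for w, g, c in zip(weights, diff, new_cache)
    ]
    return new_weights, new_cache


# Hypothetical example: two weights, diffs collected from N = 2 batches.
weights = [1.0, 2.0]
batch_diffs = [[0.2, 0.4], [0.4, 0.0]]

diff = average_diffs(batch_diffs)
weights_sgd = sgd_update(weights, diff, lr=0.1)
weights_rms, cache = rmsprop_update(weights, diff, cache=[0.0, 0.0], lr=0.1)
```

The same diff feeds either update rule; only the step computation changes. With a Caffe solver none of this is needed, since the solver performs the update itself.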
Hi, I'd like to know: if we run the LSTM over N batches and obtain the diff from them, how do we update the weights for that layer? Thanks.