Apologies for the late reply, @milancurcic.
@Spnetic-5 I mostly rewrote the subroutine so that it now compiles and converges. It's not using mini-batching; for simplicity, for now the update is applied after the entire batch of forward and backward passes.
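Schematically, the flow now looks like this. This is only a minimal sketch, not the exact code in the PR: the `nf` calls follow the quadratic example's setup, and `apply_rmsprop` is a hypothetical stand-in for the update subroutine.

```fortran
! Sketch of the full-batch flow: accumulate gradients over the whole
! batch, then apply the RMSprop update once per epoch (no mini-batching).
program quadratic_rmsprop_sketch
  use nf, only: dense, input, network
  implicit none
  type(network) :: net
  real :: x(100), y(100)
  integer :: i, n
  integer, parameter :: num_epochs = 1000

  call random_number(x)
  y = x**2 / 2  ! illustrative quadratic data to fit

  net = network([input(1), dense(3), dense(1)])

  do n = 1, num_epochs
    do i = 1, size(x)
      call net % forward([x(i)])   ! forward pass for one sample
      call net % backward([y(i)])  ! backward pass accumulates gradients
    end do
    ! Apply the RMSprop update once, after the entire batch;
    ! apply_rmsprop is a hypothetical helper, not the PR's actual name.
    call apply_rmsprop(net)
  end do
end program quadratic_rmsprop_sketch
```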
I understand that this PR was challenging; it took me a while to find the right approach. In your most recent commit, you made some changes and wrote "made suggested corrections", which made it sound like the PR was good to go. However, the example was not even compiling at that stage. Whenever you struggle with the implementation, please write a comment in the PR explaining where you got stuck and whether you need help, rather than just leaving a short commit message.
Also, please study the implementation in this PR. It introduces a new derived type that tracks a moving average of gradients over multiple epochs, for each layer. We are likely to use this approach for other optimizers that need moving-average logic.
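Roughly, the new type looks like this. This is a sketch only; the names `rmsprop_cache`, `rms_w`, and `rms_b` are placeholders, not the PR's actual identifiers.

```fortran
! Sketch of a per-layer moving-average container; names are
! placeholders, not the actual identifiers introduced in the PR.
module rmsprop_cache_sketch
  implicit none
  type :: rmsprop_cache
    ! Running averages of squared gradients, one array per layer,
    ! persisting across epochs:
    real, allocatable :: rms_w(:)  ! for the layer's weight gradients
    real, allocatable :: rms_b(:)  ! for the layer's bias gradients
  end type rmsprop_cache
end module rmsprop_cache_sketch

! Usage: one cache element per network layer, e.g.
!   type(rmsprop_cache), allocatable :: cache(:)
!   allocate(cache(size(net % layers)))
```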
I apologize for the confusion caused by my commit message. It was not my intention to imply that the code was ready to go; it was compiling and running well on my machine. I'll make sure to provide detailed comments in the pull request in the future.
Thanks for the changes, I'll study those.
Thank you, @Spnetic-5, and no worries. I apologize for jumping the gun and finishing the implementation in this PR.
Going forward, would you like to take a shot at continuing the work in #139, or would you like to implement another optimizer in the quadratic fit example program? Recall that once we implement #139 for SGD, the new optimizers in quadratic will serve as prototype implementations for porting them into the library.
Thank you, @milancurcic. I would like to work on #137 first. Once we have completed that, we can move on to #139 and then additional new optimizers.
Solves #136.

This pull request adds an implementation of the RMSprop optimizer subroutine to the existing quadratic example.
Approach:

- Initialized `rms_weights` and `rms_gradients` arrays of appropriate dimensions.
- Updated `rms_weights` and `rms_gradients` using the decay rate and current weights/gradients.
- Updated the weights via `weights = weights - (learning_rate / sqrt(rms_weights + epsilon)) * gradients` (see the sketch after this list).
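For reference, here is a minimal sketch of these steps for a single layer, interpreting `rms_weights` as the running average of squared weight gradients (standard RMSprop); the exact code in the PR may differ.

```fortran
! Sketch of the steps above for one layer's weights, following the
! names in the list. Illustrative only, not the PR's exact subroutine.
subroutine rmsprop_update(weights, gradients, rms_weights, &
                          learning_rate, decay_rate, epsilon)
  implicit none
  real, intent(inout) :: weights(:), rms_weights(:)
  real, intent(in) :: gradients(:)
  real, intent(in) :: learning_rate, decay_rate, epsilon
  ! Update the moving average using the decay rate and current gradients
  rms_weights = decay_rate * rms_weights + (1 - decay_rate) * gradients**2
  ! Update the weights, scaling each step by the running RMS
  weights = weights - (learning_rate / sqrt(rms_weights + epsilon)) * gradients
end subroutine rmsprop_update
```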