modern-fortran / neural-fortran

A parallel framework for deep learning

SGD optimizer stub #139

Closed · milancurcic closed this 1 year ago

milancurcic commented 1 year ago

First attempt at defining the concrete optimizer procedure as a method of the SGD optimizer type.

Currently the minimize subroutine is defined as elemental to allow a scalar/array/rank-agnostic interface. It's possible that this won't work for all cases if we discover new requirements, but let's try it for the time being.
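
For reference, a minimal sketch of what an elemental SGD minimize method can look like; the type and component names (sgd, learning_rate) are illustrative and may differ from the final PR code:

```fortran
module sgd_optimizer_sketch
  implicit none

  ! Hypothetical SGD optimizer type; component names are illustrative.
  type :: sgd
    real :: learning_rate = 0.01
  contains
    procedure :: minimize
  end type sgd

contains

  ! Elemental, so the same binding accepts scalars or conformable arrays
  ! of any rank: call opt % minimize(w, dw) works for scalar or array w.
  elemental subroutine minimize(self, param, gradient)
    class(sgd), intent(in) :: self
    real, intent(inout) :: param
    real, intent(in) :: gradient
    param = param - self % learning_rate * gradient
  end subroutine minimize

end module sgd_optimizer_sketch
```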

milancurcic commented 1 year ago

This now works. There's an API change to the network % train() and network % update() methods, which now require an argument of class(optimizer_base_type). (I wonder if it's possible to make this optional so we can default to sgd.)
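
One possible way to make the argument optional, sketched under the assumption that polymorphic assignment to an allocatable local is acceptable here (the local variable name and loop bounds are illustrative):

```fortran
subroutine update(self, optimizer)
  class(network), intent(inout) :: self
  class(optimizer_base_type), intent(in), optional :: optimizer
  class(optimizer_base_type), allocatable :: local_optimizer
  integer :: n

  if (present(optimizer)) then
    local_optimizer = optimizer  ! polymorphic assignment (F2008)
  else
    local_optimizer = sgd(learning_rate=0.01)  ! assumed default
  end if

  ! Forward the optimizer to every layer, as described below.
  do n = 1, size(self % layers)
    call self % layers(n) % update(local_optimizer)
  end do

end subroutine update
```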

Once an optimizer is passed to network % update(), it's passed on to layer % update() for all layers. In layer % update(), the weights and biases are accessed from the internal layer representation and passed to optimizer % minimize(). I borrowed the name minimize from Keras. optimizer % optimize() would be appropriate but sounds odd because of the repetition. How about optimizer % update()?
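
In sketch form, assuming minimize is a deferred binding on optimizer_base_type and that the concrete layer exposes weight, bias, and gradient components (the names self % p, dw, and db are assumptions for illustration):

```fortran
subroutine update(self, optimizer)
  class(layer), intent(inout) :: self
  class(optimizer_base_type), intent(in) :: optimizer

  ! Resolve the concrete layer behind the polymorphic component.
  select type (this_layer => self % p)
    type is (dense_layer)
      ! minimize is elemental, so it applies elementwise to whole arrays.
      call optimizer % minimize(this_layer % weights, this_layer % dw)
      call optimizer % minimize(this_layer % biases, this_layer % db)
  end select

end subroutine update
```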

The optimizer step for conv2d is currently not implemented, but it may be easy to do even in this PR (though convolutional training is broken anyway, as explained in #142).

Spnetic-5 commented 1 year ago

Thanks for bringing up the API change to the network methods. Making the optimizer argument optional and defaulting to SGD sounds like a good idea.

Regarding the naming, I think optimizer % minimize() is good, as it captures the essence of the operation. I'll also study all the updates in the code.