Closed: Spnetic-5 closed this 1 year ago
Thanks @Spnetic-5, looks like a good start. You already have the pure SGD example. Do you need any help going forward? To allow batch and mini-batch GDs, I suggest defining the `x` and `y` data as 1-d arrays that will be your entire dataset. Then for SGD, feed `x` and `y` elements one at a time; for mini-batch GD, feed subsets of the arrays (mini-batches); and for batch GD, pass the entire arrays.
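A minimal sketch of what the three variants could look like on 1-d arrays, assuming a placeholder `train_step` routine, a toy dataset, and an illustrative batch size (none of these names are from the PR):

```fortran
program gd_variants_sketch
  implicit none
  integer, parameter :: n = 100          ! size of the full dataset
  integer, parameter :: batch_size = 10  ! illustrative mini-batch size
  real :: x(n), y(n)
  integer :: i

  ! Toy dataset; the actual example defines its own data.
  call random_number(x)
  y = sin(x)

  ! Stochastic GD: feed one (x, y) element at a time.
  do i = 1, n
    call train_step(x(i:i), y(i:i))
  end do

  ! Mini-batch GD: feed contiguous subsets of the arrays.
  do i = 1, n, batch_size
    call train_step(x(i:i+batch_size-1), y(i:i+batch_size-1))
  end do

  ! Batch GD: pass the entire arrays at once.
  call train_step(x, y)

contains

  subroutine train_step(xb, yb)
    ! Placeholder for one forward/backward/update pass over a slice.
    real, intent(in) :: xb(:), yb(:)
  end subroutine train_step

end program gd_variants_sketch
```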
Sure, thank you for suggesting the approach of using 1-dimensional arrays for the dataset. I'm working on the optimizer code; I'll push the changes soon.
Thanks @Spnetic-5 for the work so far. Please study the changes in https://github.com/modern-fortran/neural-fortran/pull/134/commits/bda1968f70d0cbf03bb275cb0bbb043f74d3b102. There were a few important fixes to the code:

- A `train_size` parameter which I use to allocate the training `x` and `y`.
- The `net % forward` and `net % backward` methods needed one sample at a time as inputs rather than whole arrays (see the sketch below); this is something we can improve later by allowing a batch of data to be passed at once.
- `ypred` for each optimization method.

I don't know if the results are correct yet, but the code compiles and produces lower errors with increasing epoch count. On my computer the minibatch GD produces very different results between the debug and release profiles, so something is still not quite right there.
We're getting close!
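To illustrate the per-sample `net % forward` / `net % backward` point from the list above, here is a rough sketch of the SGD loop shape. The network layout, the toy data, and the `update` call and its learning-rate argument are assumptions for illustration and may differ from the actual example code:

```fortran
program per_sample_training_sketch
  use nf, only: network, input, dense
  implicit none
  type(network) :: net
  real :: x(100), y(100)
  integer :: i, epoch

  ! Toy 1-d dataset; the real example defines its own data and train_size.
  call random_number(x)
  y = sin(x)

  ! Assumed layout: 1 input, one hidden layer, 1 output.
  net = network([input(1), dense(5), dense(1)])

  do epoch = 1, 1000
    do i = 1, size(x)
      ! forward and backward take one sample at a time for now;
      ! passing a whole batch at once is a possible later improvement.
      call net % forward([x(i)])
      call net % backward([y(i)])
      call net % update(0.01)  ! learning rate is illustrative
    end do
  end do

end program per_sample_training_sketch
```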
You're welcome! I apologize for the errors and the quality of the code pushed earlier, and thank you for pointing out the changes and fixes you made. I have carefully studied them and understand the modifications you've introduced. It's good to hear that the code now compiles and produces lower errors with increasing epoch count. I will continue to review the code and evaluate the results to ensure correctness, and I will investigate the discrepancy in the minibatch GD results between the debug and release profiles to identify the underlying cause.
These are the results on my PC:

For 1000 epochs:
- Stochastic gradient descent MSE: 0.001104
- Batch gradient descent MSE: 0.062504
- Minibatch gradient descent MSE: 0.088675

For 5000 epochs:
- Stochastic gradient descent MSE: 0.000449
- Batch gradient descent MSE: 0.071504
- Minibatch gradient descent MSE: 0.000996
Here, batch GD shows a slight increase in MSE, I think because it updates the weights using the entire training dataset in each epoch. As the number of epochs increases, the model starts overfitting the training data, which leads to a higher MSE on the test data.
@Spnetic-5 in the SGD subroutine, can you shuffle the mini-batches so that it's truly stochastic? Currently it loops over the mini-batches in the same order every time. Here's my suggested approach:
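For illustration, a minimal sketch of one way to randomize the mini-batch order each epoch using a Fisher-Yates shuffle of batch indices (an illustrative sketch, not necessarily the snippet originally posted here):

```fortran
program shuffle_batches_sketch
  implicit none
  integer, parameter :: num_batches = 10
  integer :: batch_order(num_batches)
  integer :: i, j, tmp, epoch
  real :: r

  do epoch = 1, 3
    ! Start from the identity ordering 1, 2, ..., num_batches.
    batch_order = [(i, i = 1, num_batches)]
    ! Fisher-Yates shuffle: every permutation is equally likely.
    do i = num_batches, 2, -1
      call random_number(r)
      j = 1 + floor(r * i)
      tmp = batch_order(i)
      batch_order(i) = batch_order(j)
      batch_order(j) = tmp
    end do
    print *, 'epoch', epoch, 'batch order:', batch_order
    ! Here, loop over batch_order and train on each mini-batch in turn.
  end do

end program shuffle_batches_sketch
```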
The outcome should be that in each epoch the order of mini-batches is random and different. You can take inspiration from
but even there the mini-batches are not truly shuffled; rather, the start index is randomly selected, so in each epoch some data samples may go unused and some are used more than once.
@milancurcic I have updated the weekly progress on Discourse. Should I now proceed to the next optimizer, `RMSProp` or `Adam`, or are there any more changes required in the current optimizers?
Thank you @Spnetic-5 for this PR. Currently this optimizer is only implemented in an example and is not available to other users through the library. Therefore my advice would be to look at how to integrate this optimizer into the library. @milancurcic what should be the next step?
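For illustration only, one possible shape such an integration could take is a small module with a base optimizer type that concrete optimizers extend; the module, type, and procedure names below are hypothetical and not the actual neural-fortran API:

```fortran
module optimizers_sketch
  ! Hypothetical module sketch; names are illustrative, not the library's API.
  implicit none
  private
  public :: optimizer_base, sgd

  ! Base type every optimizer extends; concrete optimizers implement minimize.
  type, abstract :: optimizer_base
    real :: learning_rate = 0.01
  contains
    procedure(minimize_interface), deferred :: minimize
  end type optimizer_base

  abstract interface
    subroutine minimize_interface(self, params, gradients)
      import :: optimizer_base
      class(optimizer_base), intent(inout) :: self
      real, intent(inout) :: params(:)
      real, intent(in) :: gradients(:)
    end subroutine minimize_interface
  end interface

  type, extends(optimizer_base) :: sgd
  contains
    procedure :: minimize => sgd_minimize
  end type sgd

contains

  subroutine sgd_minimize(self, params, gradients)
    ! Plain gradient descent step: params := params - lr * gradients
    class(sgd), intent(inout) :: self
    real, intent(inout) :: params(:)
    real, intent(in) :: gradients(:)
    params = params - self % learning_rate * gradients
  end subroutine sgd_minimize

end module optimizers_sketch
```

With a design along these lines, RMSProp and Adam could later be added as further extensions of the base type, and the network's update step could accept any optimizer through the common interface.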
Solving #133 @milancurcic
Optimizers to be implemented: