vinhkhuc / lbfgs4j

Java version of liblbfgs: http://www.chokkan.org/software/liblbfgs/
MIT License
13 stars · 1 fork

LBFGS with Neural Networks not iterating correctly? #2

Closed juicyslew closed 7 years ago

juicyslew commented 8 years ago

Dear Mr. Khuc,

This is William yet again. My friend and I are entering the later stages of our project, and I was trying to build a larger network architecture to allow more intricate weightings to form. To do this, I added two more hidden layers to the network we had (it now has 3 hidden layers instead of one). However, I am facing an odd problem.

Before adding the two extra hidden layers, we would occasionally have a problem where the LBFGS function would stop iterating very early (on the 1st to 4th iteration). This is odd because sometimes it happened, yet we could run it on exactly the same dataset and it wouldn't happen. I am not sure what the cause is. I worked around it temporarily by restarting the function whenever the problem occurred. That worked fine, but once I added the two extra layers the problem appeared again. With the 3 hidden layers, the function iterates only 3 times at maximum. Is there any way to force it to keep iterating, or some other solution to this problem? It is one of the major issues blocking our project from moving forward.

Thank you so much for being so helpful!!

A student and now developer, William Derksen

vinhkhuc commented 8 years ago

Hi William, based on what you described, there are two issues: 1) the number of iterations that LBFGS takes is not consistent even when you run it on the same dataset, and 2) LBFGS stops after 3-4 iterations.

I'm not really sure about issue 1. It looks as if there is some randomization happening in the code, but lbfgs4j itself doesn't use randomization.

For issue 2, my guess is that it is caused by local minima and/or saddle points in the Neural Network's cost function, which is non-convex. LBFGS assumes that the cost function is convex.
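To illustrate why a gradient-based optimizer can stall at a stationary point of a non-convex function, here is a toy sketch using plain gradient descent (not lbfgs4j) on f(x) = x^4 - x^2; the function names and the example function are my own for illustration. Starting exactly at the stationary point x = 0 (a local maximum, where the gradient vanishes), no progress is made, while a tiny perturbation lets the optimizer reach a true minimum near ±1/√2:

```java
public class SaddleDemo {
    // Gradient of f(x) = x^4 - x^2, which has a stationary point at x = 0
    // and minima at x = ±sqrt(1/2).
    static double grad(double x) {
        return 4 * x * x * x - 2 * x;
    }

    // Plain gradient descent for a fixed number of steps.
    static double descend(double x, int steps, double lr) {
        for (int i = 0; i < steps; i++) {
            x -= lr * grad(x);
        }
        return x;
    }

    public static void main(String[] args) {
        // Starting exactly at the stationary point: gradient is 0, so x never moves.
        double stuck = descend(0.0, 1000, 0.01);
        // A tiny random-like perturbation escapes and converges to ~0.7071.
        double escaped = descend(1e-3, 1000, 0.01);
        System.out.println("stuck at: " + stuck + ", escaped to: " + escaped);
    }
}
```

The same intuition applies to a neural network's cost surface: an unlucky (or symmetric, e.g. all-zero) starting point can leave the optimizer with a near-zero gradient after only a few iterations.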

I haven't used LBFGS to train Neural Networks, but the literature suggests that we need to initialize the weights randomly to avoid local minima and saddle points. SGD handles this pretty well.

I think you can try different randomized initial weight values. Assuming that x is the weight vector, initialize x0's components randomly so that they fall in [0, 0.01) (e.g., new Random().nextDouble() * 0.01). After that, pass x0 to the LbfgsMinimizer as follows:

LbfgsMinimizer minimizer = new LbfgsMinimizer();
double[] x = minimizer.minimize(f, x0);

Note that you will need the newest version of lbfgs4j (0.2.1), which allows specifying x's initial value x0.
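Putting the suggestion above together, a minimal sketch of building such a random starting point might look like the following (the class and method names here are my own; the commented-out minimizer call follows the snippet above, with f standing for your cost function):

```java
import java.util.Arrays;
import java.util.Random;

public class RandomInit {
    // Build a starting point whose components are drawn uniformly from [0, 0.01),
    // as suggested above. n is the total number of weights; a fixed seed makes
    // the run reproducible while still breaking symmetry.
    static double[] randomX0(int n, long seed) {
        Random rng = new Random(seed);
        double[] x0 = new double[n];
        for (int i = 0; i < n; i++) {
            x0[i] = rng.nextDouble() * 0.01;
        }
        return x0;
    }

    public static void main(String[] args) {
        double[] x0 = randomX0(10, 42L);
        // With lbfgs4j 0.2.1+, pass x0 to the minimizer:
        // LbfgsMinimizer minimizer = new LbfgsMinimizer();
        // double[] x = minimizer.minimize(f, x0);
        System.out.println(Arrays.toString(x0));
    }
}
```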

juicyslew commented 8 years ago

But I am initializing the weights randomly and setting them in that way! Let me check real quick.

To set them, I made a function that checks whether the double[] x is all zeros; if it is, the function sets x to my theta values.
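A hypothetical reconstruction of the check described here (the names are my own, since the actual code isn't shown):

```java
import java.util.Arrays;

public class WeightInit {
    // If the incoming vector is all zeros, substitute the stored theta values;
    // otherwise leave it untouched.
    static double[] initIfZero(double[] x, double[] theta) {
        for (double v : x) {
            if (v != 0.0) {
                return x; // already initialized, leave it alone
            }
        }
        return theta.clone(); // all zeros: use the theta values instead
    }

    public static void main(String[] args) {
        double[] theta = {0.003, 0.007, 0.001};
        double[] out = initIfZero(new double[3], theta);
        System.out.println(Arrays.toString(out));
    }
}
```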

vinhkhuc commented 7 years ago

Looks like the issue got resolved. Feel free to reopen it.