kronenthaler / libai


Random matrix initialization in MLP #21

Closed: dktcoding closed this issue 7 years ago

dktcoding commented 7 years ago

I was wondering: why does Matrix#fill() need to use very small random values? (I'm sure there's a reason.)

The thing is, in MLP these really small values can be too small... I mean, sufficiently close to zero to be almost "untrainable".

These screenshots illustrate the problem: the first one is training an MLP to learn sin(x) using initial values that are very close to zero. The third one simply uses Math.random() - 1 (not the optimal choice for each distribution).

Needless to say, even with very small initial values it ends up converging to a solution (when there's no noise, like in those pictures). Of course, I understand this is just one example, and it could be considered a corner case.

My idea (it's actually something I'm using right now) is to create another constructor for MLP that accepts a Function, and to replace Matrix#fill() with Matrix#apply() when the function is not null (or to create a default Function for that case).

I'm not entirely sure it's the best approach; perhaps using a Random object with an overridden version of Random#nextDouble() would be better, or some other alternative.
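To make the idea concrete, here is a minimal sketch of what I mean. The class and constructor signatures below are just stand-ins for illustration, not libai's actual API:

```java
import java.util.Random;
import java.util.function.DoubleSupplier;

// Minimal stand-ins to show the idea; not the real libai classes.
class Matrix {
    final double[] data;

    Matrix(int rows, int cols) {
        data = new double[rows * cols];
    }

    // Current behavior described in this issue: tiny values in [0, 0.01).
    void fill() {
        Random r = new Random();
        for (int i = 0; i < data.length; i++)
            data[i] = r.nextDouble() * 0.01;
    }

    // Proposed alternative: let the caller decide the distribution.
    void apply(DoubleSupplier init) {
        for (int i = 0; i < data.length; i++)
            data[i] = init.getAsDouble();
    }
}

class MLP {
    final Matrix weights = new Matrix(10, 10);

    // Extra constructor taking an initializer; null means "keep the old default".
    MLP(DoubleSupplier initializer) {
        if (initializer != null)
            weights.apply(initializer);
        else
            weights.fill();
    }
}
```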

What do you think?

kronenthaler commented 7 years ago

The reason is not related to MLP in particular, but to other networks. RBF and Hebb, if I remember correctly, are more sensitive to "big" values, and they tend to work better when values are 0 or almost 0. I checked the implementation of the fill method and I'm multiplying the nextDouble value by 0.01, basically to make them smaller. I would experiment first with removing the 0.01 constant from the mix and see whether the results improve. If there is an improvement, we can introduce the parameter you mentioned for the exceptional cases (Hebb and RBF), maybe doing some testing like you just did. If it's not really necessary, and everything works better with a plain random initialization, I don't see a need to add extra parameters for the initialization. In theory the NN should be resilient to high values in general, but as you detected, smaller values can cause numerical problems.
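For reference, the difference being discussed boils down to the scaling constant in the fill step. A tiny sketch of the two variants, assuming the implementation described above (not the actual libai code):

```java
import java.util.Random;

class FillScaleDemo {
    public static void main(String[] args) {
        Random r = new Random();
        // Today's fill(), as described in this thread: values in [0, 0.01).
        double current = r.nextDouble() * 0.01;
        // Proposed experiment: drop the 0.01 factor, giving values in [0, 1).
        double proposed = r.nextDouble();
        System.out.println(current + " vs " + proposed);
    }
}
```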

dktcoding commented 7 years ago

I already tried removing the 0.01 from fill and it converged faster (the last picture); of course, I didn't want to change it in case it breaks other things.

If I have some time later today I'll implement the change. Honestly, I'd much rather do it on the MLPs, particularly to provide the ability to use different random distributions; that being said, we could actually change both.
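As a usage sketch for that idea (building on the hypothetical constructor sketched earlier in this thread; the initializer names are illustrative, not part of libai), different distributions could be supplied like this:

```java
import java.util.Random;
import java.util.function.DoubleSupplier;

class InitDistributions {
    public static void main(String[] args) {
        Random rng = new Random();

        // Uniform in [-1, 0), as in the Math.random() - 1 experiment above.
        DoubleSupplier uniform = () -> rng.nextDouble() - 1.0;

        // Zero-mean Gaussian with a small standard deviation.
        DoubleSupplier gaussian = () -> rng.nextGaussian() * 0.1;

        System.out.println(uniform.getAsDouble() + " " + gaussian.getAsDouble());
    }
}
```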

dktcoding commented 7 years ago

There's the patch for Matrix#fill(); we should check exactly which algorithms need small random numbers for initialization, so we can fix them.