jeffheaton / encog-java-core

http://www.heatonresearch.com/encog

PNN Training bugs #188

Open Dennis1111 opened 9 years ago

Dennis1111 commented 9 years ago

Hi, I feel a little disappointed about the feedback in the forums, but I'm hoping for better success here instead. I think I have found a couple of PNN learning-related bugs.

The first one is that TrainBasicPNN favours overfitting (sigma -> 0). This happens because the leave-one-out algorithm leaves the wrong pattern out. (Printing the excluded pattern vs. the input pattern in BasicPNN showed that the excluded pattern didn't match the input pattern.) So in TrainBasicPNN.computeDeriv(..):

I changed the exclude check from

```java
if (r == this.network.getExclude())
```

to

```java
int nrOfSamples = (int) this.network.getSamples().getRecordCount();
int exclude = nrOfSamples - 1 - this.network.getExclude();
if (r == exclude)
```

and made a similar change in BasicPNN.compute(..):

```java
int nrOfSamples = this.samples.size();
if (r == nrOfSamples - 1 - getExclude())
```
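For context, the corrected check sits inside the leave-one-out loop. A rough sketch of how it looks in TrainBasicPNN.computeDeriv(..) (the loop variables are illustrative, not the exact Encog source):

```java
// The records are iterated from the start of the dataset while getExclude()
// counts from the opposite end, so the index has to be mirrored.
int nrOfSamples = (int) this.network.getSamples().getRecordCount();
int exclude = nrOfSamples - 1 - this.network.getExclude();

int r = -1;
for (MLDataPair pair : this.network.getSamples()) {
    r++;
    if (r == exclude) {
        continue; // leave out the pattern that is currently being predicted
    }
    // ... accumulate kernel and derivative contributions for this pair ...
}
```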

The second bug is that the learning algorithm doesn't compute a different sigma for each input variable. DeriveMinimum, which uses GlobalMinimumSearch, is supposed to do this work, but GlobalMinimumSearch is hardcoded to call TrainBasicPNN.calcErrorWithSingleSigma(sigma). I guess there was some future plan for GlobalMinimumSearch to be able to call calcErrorWithMultipleSigma(..) directly as well, since CalculationCriteria (the interface argument supplied) contains both methods!?
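For reference, CalculationCriteria exposes roughly these two methods (signatures inferred from the calls discussed in this thread; check the actual Encog source):

```java
public interface CalculationCriteria {

    // error over the training set when every input shares a single sigma
    double calcErrorWithSingleSigma(double sigma);

    // error (and optionally first/second derivatives) with one sigma per input
    double calcErrorWithMultipleSigma(double[] x, double[] der1, double[] der2,
            boolean useDerivatives);
}
```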

I started comparing with Timothy Masters' source code and decided to follow that implementation, where, when DeriveMinimum supplies the CalculationCriteria, a call expands to TrainBasicPNN.calcErrorWithMultipleSigma(..). I like this solution because it keeps GlobalMinimumSearch at a minimum :). Instead, I have created an interface CalcError as the argument to the global minimum search, which just has the method double calcError(double x); so I let DeriveMinimum implement CalcError with the code

```java
private void updateX(double[] x, double[] base, double t,
        double[] direction, double min) {
    for (int i = 0; i < n; i++) {
        x[i] = base[i] + t * direction[i];
        if (x[i] < min) {
            x[i] = min; // clamp each sigma at the minimum allowed value
        }
    }
}

public double calcError(double t) {
    updateX(this.x, this.base, t, this.direction, MIN_SIGMA);
    return network.calcErrorWithMultipleSigma(this.x, null, null, false);
}
```

In TrainBasicPNN, calcError uses calcErrorWithSingleSigma(t) instead.
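A sketch of what I mean (not the exact code):

```java
// TrainBasicPNN's version of the proposed CalcError just delegates to the
// single-sigma evaluation, while DeriveMinimum's version (above) first
// expands the search point into a full sigma vector.
public double calcError(double sigma) {
    return calcErrorWithSingleSigma(sigma);
}
```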

I found a few other bugs in DeriveMinimum as well. Line 166:

```java
x[i] = base[i] + globalMinimum.getY2() * direc[i];
```

should be

```java
x[i] = base[i] + globalMinimum.getX2() * direc[i];
```

An update of the derivatives,

```java
network.calcErrorWithMultipleSigma(x, direc, deriv2, true);
```

is missing before the code

```java
for (int i = 0; i < n; i++) {
    direc[i] = -direc[i]; // negative gradient
}
```

Greetings Dennis

jeffheaton commented 9 years ago

I do not use PNN a great deal, but I had noticed that Encog's PNN does have a tendency to overfit. Thank you for the suggestions on a fix. I will include this with the next version of Encog, 3.4.

Dennis1111 commented 9 years ago

I'm glad if my debugging can be of use to others. I believe some would appreciate the multiple-sigma fix as well. Would mailing/uploading my edited source code somehow be useful? Before debugging I upgraded to version 3.2.0; I thought that was the latest version because that's what I found on https://code.google.com/p/encog-java/downloads/list.

Sukhumarn commented 9 years ago

Hi Dennis1111,

I have an overfitting problem when using PNN. I have tried to debug as you explained, but it doesn't work.

Would you mind sending me the debugged source code of the PNN? Thank you very much.

Dennis1111 commented 9 years ago

Hi,

Apologies, I have changed this comment; it seems I was wrong about some details.

Are you sure that the overfitting isn't simply because your dataset is too small (overfitting doesn't have to mean there is a bug)? When I tested the PNN on the XOR problem with my changes I thought there was still overfitting, but when I think of how a PNN works it's probably not a good example for testing the capabilities of PNNs.

Greetings Dennis

anilarao commented 9 years ago

Hi Dennis1111, It would be great if you could upload your edited source code. Thanks

Dennis1111 commented 9 years ago

Hi Anilarao, To upload the code I probably need to read the Apache 2.0 license carefully first and then make some changes (documentation) according to the license, so I feel like that's more than I want to do right now. Greetings Dennis

Mistuus commented 8 years ago

Hi @jeffheaton,

Encog is a great tool! Thanks so much for working on it.

Do you have any updates or code that can help with the overfitting problem? I am using PNN for my final year project and solving this would be very useful.

Thanks, Victor

EdWood1994 commented 7 years ago

Hey Dennis, can you please share your code? It doesn't conflict with the Apache license. It has been two years and nothing has changed in Encog.

Dennis1111 commented 7 years ago

Hi EdWood. I believe the overfitting bug has been corrected in Encog, but not the other stuff I wrote about. I have decided to upload my code and perhaps some extra PNN-related code I have done. I feel like I should "clean up" the code a little first and make some notes about it, so I will post a notification within 2 weeks. Meanwhile I'd like to share some thoughts on PNN and overfitting.

  1. PNN is sensitive to the curse of dimensionality. If overfitting is your problem, try to reduce the dimensions, starting with only the most relevant inputs, or use a dimension reduction method such as Principal Component Analysis (a rough sketch of the "most relevant inputs" idea follows below).
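A very simple way to do the "most relevant inputs" part, in plain Java with no Encog classes (illustrative only; a full PCA would need a linear algebra library):

```java
import java.util.Comparator;
import java.util.stream.IntStream;

public final class InputRanking {

    /** Pearson correlation between one input column and the target. */
    static double correlation(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n;
        my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        return sxy / Math.sqrt(sxx * syy + 1e-12);
    }

    /** Input column indices sorted by |correlation| with the target, strongest first. */
    public static int[] rankInputs(double[][] inputs, double[] target) {
        int cols = inputs[0].length;
        return IntStream.range(0, cols)
                .boxed()
                .sorted(Comparator.comparingDouble(col -> {
                    int c = col;
                    double[] column = new double[inputs.length];
                    for (int row = 0; row < inputs.length; row++) {
                        column[row] = inputs[row][c];
                    }
                    return -Math.abs(correlation(column, target));
                }))
                .mapToInt(Integer::intValue)
                .toArray();
    }
}
```

Keeping only the top few columns from rankInputs before building the PNN training set is often enough to see whether dimensionality is the real problem.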

Greetings, Dennis

EdWood1994 commented 7 years ago

Thank you. I use it for regression and classification. I always get low sigma values like 0.001. I looked closely at the source and found the method globalMinimum.findBestRange(...), which is called in the iteration() method of the TrainBasicPNN class. It should find the best sigma values by brute force. If I am not wrong, Encog tries different sigma values on the training dataset, and this is why sigma 0.001 can be enough for the training dataset, but definitely not for other datasets. Shouldn't it try different sigmas on a different dataset, some kind of validation set? Because, from the nature of PNN, it should always get 0 MSE error on the training dataset, except for some high sigmas. So it gets 0 error with sigma 0.001 and can't beat 0 error with a different sigma.
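What I mean, written out as a tiny stand-alone example (just the Gaussian kernel of a PNN, no Encog classes):

```java
public class SigmaZeroDemo {
    public static void main(String[] args) {
        // The query pattern is identical to the first training pattern (distance 0)
        // and is NOT excluded, as happens when the leave-one-out logic skips the
        // wrong record.
        double sigma = 0.001;
        double[] distances = {0.0, 0.8, 1.3};
        double[] targets   = {1.0, 0.0, 0.0};
        double num = 0, den = 0;
        for (int i = 0; i < distances.length; i++) {
            double w = Math.exp(-(distances[i] * distances[i]) / (2 * sigma * sigma));
            num += w * targets[i];
            den += w;
        }
        // Prints 1.0: the pattern's own target dominates, so the "training error"
        // is essentially zero for any tiny sigma, which is why a validation set
        // (or a correct leave-one-out) is needed when searching for sigma.
        System.out.println(num / den);
    }
}
```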

There is also the question of why Encog uses findBestRange as the default training method instead of the Brent method, which is already implemented. I am still learning how PNN works, so maybe I am missing something.

Dennis1111 commented 7 years ago

Sounds as if the exclude bug is still around. Personally, I have replaced the exclude code with

```java
if (isSamePattern(pair.getInput(), input)) {
    continue;
}
```

in both BasicPNN and TrainBasicPNN, and the isSamePattern code is

```java
public static boolean isSamePattern(MLData pattern1, MLData pattern2) {
    int length = pattern1.getData().length;
    for (int i = 0; i < length; i++) {
        if (pattern1.getData()[i] != pattern2.getData()[i]) {
            return false;
        }
    }
    return true;
}
```

You can try that as a quick fix. Maybe not the best solution, but one idea is that threaded training now also becomes easier to implement.

I will be back later with modified GlobalMinimumSearch, DeriveMinimum and TrainBasicPNN classes. findBestRange is meant to be useful as a crude search when we start with random weights, while brentMin chooses search points more intelligently once we are getting closer to a minimum (as I remember it). Since training time increases rapidly with the size of the dataset, this becomes important for large datasets.
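Roughly the idea, written independently of the Encog classes (a generic sketch, not the actual GlobalMinimumSearch/brentMin code): first a crude log-spaced scan over sigma to bracket a minimum, then a cheap local refinement of that bracket.

```java
import java.util.function.DoubleUnaryOperator;

public final class SigmaSearchSketch {

    /** Crude search: evaluate a log-spaced grid (points >= 2) and return the best sigma. */
    static double coarseScan(DoubleUnaryOperator error, double lo, double hi, int points) {
        double bestSigma = lo;
        double bestErr = Double.MAX_VALUE;
        for (int i = 0; i < points; i++) {
            double sigma = lo * Math.pow(hi / lo, (double) i / (points - 1));
            double e = error.applyAsDouble(sigma);
            if (e < bestErr) {
                bestErr = e;
                bestSigma = sigma;
            }
        }
        return bestSigma;
    }

    /** Golden-section refinement inside the bracket [a, b] found by the coarse scan. */
    static double refine(DoubleUnaryOperator error, double a, double b, int iterations) {
        final double phi = (Math.sqrt(5) - 1) / 2;
        for (int i = 0; i < iterations; i++) {
            double c = b - phi * (b - a);
            double d = a + phi * (b - a);
            if (error.applyAsDouble(c) < error.applyAsDouble(d)) {
                b = d;
            } else {
                a = c;
            }
        }
        return (a + b) / 2;
    }
}
```

Here error(sigma) stands for whatever a calcErrorWithSingleSigma-style evaluation costs; the point is just that the expensive grid can stay small and the refinement only needs a handful of extra evaluations.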

EdWood1994 commented 7 years ago

You are absolutely right, now findBestRange works, but it's expensive with large datasets, as you mentioned, so I am going to try brentMin. Thank you for the help.

Dennis1111 commented 7 years ago

I want to remind you also that GlobalMinimumSearch is hardcoded to call network.calcErrorWithSingleSigma(x); which means you can only generate solutions where all sigmas have the same value (a one-sigma solution). Learning multivariate problems should go through DeriveMinimum, but then the best search point x that GlobalMinimumSearch finds should be used in the formula new sigmas = current sigmas + x * gradients. So x shouldn't be a sigma value here, at least. I think it's better to wait a little for my updates. The way it works in Timothy Masters' source code is that when GlobalMinimumSearch is called from DeriveMinimum, it has a pointer back to DeriveMinimum. An evaluation of search point x then expands to an evaluation of the network with (current sigmas + x * gradients).

jeffheaton commented 7 years ago

Thanks for posting the code above. I am in the process of getting things together for an Encog 3.4 release, and I would like to address this issue as part of it. @Dennis1111, do you have any code beyond what is above that you would like to share? I would be glad to include it and credit you. I am going to go through what you have above soon.

I am going to try aggregating the suggestions that I see above, add them to the current code, and see what it does for the prediction accuracy of this type of neural network before/after.

Dennis1111 commented 7 years ago

Hi, I will upload the code and a demo example on GitHub and come back with a link on Monday or Tuesday.

Greetings Dennis

jeffheaton commented 7 years ago

Thanks Dennis, that would be great!

Dennis1111 commented 7 years ago

Hi, I have now uploaded some code at https://github.com/Dennis1111/encogPNNDemo and put some comments about my code in the README file. I made a demo for the classic Iris dataset, and since I wanted to test a more complex problem, I chose the first available one among the UCI machine learning datasets (https://archive.ics.uci.edu/ml/datasets/Abalone). I'm not sure whether there is perhaps some bug with the derivatives, so I have also tried a genetic algorithm approach, included in the demo.

One would normally expect that using separate classes would give a lower error, though maybe with overfitting; however, I got a worse error with separate classes. Perhaps I have just chosen bad classification problems for this task. When learning the Iris problem I get a somewhat surprising/contradictory result: after the single-sigma solution we have 5 of 150 misclassified, after DeriveMinimum the error goes down a little but now we have 6 misclassifications, and finally with the GA 3 misclassifications are achieved. I hope you'll find something useful in my code; let me know if you have any further thoughts/questions.

jeffheaton commented 7 years ago

Thank you very much, that is very helpful. I will have a look at it and see about getting it integrated with Encog. I will also likely port it to the C# version of Encog.

jeffheaton commented 7 years ago

Thanks, I will pull some of the code into the Encog classes. Some of the PNN classes use Swing elements directly, so I will keep that out of Encog core. Very interesting approach with TinyGP to check the derivatives; finite differences might be a good approach too. But this is very helpful and I will get it integrated with Encog. It looks like it will be a step forward from the current PNNs.

Dennis1111 commented 7 years ago

I'm glad that I can contribute to Encog. Thanks for the finite-difference hint. I will be looking forward to seeing Encog 3.4!