jeffheaton / encog-java-core

http://www.heatonresearch.com/encog

Implement regularization #28

Closed PetrToman closed 7 years ago

PetrToman commented 12 years ago

Hello, please consider implementing regularization, as it is essential to deal with the overfitting problem.

I recommend watching the 12-minute video "Regularization and Bias/Variance" from lesson X. ADVICE FOR APPLYING MACHINE LEARNING at https://class.coursera.org/ml/lecture/preview (Stanford ML course).

It would also be useful to enhance Encog Analyst - it could split data into 3 sets (training, cross validation, testing) and try to find the optimal regularization parameter automatically.

seemasingh commented 12 years ago

I will take a look at this to see whether we will include it in Encog 3.1 or 3.2. I am in the process of finalizing features for 3.1, as we want to release it soon and move to a code freeze. Definitely an important feature, though. Encog currently has two methods to combat overfitting: cross-validation and early stopping (new for 3.1). More info here, though these wiki pages are in need of expansion.

http://www.heatonresearch.com/wiki/Overfitting

PetrToman commented 12 years ago

Good! Early Stopping may be useful too, but regularization should be more powerful. A basic implementation (not counting the Workbench) shouldn't be much of a problem, as the regularization term is applied after the gradients are computed.
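As a minimal sketch of the idea above, in plain Java rather than Encog's API (the names `applyUpdate`, `dataGradient`, and `learningRate` are illustrative, not part of Encog): for an L2 penalty, the regularization term simply adds `lambda * w_i` to each data gradient before the weight step.

```java
// Illustrative sketch (not Encog API): applying an L2 regularization term
// after the data gradient has been computed.
public final class L2Sketch {

    // Hypothetical update rule: grad_i = dataGrad_i + lambda * w_i,
    // then w_i -= learningRate * grad_i.
    static void applyUpdate(double[] weights, double[] dataGradient,
                            double lambda, double learningRate) {
        for (int i = 0; i < weights.length; i++) {
            double grad = dataGradient[i] + lambda * weights[i];
            weights[i] -= learningRate * grad;
        }
    }

    public static void main(String[] args) {
        double[] w = {1.0, -2.0};
        double[] g = {0.5, 0.5};
        applyUpdate(w, g, 0.1, 1.0);
        System.out.println(w[0] + " " + w[1]); // prints 0.4 -2.3
    }
}
```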

ghost commented 12 years ago

I wrote a piece of Java code for the regularization, implemented as a Strategy. I have only tested it with ResilientPropagation. Feel free to use it and make remarks.

public class RegularizationStrategy implements Strategy {

    private double lambda; // Weight decay
    private MLTrain train;
    private double[] weights;

    public RegularizationStrategy(double lambda) {
        this.lambda = lambda;
    }

    @Override
    public void init(MLTrain train) {
        this.train = train;
    }

    @Override
    public void preIteration() {
        try {
            weights = ((Propagation) train).getFlatTraining()
                    .getNetwork().getWeights();
        } catch (Exception e) {
            weights = null;
        }
    }

    @Override
    public void postIteration() {
        if (weights != null) {
            double[] newWeights = ((Propagation) train).getFlatTraining()
                    .getNetwork().getWeights();
            for (int i = 0; i < newWeights.length; i++) {
                newWeights[i] -= lambda * weights[i];
            }
            ((Propagation) train).getFlatTraining()
                    .getNetwork().setWeights(newWeights);
        } else {
            System.err.println("Error in RegularizationStrategy, weights are null but should not be.");
        }
    }

}
PetrToman commented 12 years ago

poussevinm: I like the idea of implementing it as a Strategy. As for the regularization, I think the old values are not needed, so if I'm not mistaken (I haven't tested it), the above code can be simplified to:

public void postIteration() {
    double[] weights = ((Propagation) train).getFlatTraining().
                       getNetwork().getWeights();

    for (int i = 0; i < weights.length; i++) {
        weights[i] += lambda * weights[i];   // also using +
    }
}

In Encog 3.1 the weights are copied to the GradientWorkers before postIteration() is called (see Propagation.iteration()), so I guess this code wouldn't work. I suggest introducing a new Strategy method, something like public void postGradient(), to resolve this.

ghost commented 12 years ago

My idea was that regularization adds a term to the cost function, and since the gradient is linear, you can apply the influence of regularization in a second step. So I took the initial weights before they were modified by the part of the gradient computed from the training examples, and let that part of the gradient do its work. Once it was done, I simply added the gradient of the regularization term.

This is why I needed the initial weights. It also means the code does not depend on the way you compute the gradient on the training examples.
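The linearity argument above can be sketched in plain Java (illustrative names, not Encog's API): because the gradient of J(w) = E(w) + (lambda/2)·w² is gradE + lambda·w, a single combined step equals a data-gradient step followed by a decay step computed from the initial weight.

```java
// Sketch of the linearity argument: one combined gradient step equals
// a data-gradient step followed by a separate decay step, provided the
// decay term uses the *initial* weight (why the strategy stored it).
public final class TwoStepUpdate {

    static double combinedStep(double w, double dataGrad,
                               double lambda, double lr) {
        return w - lr * (dataGrad + lambda * w);
    }

    static double twoStep(double w, double dataGrad,
                          double lambda, double lr) {
        double afterData = w - lr * dataGrad; // data gradient first
        return afterData - lr * lambda * w;   // then decay from initial w
    }

    public static void main(String[] args) {
        double a = combinedStep(0.8, 0.25, 0.01, 0.1);
        double b = twoStep(0.8, 0.25, 0.01, 0.1);
        System.out.println(Math.abs(a - b) < 1e-12); // prints true
    }
}
```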

PetrToman commented 12 years ago

Well, the problem is that weights == newWeights in postIteration(): the array is not cloned, but assigned by reference (try printing out the values).
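A minimal standalone demonstration of the aliasing issue described above (plain Java, no Encog): assigning an array copies the reference, not the contents, so "old" weights saved this way track every later mutation; `clone()` takes a real snapshot.

```java
// Demonstrates that array assignment aliases the same object,
// while clone() produces an independent copy.
public final class AliasDemo {
    public static void main(String[] args) {
        double[] weights = {1.0, 2.0};

        double[] alias = weights;            // same array object
        double[] snapshot = weights.clone(); // independent copy

        weights[0] = 99.0; // simulate the iteration updating the weights

        System.out.println(alias[0]);    // prints 99.0 — old value is gone
        System.out.println(snapshot[0]); // prints 1.0  — the clone kept it
    }
}
```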

jeffheaton commented 12 years ago

Thanks for the contributed code, I will take a look.

ghost commented 12 years ago

I see your point, Petr. This is why I used the setWeights(double[]) method in my postIteration() method: ((Propagation) train).getFlatTraining().getNetwork().setWeights(newWeights);

Thanks for your attention to my code. Do you want me to comment/document it?

Thanks for your attention in my code. Do you want me to comment/document it ?

PetrToman commented 12 years ago

My point was that weights doesn't actually keep the old values. Take a look at Jeff's code (https://github.com/encog/encog-java-core/commit/1aa783de1895d4aee46096e59426b5acc0076ccd); I think this is how you meant to implement it.

ghost commented 12 years ago

Ok, my bad. I see my mistake now. Thanks.

jeffheaton commented 12 years ago

Okay, I implemented this, with the code fix, in Encog 3.2. I have not played with it much yet. Also added issues #96 and #97 to make this easily usable in the workbench.

thomasj02 commented 12 years ago

Actually, I think this code is still incorrect: you don't want to regularize weights from bias inputs. It's not clear to me, though, how to tell whether a weight comes from a bias input when you have the flat representation.

joetanto commented 9 years ago

I agree with @thomasj02, the bias terms must not be included when regularizing. I'd appreciate if someone can fix that. Thank you.
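If a mask marking which flat-weight indices feed from bias neurons were available, the decay loop could simply skip them. This is only a hedged sketch: the `isBias` mask is hypothetical, and how to derive it from Encog's flat network is exactly the open question raised above.

```java
// Hedged sketch (not Encog API): weight decay that skips bias weights,
// given a hypothetical per-index mask of which weights come from bias inputs.
public final class BiasAwareDecay {

    // isBias[i] is a hypothetical mask; deriving it from the flat
    // representation is the unresolved part of this issue.
    static void decay(double[] weights, boolean[] isBias, double lambda) {
        for (int i = 0; i < weights.length; i++) {
            if (!isBias[i]) {
                weights[i] -= lambda * weights[i];
            }
        }
    }

    public static void main(String[] args) {
        double[] w = {1.0, 1.0};
        decay(w, new boolean[]{false, true}, 0.1);
        System.out.println(w[0] + " " + w[1]); // prints 0.9 1.0
    }
}
```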

vincenzodentamaro commented 9 years ago

Well, allowing large biases gives our networks more flexibility in behaviour: large biases make it easier for neurons to saturate, which is sometimes desirable. So regularizing biases is not necessary.

jeffheaton commented 7 years ago

Since this was submitted, Encog has added dropout, L1, and L2 regularization.