Closed PetrToman closed 7 years ago
I will take a look at this to see if we will include it in Encog 3.1 or 3.2. I am in the process of finalizing features for 3.1, as we want to release it soon and move to a code freeze. Definitely an important feature, though. Encog currently has two methods to combat overfitting: cross-validation and early stopping (new for 3.1). More info here, though these wiki pages are in need of expansion.
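For reference, the early-stopping idea can be sketched independently of Encog (the class and method names below are illustrative, not Encog's actual `EarlyStoppingStrategy` API):

```java
// Illustrative early-stopping check: stop once the validation error has
// not improved for `patience` consecutive iterations. Not Encog code.
public class EarlyStopDemo {
    static int stopIteration(double[] validationErrors, int patience) {
        double best = Double.MAX_VALUE;
        int sinceImprovement = 0;
        for (int i = 0; i < validationErrors.length; i++) {
            if (validationErrors[i] < best) {
                best = validationErrors[i];
                sinceImprovement = 0;
            } else if (++sinceImprovement >= patience) {
                return i; // stop here; further training likely overfits
            }
        }
        return validationErrors.length - 1;
    }

    public static void main(String[] args) {
        // Validation error improves, then starts climbing.
        double[] valErr = {0.9, 0.5, 0.4, 0.41, 0.42, 0.45};
        System.out.println(stopIteration(valErr, 2)); // prints 4
    }
}
```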
Good! Early stopping may be useful too, but regularization should be more powerful. A basic implementation (not counting the Workbench) shouldn't be much of a problem, as the regularization term is applied after the gradients are computed.
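To spell that out: for the L2 penalty (lambda/2) * sum(w^2), the extra gradient per weight is just lambda * w, so it can be applied as a separate step after the data gradients. A minimal sketch with plain arrays (no Encog types, names illustrative):

```java
// Sketch: applying L2 weight decay as a separate step after the data
// gradient. Subtracting lambda * w[i] (learning rate folded into
// lambda here) decays each weight toward zero, equivalent to
// w[i] *= (1 - lambda).
public class WeightDecayDemo {
    static void applyDecay(double[] weights, double lambda) {
        for (int i = 0; i < weights.length; i++) {
            weights[i] -= lambda * weights[i];
        }
    }

    public static void main(String[] args) {
        double[] w = {4.0, -8.0, 0.0};
        applyDecay(w, 0.25);
        System.out.println(java.util.Arrays.toString(w)); // [3.0, -6.0, 0.0]
    }
}
```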
I wrote a piece of Java code for the regularization: I implemented a Strategy to do so. I only tested it with ResilientPropagation. Feel free to use it and make remarks.
```java
public class RegularizationStrategy implements Strategy {
    private double lambda; // Weight decay
    private MLTrain train;
    private double[] weights;

    public RegularizationStrategy(double lambda) {
        this.lambda = lambda;
    }

    @Override
    public void init(MLTrain train) {
        this.train = train;
    }

    @Override
    public void preIteration() {
        try {
            weights = ((Propagation) train).getFlatTraining()
                    .getNetwork().getWeights();
        } catch (Exception e) {
            weights = null;
        }
    }

    @Override
    public void postIteration() {
        if (weights != null) {
            double[] newWeights = ((Propagation) train).getFlatTraining()
                    .getNetwork().getWeights();
            for (int i = 0; i < newWeights.length; i++) {
                newWeights[i] -= lambda * weights[i];
            }
            ((Propagation) train).getFlatTraining()
                    .getNetwork().setWeights(newWeights);
        } else {
            System.err.println("Error in RegularizationStrategy, weights are null but should not be.");
        }
    }
}
```
poussevinm: I like the idea of implementing it as a Strategy. As for the regularization, I think the old values are not needed, so if I'm not mistaken (I haven't tested it), the above code can be simplified to:

```java
public void postIteration() {
    double[] weights = ((Propagation) train).getFlatTraining()
            .getNetwork().getWeights();
    for (int i = 0; i < weights.length; i++) {
        weights[i] += lambda * weights[i]; // also using +
    }
}
```
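One note on the sign in that simplified version: if the intent is weight decay, the update has to subtract, since adding lambda * w[i] grows the weights instead of shrinking them. A quick self-contained check (illustrative names, not Encog code):

```java
// Quick check that '-' decays a weight toward zero while '+' inflates
// it; only the '-' form implements weight decay.
public class DecaySignDemo {
    static double afterIterations(double w, double lambda, int n, boolean plus) {
        for (int i = 0; i < n; i++) {
            w = plus ? w + lambda * w : w - lambda * w;
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println(afterIterations(1.0, 0.5, 4, false)); // 0.0625 (shrinks)
        System.out.println(afterIterations(1.0, 0.5, 4, true));  // 5.0625 (grows)
    }
}
```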
In Encog 3.1 the weights are copied to the GradientWorkers before postIteration() is called (see Propagation.iteration()), so I guess this code wouldn't work. I suggest introducing a new Strategy method, something like public void postGradient(), to resolve this.
My idea was that regularization adds a term to the cost function, and since the gradient is linear, you can apply the influence of regularization in a second pass. So I took the initial weights before they were modified by the part of the gradient computed from the training examples, and let that part of the gradient do its work. Once it was done, I simply added the gradient of the regularization term.
This is why I needed the initial weights. It also means the code does not depend on the way you compute the gradient on the training examples.
Well, the problem is that weights == newWeights in postIteration(), because the array is not cloned but assigned by reference (try printing out the values).
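The aliasing is easy to demonstrate with plain arrays (illustrative class name, not Encog code):

```java
// Demonstrates why the strategy's snapshot must be cloned: assigning an
// array copies the reference, so "old" and "new" are the same object.
public class AliasDemo {
    // Returns {valueSeenThroughAlias, valueSeenThroughClone} after
    // mutating the original array.
    static double[] compare() {
        double[] weights = {1.0, 2.0};
        double[] alias = weights;            // same array object, not a copy
        double[] snapshot = weights.clone(); // independent copy
        weights[0] = 99.0;
        return new double[]{alias[0], snapshot[0]};
    }

    public static void main(String[] args) {
        double[] r = compare();
        System.out.println(r[0]); // 99.0 -- the alias saw the update too
        System.out.println(r[1]); // 1.0  -- only the clone kept the old value
    }
}
```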
Thanks for the contributed code, I will take a look.
I see your point, Petr. This is why I used the setWeights(double[]) method in my postIteration() method:

```java
((Propagation) train).getFlatTraining().getNetwork().setWeights(newWeights);
```
Thanks for your attention to my code. Do you want me to comment/document it?
My point was that weights doesn't actually keep the old values. Take a look at Jeff's code (https://github.com/encog/encog-java-core/commit/1aa783de1895d4aee46096e59426b5acc0076ccd), I think this is the way you meant to implement it.
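In other words, the snapshot taken in preIteration() has to be a clone() so that postIteration() still sees the pre-update values. A minimal sketch of the intended two-step update, with plain arrays standing in for the network (hypothetical names, not the code from the commit):

```java
// Sketch of the intended regularization update: snapshot the weights
// before the gradient step (clone, not assign), then subtract lambda
// times the *old* weights afterwards.
public class SnapshotDecayDemo {
    static double[] trainOneIteration(double[] weights, double[] gradientStep,
                                      double lambda) {
        double[] snapshot = weights.clone();       // preIteration(): keep old values
        for (int i = 0; i < weights.length; i++) { // stand-in for the trainer's update
            weights[i] -= gradientStep[i];
        }
        for (int i = 0; i < weights.length; i++) { // postIteration(): decay by old weights
            weights[i] -= lambda * snapshot[i];
        }
        return weights;
    }

    public static void main(String[] args) {
        double[] w = trainOneIteration(new double[]{4.0, -8.0},
                                       new double[]{1.0, -1.0}, 0.25);
        System.out.println(java.util.Arrays.toString(w)); // [2.0, -5.0]
    }
}
```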
Ok, my bad. I see my mistake now. Thanks.
Okay, I implemented this, with the code fix, in Encog 3.2. I have not played with it much yet. Also added issues #96 and #97 to make this easily usable in the workbench.
Actually, I think this code is still incorrect: you don't want to regularize weights from bias inputs. It's not clear to me, though, how to tell whether a weight comes from a bias input when you have the flat representation.
I agree with @thomasj02, the bias terms must not be included when regularizing. I'd appreciate if someone can fix that. Thank you.
Well, allowing large biases gives our networks more flexibility in behaviour: large biases make it easier for neurons to saturate, which is sometimes desirable. So regularization on biases is not necessary.
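If the indices of bias connections in the flat weight array can be recovered, the fix is a simple mask; obtaining that mask from Encog's flat representation is exactly the open question here. A sketch assuming the mask is somehow known (hypothetical names, not Encog code):

```java
// Sketch of decay that skips bias weights. The isBias mask is assumed
// to be available; how to derive it from the flat representation is
// the unresolved part.
public class BiasAwareDecayDemo {
    static void applyDecay(double[] weights, boolean[] isBias, double lambda) {
        for (int i = 0; i < weights.length; i++) {
            if (!isBias[i]) {            // leave bias weights untouched
                weights[i] -= lambda * weights[i];
            }
        }
    }

    public static void main(String[] args) {
        double[] w = {4.0, 8.0, -4.0};
        boolean[] bias = {false, true, false}; // hypothetical mask
        applyDecay(w, bias, 0.25);
        System.out.println(java.util.Arrays.toString(w)); // [3.0, 8.0, -3.0]
    }
}
```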
Since this was submitted, Encog has added dropout, L1 and L2.
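For completeness, the per-weight penalty gradients the two schemes contribute differ: L2 adds lambda * w, while L1 adds lambda * sign(w). A side-by-side sketch (illustrative, not Encog's implementation):

```java
// Per-weight regularization gradient terms: L2 scales with the weight,
// L1 is a fixed-magnitude pull toward zero.
public class L1L2Demo {
    static double l2Term(double w, double lambda) { return lambda * w; }
    static double l1Term(double w, double lambda) { return lambda * Math.signum(w); }

    public static void main(String[] args) {
        System.out.println(l2Term(4.0, 0.25));  // 1.0   (proportional to w)
        System.out.println(l1Term(4.0, 0.25));  // 0.25  (fixed magnitude)
        System.out.println(l1Term(-4.0, 0.25)); // -0.25
    }
}
```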
Hello, please consider implementing regularization, as it is essential for dealing with the overfitting problem.
I recommend watching the 12 min. video "Regularization and Bias/Variance" from lesson X. Advice for Applying Machine Learning at https://class.coursera.org/ml/lecture/preview (Stanford ML course).
It would also be useful to enhance Encog Analyst: it could split the data into 3 sets (training, cross-validation, testing) and try to find the optimal regularization parameter automatically.
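The automatic search could be as simple as trying a grid of lambda values, training on the training split for each, and keeping the lambda with the lowest cross-validation error. A sketch of the selection step (the error values below are hypothetical; in a real Analyst integration they would come from actual training runs):

```java
// Sketch of choosing the regularization parameter: given candidate
// lambdas and the cross-validation error each achieved, pick the argmin.
public class LambdaSearchDemo {
    static double bestLambda(double[] lambdas, double[] cvErrors) {
        int best = 0;
        for (int i = 1; i < lambdas.length; i++) {
            if (cvErrors[i] < cvErrors[best]) {
                best = i;
            }
        }
        return lambdas[best];
    }

    public static void main(String[] args) {
        double[] lambdas  = {0.0, 0.01, 0.1, 1.0};
        double[] cvErrors = {0.40, 0.25, 0.30, 0.60}; // hypothetical results
        System.out.println(bestLambda(lambdas, cvErrors)); // 0.01
    }
}
```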