This PR addresses some enhancements that I have either wanted to implement for a while or that arose from today's meeting.
Let ES1xLambda look into the future
Previously, the `&` operator stopped the optimization process as soon as no better candidate was found in an iteration. This change introduces a `deque` that saves the last `delay` elitists and only performs early stopping if all `delay` elitists generated afterwards have worse fitness. This should help the optimizer overcome local optima when the `HalfnormIncrease` mutation is used.
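To illustrate the idea, here is a minimal, self-contained sketch. The `Individual` class, the `mutate` stand-in, and the loop structure are hypothetical simplifications, not the actual ES1xLambda implementation:

```python
import random
from collections import deque
from dataclasses import dataclass

@dataclass
class Individual:
    fitness: float

def mutate(parent: Individual) -> Individual:
    # Hypothetical stand-in for the actual mutation/evaluation step.
    return Individual(parent.fitness + random.uniform(-1.0, 1.0))

def optimize(initial: Individual, delay: int, n_iter: int) -> Individual:
    elitist = initial
    elitists = deque(maxlen=delay)  # remembers the last `delay` elitists

    for _ in range(n_iter):
        candidate = mutate(elitist)
        if candidate.fitness > elitist.fitness:
            elitist = candidate
        elitists.append(elitist)

        # Stop only once the deque is full and none of the elitists
        # generated after the oldest one improved on its fitness.
        if len(elitists) == delay and all(
            e.fitness <= elitists[0].fitness for e in list(elitists)[1:]
        ):
            break

    return elitist
```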
Make the individual fitness comparable between iterations
The relative complexity `(N - C) / N` used the current pool size as `N`, which changes between iterations. The reason for using the relative complexity instead of the absolute one was to get a bound on the theoretical maximal fitness, which is 100 in our case. To be specific, it was a side effect of my efforts to find a fitness function that is restricted to positive values and combines several objectives. Weighting the objectives is much easier if they are all normalized, after all.
This approach had the advantage that a relative complexity of 0 was possible in every iteration, if all rules in the pool were selected.
The disadvantage I want to fix with this PR is that the relative complexity increased by itself across iterations, even if the individual did not change, simply because `N` increased. Solution quality itself is not influenced by this, because it doesn't change the ordering of the solutions within an iteration, so it doesn't matter for performance. What it does matter for is the comparison of fitness values between iterations. If the fitness of an individual increases by itself merely through new rules being added to the pool, then even keeping the same elitist across all iterations looks like increasing fitness on the fitness plot. This 'drift' is not very pretty, so I changed `N` to be the size of the pool after all rules were generated, which is `n_rules * n_iter + n_initial_rules`. This way, the relative complexity stays normalized, the weighting stays the same, and the fitness also stays constant across iterations. But a relative complexity of 0 is no longer possible, except in the last iteration. I don't think this really matters, though.
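To make the drift concrete, here is a toy calculation; the parameter values are made up purely for illustration:

```python
# Made-up parameter values, only to illustrate the drift.
n_initial_rules, n_rules, n_iter = 16, 4, 32
C = 10  # complexity (selected rules) of an elitist that never changes

# Old: N is the current pool size, so (N - C) / N rises on its own.
for it in (1, 8, 32):
    N = n_initial_rules + n_rules * it
    print(it, round((N - C) / N, 2))  # -> 0.5, 0.79, 0.93: the 'drift'

# New: N is fixed to the final pool size, so the value is constant.
N = n_rules * n_iter + n_initial_rules  # = 144
print(round((N - C) / N, 2))  # -> 0.93 in every iteration
```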
I think my change makes sense, but if you find a flaw in my logic, please tell me.
Use cross-validated local models if enough training data is available
We did talk about using cross-validation to reduce possible overfitting of rules, but small rules in particular don't match enough samples to do this meaningfully. I added a check: if a rule matches more than 50 samples, 5-fold CV is performed and the model with the lowest test error is chosen; that model is then used to calculate the fitness, etc. If the rule matches fewer samples, only a single model is fit on all samples, just like before, and the error is the training error.
I think this is a good compromise, and fitting five `LinearRegression`s instead of one is an overhead we can cope with, in my opinion. These are just some matrix multiplications done almost entirely in numpy, after all.
But the sample threshold at which we switch to CV, currently 50, is something we should discuss.
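For reference, a rough sketch of what this check could look like (now tracked in #68). The threshold constant, the helper name, and the squared-error metric are placeholders, not the final implementation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

CV_THRESHOLD = 50  # placeholder value, up for discussion

def fit_local_model(X: np.ndarray, y: np.ndarray) -> tuple[LinearRegression, float]:
    if X.shape[0] > CV_THRESHOLD:
        # Enough matched samples: 5-fold CV, keep the model with the
        # lowest test error.
        best_model, best_error = None, np.inf
        for train_idx, test_idx in KFold(n_splits=5).split(X):
            model = LinearRegression().fit(X[train_idx], y[train_idx])
            error = np.mean((model.predict(X[test_idx]) - y[test_idx]) ** 2)
            if error < best_error:
                best_model, best_error = model, error
        return best_model, best_error
    # Too few samples: fit a single model on everything, as before;
    # the reported error is then the training error.
    model = LinearRegression().fit(X, y)
    error = np.mean((model.predict(X) - y) ** 2)
    return model, error
```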
Edit: Removed from this PR and moved to #68.
Prefer to generate rules using high-error samples as origins
We might want to concentrate on samples that have a high error under the current prediction, rather than samples that were matched less often than others, because even if a sample is only matched once, the rule that matches it could be very good. Preferring samples with high error concentrates the search on the difficult parts of the input space. This error could be calculated either on the prediction of the elitist or on that of all rules in the pool.
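A possible implementation could weight the origin sampling by per-sample error, roughly like this; the function name and signature are hypothetical:

```python
import numpy as np

def sample_origins(X: np.ndarray, y: np.ndarray, y_pred: np.ndarray,
                   n_origins: int, rng: np.random.Generator) -> np.ndarray:
    # Per-sample error of the current prediction (elitist or whole pool).
    error = (y - y_pred) ** 2
    # Normalize to a probability distribution (assumes error.sum() > 0).
    weights = error / error.sum()
    idx = rng.choice(len(X), size=n_origins, p=weights)
    return X[idx]

# Usage: origins = sample_origins(X, y, elitist.predict(X), 8,
#                                 np.random.default_rng(0))
```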