Closed HeuristicLab-Trac-Bot closed 9 years ago
SymbolicDataAnalysisExpressionPruningOperator
to Improve the SymbolicDataAnalysisExpressionPruningOperator
Please add the number of removed nodes as an additional data series in the analyzer's data table. Furthermore, have a detailed look at the changes in r12358 and review them.
2398 depends on this ticket
SymbolicDataAnalysisExpressionPruningOperator.Apply() produces incorrect quality values.
The problem is two-fold. (1) It is assumed that the impacts of replacements are additive with respect to the quality value. First the original quality is retrieved. In the loop over all nodes, impacts are calculated repeatedly, and each time a node is pruned the quality is reduced by the impact value (quality -= impactValue).
(2) Impact calculators use accuracy (classification) or R² (regression) to calculate the impacts. However, the evaluation operator from the problem can be different (such as MSE or absolute error), so we cannot simply subtract the impact from the quality.
Proposed solution: completely re-evaluate pruned models with the evaluation operator from the problem.
Regarding (1), CalculateImpactsAndReplacementValues internally uses the PearsonsRSquared measure (for regression) and the accuracy measure (for classification) to calculate impacts, which is exactly what the SymbolicDataAnalysisPruningOperator.Evaluate method provides. Providing an originalQuality simply avoids recalculating it inside the method on each call. Since the impact is actually calculated as impactValue = originalQuality - newQuality, within the for loop the new originalQuality can be calculated as quality -= impactValue, which helps speed things up between successive calls. The confusion here lies in the terminology: the originalQuality accepted by CalculateImpactsAndReplacementValues has no connection to the actual quality of the individual (which can be MSE, absolute error, etc.).
(2) is indeed a problem, as the quality should not be updated that way. The problem is the line
QualityParameter.ActualValue.Value = quality
where, as you pointed out, we cannot assume anything about the evaluation operator from the problem or which kind of quality measure it provides. Therefore, the solution should indeed be to completely re-evaluate pruned models with the evaluation operator from the problem.
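To make point (2) concrete, here is a minimal Python sketch (not HeuristicLab code). It assumes a regression problem whose evaluator is MSE, while impacts are measured with Pearson's R². The function names and all numbers are hypothetical and chosen only for illustration.

```python
from statistics import mean


def pearson_r2(y, yhat):
    """Squared Pearson correlation between targets and predictions."""
    my, mp = mean(y), mean(yhat)
    cov = sum((a - my) * (b - mp) for a, b in zip(y, yhat))
    var_y = sum((a - my) ** 2 for a in y)
    var_p = sum((b - mp) ** 2 for b in yhat)
    if var_y == 0 or var_p == 0:
        return 0.0
    return cov * cov / (var_y * var_p)


def mse(y, yhat):
    """Mean squared error, standing in for the problem's evaluator."""
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)


# Hypothetical data: the pruned model is the original shifted by +1,
# so its correlation with the target is unchanged but its error is not.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
original = [1.1, 1.9, 3.2, 3.8, 5.1]
pruned = [2.1, 2.9, 4.2, 4.8, 6.1]

# Impact as measured on the R² scale (impactValue = originalQuality - newQuality).
impact = pearson_r2(target, original) - pearson_r2(target, pruned)

# Quality stored on the individual is on the MSE scale.
problem_quality = mse(target, original)

subtracted = problem_quality - impact   # the questionable update
reevaluated = mse(target, pruned)       # full re-evaluation after pruning

print(f"impact (R² scale):        {impact:.6f}")
print(f"quality -= impactValue:   {subtracted:.6f}")
print(f"re-evaluated MSE:         {reevaluated:.6f}")
```

In this contrived case the R² impact is essentially zero, so the subtracted value stays near the original MSE, while the re-evaluated MSE is much larger. The two scales are unrelated, which is why only a re-evaluation with the problem's own evaluator yields a meaningful quality.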
r12720: Changed the impact calculators so that the quality value necessary for impacts calculation is calculated with a separate method. Refactored the CalculateImpactAndReplacementValues method to return the new quality in an out-parameter (adjusted method signature in interface accordingly). Added Evaluate method to the regression and classification pruning operators that re-evaluates the tree using the problem evaluator after pruning was performed.
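The control flow described for r12720 can be sketched roughly as follows. This is a toy Python model under stated assumptions, not the actual C# implementation: the "tree" is a flat list of additive terms, `impact_and_replacement` is a hypothetical stand-in for the impact calculator (it returns the new impact-side quality instead of using an out-parameter), and `problem_evaluator` stands in for the evaluation operator from the problem.

```python
from statistics import mean


def pearson_r2(y, yhat):
    """Squared Pearson correlation, used here only for impact calculation."""
    my, mp = mean(y), mean(yhat)
    cov = sum((a - my) * (b - mp) for a, b in zip(y, yhat))
    var_y = sum((a - my) ** 2 for a in y)
    var_p = sum((b - mp) ** 2 for b in yhat)
    if var_y == 0 or var_p == 0:
        return 0.0
    return cov * cov / (var_y * var_p)


def mse(y, yhat):
    """Mean squared error, standing in for the problem's evaluator."""
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)


def predict(terms, rows):
    """Terms are ("var", column, weight) or ("const", value); the model is their sum."""
    out = []
    for row in rows:
        total = 0.0
        for t in terms:
            total += t[2] * row[t[1]] if t[0] == "var" else t[1]
        out.append(total)
    return out


def impact_and_replacement(terms, i, rows, y, quality):
    """Hypothetical impact calculator: returns (impact, replacement, new_quality)."""
    t = terms[i]
    # Replace the term by the mean of its outputs (a constant node).
    replacement = ("const", mean(t[2] * row[t[1]] for row in rows))
    candidate = terms[:i] + [replacement] + terms[i + 1:]
    new_quality = pearson_r2(y, predict(candidate, rows))
    return quality - new_quality, replacement, new_quality


def prune(terms, rows, y, problem_evaluator, threshold=1e-3):
    terms = list(terms)
    # Impact-side quality (R²); unrelated to the quality stored on the individual.
    quality = pearson_r2(y, predict(terms, rows))
    removed = 0
    for i in range(len(terms)):
        if terms[i][0] == "const":
            continue
        impact, replacement, new_quality = impact_and_replacement(
            terms, i, rows, y, quality)
        if impact <= threshold:
            terms[i] = replacement
            quality = new_quality  # carry the returned quality forward
            removed += 1
    # After pruning, re-evaluate with the problem's evaluator.
    return terms, removed, problem_evaluator(y, predict(terms, rows))


# Hypothetical data: the second term contributes almost nothing.
rows = [(1.0, 5.0), (2.0, 3.0), (3.0, 8.0), (4.0, 1.0)]
y = [2.0, 4.0, 6.0, 8.0]
model = [("var", 0, 2.0), ("var", 1, 0.001)]

pruned_terms, removed, final_quality = prune(model, rows, y, mse)
print(pruned_terms, removed, final_quality)
```

The impact-side quality is only used to decide which nodes to prune; the value reported at the end comes from the problem evaluator alone. The `removed` counter corresponds to the number of removed nodes requested for the analyzer's data table.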
Reviewed all changes and found that the pruning operators are not backwards compatible because parameters were added, removed, or had their type changed.
r12744: added after-deserialization code for backwards-compatibility
Issue migrated from trac ticket # 2359
milestone: HeuristicLab 3.3.12 | component: Problems.DataAnalysis.Symbolic | priority: medium | resolution: done | keywords: pruning, symbolic data analysis, classification, regression
2015-03-11 11:43:54: @foolnotion created the issue