heal-research / HeuristicLab

HeuristicLab - An environment for heuristic and evolutionary optimization
https://dev.heuristiclab.com
GNU General Public License v3.0

Support for categorical variables (R factors) for symbolic regression with GP #2650

Closed HeuristicLab-Trac-Bot closed 7 years ago

HeuristicLab-Trac-Bot commented 8 years ago

Issue migrated from trac ticket # 2650

milestone: HeuristicLab 3.3.15 | component: Problems.DataAnalysis.Symbolic | priority: medium | resolution: done

2016-08-03 18:10:37: @gkronber created the issue


We frequently encounter regression and classification problems where the dataset contains categorical variables. It would be great if such variables could be used directly within symbolic regression models.

HeuristicLab-Trac-Bot commented 8 years ago

2016-08-03 18:58:55: @gkronber commented


r14232 to r14233: created a feature branch for #2650 (support for categorical variables in symbolic regression) with a first set of changes. Work in progress...

TODO:

  • handle correctly in all formatters (Smalltalk formatter and external evaluation formatter have not been adjusted)
  • view for factor variables (configuration of actually allowed factors)
  • create a set of unit tests for the simplifier (handle correctly in simplifier)
  • extend simplifier to handle BinaryFactorVariable
  • extend simplifier to combine FactorVariables with BinaryFactorVariable
  • handle correctly in variable impacts view
  • handle correctly in Non-linear regression (infix parser and infix formatter)
  • support in all analyzers which handle variable symbols specifically
  • support for pruning
  • symbol for WeightedFactorVariable (instead of only 0/1)
  • add an interface for variable symbols (with VariableName property)
  • handle correctly in gradient views
  • handle correctly in mathematical expression view
  • handle correctly in ERC view (create linear regression model)
  • handle correctly in symbolic classification - solution comparison
  • ~~handle correctly in OneR~~

Open issues which are not strictly necessary for a first merge of the functionality:

  • support string variables in data preprocessing view
  • allow factor variables in decision trees (and therefore GBT)?
  • allow string variables as target variables in classification algorithms
  • Switch/Case symbol with one subtree for each possible factor value
  • handle correctly in SymbolicDataAnalysisExpressionTreeILEmittingInterpreter and SymbolicDataAnalysisExpressionCompiledTreeInterpreter (done: tree and linear interpreter)
  • support in more algs?

HeuristicLab-Trac-Bot commented 8 years ago

2016-08-05 17:34:41: @gkronber commented


r14237 to r14238 :

  • added weight for FactorVariable (necessary for LR)
  • introduced VariableBase and VariableTreeNodeBase and IVariableSymbol
  • support for factors in LR
  • extended variable impacts in solution view
  • fixed ERC view for regression
  • support for FactorVariable in simplifier
  • improved support for FactorVariable in constants optimizer
  • multiple related changes and small fixes

HeuristicLab-Trac-Bot commented 8 years ago

2016-08-05 17:40:26: @gkronber commented


r14239: #2650: merged r14234 to r14236 from trunk to branch

HeuristicLab-Trac-Bot commented 8 years ago

2016-08-05 18:30:21: @gkronber commented


Shouldn't the variable impacts view be added as a solution view instead of an extra button?

HeuristicLab-Trac-Bot commented 8 years ago

2016-08-05 18:45:03: @gkronber commented


r14240: added support for categorical variables to LDA and MNL

HeuristicLab-Trac-Bot commented 8 years ago

2016-08-08 10:22:58: @gkronber commented


r14241: added support for factor variables in specific solution comparison view for symbolic classification solutions

HeuristicLab-Trac-Bot commented 8 years ago

2016-08-08 10:25:22: @gkronber changed status from new to accepted

HeuristicLab-Trac-Bot commented 8 years ago

2016-08-08 11:39:13: @gkronber commented


r14242: added support for factor variables to OneR algorithm

HeuristicLab-Trac-Bot commented 8 years ago

2016-08-08 13:00:16: @gkronber commented


r14243: renamed FactorVariable -> BinaryFactorVariable

HeuristicLab-Trac-Bot commented 8 years ago

2016-08-09 15:18:45: @gkronber commented


r14248: added support for factor variables to target variation view together with Philipp

HeuristicLab-Trac-Bot commented 8 years ago

2016-08-09 15:34:46: @gkronber commented


r14249: added new symbol FactorVariable (renamed previous symbol to BinaryFactorVariable). Work in progress.
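
For readers unfamiliar with the two symbols, here is a minimal Python sketch of how they could be read. It assumes (this ticket does not spell it out) that BinaryFactorVariable yields its weight when a categorical variable equals one specific value and 0 otherwise, while FactorVariable holds one weight per level. The class layout, field names, and the `color` data are hypothetical and do not mirror the HeuristicLab C# code.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BinaryFactorVariable:
    """Hypothetical stand-in: weight * [row[variable] == value]."""
    variable: str
    value: str
    weight: float = 1.0

    def evaluate(self, row: Dict[str, str]) -> float:
        return self.weight if row[self.variable] == self.value else 0.0


@dataclass
class FactorVariable:
    """Hypothetical stand-in: one weight per category level."""
    variable: str
    weights: Dict[str, float]

    def evaluate(self, row: Dict[str, str]) -> float:
        # Assumption: levels without an explicit weight contribute 0.
        return self.weights.get(row[self.variable], 0.0)


rows = [{"color": "red"}, {"color": "green"}, {"color": "blue"}]
is_red = BinaryFactorVariable("color", "red", weight=2.5)
color = FactorVariable("color", {"red": 2.5, "green": -1.0, "blue": 0.0})

binary_outputs = [is_red.evaluate(r) for r in rows]
factor_outputs = [color.evaluate(r) for r in rows]
```

Under these assumptions a FactorVariable can express what several BinaryFactorVariable nodes on the same variable express together, which is why a simplifier may want to merge them.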

HeuristicLab-Trac-Bot commented 8 years ago

2016-08-10 20:10:54: @gkronber commented


r14251:

  • extended non-linear regression to work with factors
  • fixed bugs in constants optimizer and tree interpreter
  • improved simplification of factor variables
  • added support for factors to ERC view
  • added support for factors to solution comparison view
  • activated view for all factors
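
One way to picture the simplification of factor variables is shown below. This is only a sketch under the assumption that a factor node is a table of one weight per level: the sum or product of two nodes on the same variable then collapses into a single node with level-wise summed or multiplied weights, and a binary indicator can be rewritten as a table that is zero everywhere except at its value. The function names are hypothetical and are not taken from the HeuristicLab simplifier.

```python
from typing import Dict, List

Weights = Dict[str, float]


def add_factors(a: Weights, b: Weights) -> Weights:
    """f_a(x) + f_b(x) on the same variable: add weights level by level."""
    return {lvl: a.get(lvl, 0.0) + b.get(lvl, 0.0) for lvl in sorted(set(a) | set(b))}


def mul_factors(a: Weights, b: Weights) -> Weights:
    """f_a(x) * f_b(x) on the same variable: multiply weights level by level."""
    return {lvl: a.get(lvl, 0.0) * b.get(lvl, 0.0) for lvl in sorted(set(a) | set(b))}


def binary_as_factor(levels: List[str], value: str, weight: float) -> Weights:
    """Rewrite a weighted 0/1 indicator as a full weight table."""
    return {lvl: (weight if lvl == value else 0.0) for lvl in levels}


levels = ["red", "green", "blue"]
factor = {"red": 1.0, "green": 2.0, "blue": 3.0}
binary = binary_as_factor(levels, "green", 0.5)

summed = add_factors(factor, binary)
product = mul_factors(factor, binary)
```

The rewrite is only valid when both nodes refer to the same categorical variable; nodes on different variables cannot be merged this way.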

HeuristicLab-Trac-Bot commented 8 years ago

2016-08-17 16:20:31: @gkronber commented


r14259: added support for factor variables to the Excel formatter and Excel exporter, as well as to the LaTeX formatter and consequently the mathematical representation view.

HeuristicLab-Trac-Bot commented 8 years ago

2016-08-29 10:29:45: @gkronber commented


r14266: improved handling of factors in ConstantOptimizationEvaluator (create binary indicators only once)
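
The idea of creating binary indicators only once can be sketched as follows. This is an illustration, not the ConstantOptimizationEvaluator code: it assumes the optimizer re-evaluates the model many times with changing weights, so the 0/1 column for each (variable, value) pair is built on first use and reused afterwards. `IndicatorCache`, `evaluate_factor`, and the data are hypothetical names chosen for the example.

```python
from typing import Dict, List, Tuple


class IndicatorCache:
    """Builds each 0/1 indicator column at most once and reuses it afterwards."""

    def __init__(self, columns: Dict[str, List[str]]):
        self._columns = columns
        self._cache: Dict[Tuple[str, str], List[float]] = {}
        self.builds = 0  # counts how many columns were actually constructed

    def indicator(self, variable: str, value: str) -> List[float]:
        key = (variable, value)
        if key not in self._cache:
            self._cache[key] = [1.0 if v == value else 0.0 for v in self._columns[variable]]
            self.builds += 1
        return self._cache[key]


def evaluate_factor(cache: IndicatorCache, variable: str, weights: Dict[str, float]) -> List[float]:
    """Weighted sum of the cached indicator columns for one factor variable."""
    total: List[float] = []
    for value, weight in weights.items():
        column = cache.indicator(variable, value)
        if not total:
            total = [0.0] * len(column)
        for i, x in enumerate(column):
            total[i] += weight * x
    return total


cache = IndicatorCache({"color": ["red", "green", "red", "blue"]})
weights = {"red": 1.0, "green": 1.0, "blue": 1.0}
for _ in range(3):  # stand-in for optimizer iterations that only change the weights
    weights = {level: w * 0.5 for level, w in weights.items()}
    prediction = evaluate_factor(cache, "color", weights)
```

After three iterations the cache has still built only three columns (one per level), whereas rebuilding them on every evaluation would have built nine.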

HeuristicLab-Trac-Bot commented 8 years ago

2016-09-08 11:42:28: @gkronber commented


  • r14276: merged r14244 from trunk to branch
  • r14277: merged r14245 to r14273 from trunk to branch (fixing conflicts in RegressionSolutionTargetResponseGradientView)

HeuristicLab-Trac-Bot commented 7 years ago

2016-09-20 11:07:30: @gkronber commented


Bugs:

  • Exception when showing the simplifier view after simplification of the tree (it seems some nodes are not cloned). (r14339)
  • Exception when trying to open data preprocessing view for a ProblemData object stored in a solution (#2683)

HeuristicLab-Trac-Bot commented 7 years ago

2016-10-13 19:47:55: @gkronber commented


r14330: merged r14282 to r14322 from trunk to branch (fixing conflicts)

HeuristicLab-Trac-Bot commented 7 years ago

2016-10-13 20:21:03: @gkronber commented


r14331: fixed compilation errors after merge

HeuristicLab-Trac-Bot commented 7 years ago

2016-10-18 22:09:56: @gkronber commented


r14339: fixed bug in simplification of factor symbols

HeuristicLab-Trac-Bot commented 7 years ago

2016-10-23 09:44:47: @gkronber commented


r14351: merged r14332 to r14350 from trunk to branch

HeuristicLab-Trac-Bot commented 7 years ago

2016-11-17 15:30:51: @gkronber commented


r14399: merged r14352 to r14376 from trunk to branch (resolving conflicts in SymbolicDataAnalysisExpressionLatexFormatter)

HeuristicLab-Trac-Bot commented 7 years ago

2016-11-17 15:43:55: @gkronber commented


r14401: merged r14378 to r14400 from trunk to branch

HeuristicLab-Trac-Bot commented 7 years ago

2016-11-17 18:40:01: @gkronber commented


r14402: fixed a bug in constant optimizer in relation to lagged variables

HeuristicLab-Trac-Bot commented 7 years ago

2016-11-17 18:41:13: @gkronber commented


r14403: added support for factor variables to C# formatter

HeuristicLab-Trac-Bot commented 7 years ago

2016-11-26 21:47:34: @gkronber commented


r14421: merged r14405 to r14418 from trunk to branch

HeuristicLab-Trac-Bot commented 7 years ago

2016-11-26 21:57:09: @gkronber commented


Should be finished before #2697

HeuristicLab-Trac-Bot commented 7 years ago

2016-12-02 17:35:48: @gkronber commented


r14449: merged r14422 to r14443 from trunk to branch (resolving conflicts)