romanovzky opened 3 weeks ago
I like the idea of using the callback mechanism for this, so that users have different options for model selection. Selecting based on a validation set could be a good default. Other options are selection based on criteria such as Bayesian evidence, AIC, BIC or description length, but these could be easily added by users once the callback mechanism is in place.
Currently, `SymbolicRegressor` returns the model that best complies with a given criterion. This criterion, however, is computed on the training set, whereas machine-learning best practice dictates that model selection be performed on a validation set. At present this can be "hacked" by selecting the best Pareto-front individual against a validation metric after the `SymbolicRegressor`
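The "hack" described above can be sketched as follows. This is an illustrative stand-in, not the actual pyoperon API: the `pareto_front` list of `(expression, callable)` pairs below is hypothetical, standing in for however a fitted `SymbolicRegressor` exposes its Pareto front.

```python
# Re-rank Pareto-front individuals on a held-out validation set instead of
# trusting their training-set fitness. Each individual is modelled here as a
# (name, callable) pair; the real pyoperon objects will look different.

def mse(model, X, y):
    """Mean squared error of `model` over a validation set."""
    preds = [model(x) for x in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

def select_on_validation(pareto_front, X_val, y_val):
    """Pick the Pareto-front individual with the lowest validation MSE."""
    return min(pareto_front, key=lambda item: mse(item[1], X_val, y_val))

# Toy Pareto front: three candidate expressions of increasing complexity.
pareto_front = [
    ("y = x",       lambda x: x),
    ("y = 2*x",     lambda x: 2 * x),
    ("y = 2*x + 1", lambda x: 2 * x + 1),
]

# Validation data generated from the true relation y = 2*x + 1.
X_val = [0.0, 1.0, 2.0, 3.0]
y_val = [2 * x + 1 for x in X_val]

best_name, best_model = select_on_validation(pareto_front, X_val, y_val)
print(best_name)  # the candidate matching the validation data wins
```

The same `select_on_validation` step is exactly what a model-selection callback could run automatically once the callback mechanism lands.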
completes its run. With callbacks (see https://github.com/heal-research/pyoperon/issues/18), however, this feature could also enable early-stopping criteria based on the validation set. This is common in machine-learning packages with iterative training (see Keras, Lightning, XGBoost, etc. for examples).
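An early-stopping criterion of the kind proposed here could look like the sketch below. The callback signature (invoked once per generation with the current validation loss, returning `True` to stop) is an assumption for illustration, not pyoperon's actual callback interface.

```python
# Patience-based early stopping, as used by Keras/Lightning/XGBoost-style
# trainers: stop when the validation loss has not improved for `patience`
# consecutive generations.

class EarlyStopping:
    def __init__(self, patience=3):
        self.patience = patience      # generations to wait without improvement
        self.best = float("inf")      # best validation loss seen so far
        self.stale = 0                # generations since last improvement

    def __call__(self, val_loss):
        """Return True when the caller should stop iterating."""
        if val_loss < self.best:
            self.best = val_loss
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience

# Simulated per-generation validation losses: improve, then plateau.
stop = EarlyStopping(patience=2)
losses = [1.0, 0.8, 0.9, 0.85, 0.95]
stopped_at = None
for gen, loss in enumerate(losses):
    if stop(loss):
        stopped_at = gen
        break
```

In this trace the run halts at generation 3, two stale generations after the best loss of 0.8; a real callback would compute `val_loss` from the current best individual on the held-out set.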