mlr-org / mlr3

mlr3: Machine Learning in R - next generation
https://mlr3.mlr-org.com
GNU Lesser General Public License v3.0

Learner Seed #906

Closed: sebffischer closed this 1 week ago

sebffischer commented 1 year ago

For some learners (like neural networks) the seed can have an impact on the performance. Currently, when conducting a resampling, a learner is fit with a new seed on each fold. This means that we are not only measuring the variance that the choice of train and test set induces in the performance, but also the influence of the initialization. While this is desirable in some situations, sometimes one might not want it. E.g. if I want to generalize the performance estimate from the resampling to the final model, I probably want the resampling fits and the final fit to be as similar as possible, which includes keeping the seed identical.

Suggestion: A learner could have a field like seed that is initialized to NULL by default (which is backwards compatible with the previous behavior) but can be set to a specific value. If this value is set, then the seed is set before the learner is trained (by passing it to the invoke() call in learner$train) and reset to the previous seed afterwards.
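
To make the "set, train, restore" idea concrete, here is a minimal base-R sketch. It is not mlr3 code: `with_fixed_seed()` is a hypothetical helper, and `rnorm()` only stands in for a training call that consumes random numbers. It assumes that NULL should mean "leave the RNG alone", matching the proposed default.

```r
# Hypothetical helper (not part of mlr3): evaluate `expr` under a fixed seed
# and restore the caller's RNG state afterwards.
with_fixed_seed = function(seed, expr) {
  if (is.null(seed)) {
    return(expr) # NULL: no seed handling, i.e. the current behavior
  }
  # Remember the global RNG state, if one has been initialized already.
  had_state = exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  old_state = if (had_state) get(".Random.seed", envir = globalenv()) else NULL
  on.exit({
    if (had_state) {
      assign(".Random.seed", old_state, envir = globalenv())
    } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      rm(".Random.seed", envir = globalenv())
    }
  }, add = TRUE)
  set.seed(seed)
  expr # lazily evaluated here, i.e. after set.seed()
}

# Two "fits" with the same seed draw the same random numbers.
fit_a = with_fixed_seed(42L, rnorm(3))
fit_b = with_fixed_seed(42L, rnorm(3))

# The surrounding RNG stream is not disturbed by the seeded call.
set.seed(1)
expected_next = runif(1)
set.seed(1)
invisible(with_fixed_seed(42L, rnorm(3)))
actual_next = runif(1)
```

With this pattern every fold of a resampling would start from the same initialization, while code outside the training call (e.g. the resampling splits) still sees its usual random stream.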