mlr-org / mlr3

mlr3: Machine Learning in R - next generation
https://mlr3.mlr-org.com
GNU Lesser General Public License v3.0
947 stars 85 forks source link

BREAKING CHANGE: loglik, AIC, BIC, feature-creep #1119

Closed berndbischl closed 1 week ago

berndbischl commented 3 months ago

WARNING: BEFORE WE MERGE:

aic and bic should be removed from book, loglik too check mlr3learners mlr3tuning autotuner add to NEWS

mllg commented 2 months ago

Also needs to be removed from mlr3learners for log reg, multinom and linear regression. Or could just stay in the package with improved documentation.

mllg commented 2 months ago

TBH, I'd really like to keep AIC/BIC in the packages. I show this during teaching together with a SFS and compare it to step().

mllg commented 2 months ago

After a closer look, we could remove the properties and methods altogether but keep the measures for AIC and BIC. The user then has to ensure that stats::logLik() is properly working on the fitted model. This is a S3 generic and can be extended. We would only lose some pre-checks and improved error messages.

sebffischer commented 2 months ago

One more comment about the weights_resampling. What do you think about not making this weights, but counts instead? For the holdout resampling, e.g., for a Task with row_ids 1:10 we could set the resampling_weights to c(rep(2, 5), rep(1, 5)). Then, when sampling the training data we would first modify the row_ids to be c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 8, 9, 10), then sample the training data and use those IDs that were not sampled for the train set as the test set.

sebffischer commented 2 months ago

One more comment about the weights_resampling. What do you think about not making this weights, but counts instead? For the holdout resampling, e.g., for a Task with row_ids 1:10 we could set the resampling_weights to c(rep(2, 5), rep(1, 5)). Then, when sampling the training data we would first modify the row_ids to be c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 8, 9, 10), then sample the training data and use those IDs that were not sampled for the train set as the test set.

Otoh the question is then whether we really need this because we can already set the $row_ids of the task ...

berndbischl commented 2 months ago

After a closer look, we could remove the properties and methods altogether but keep the measures for AIC and BIC. The user then has to ensure that stats::logLik() is properly working on the fitted model. This is a S3 generic and can be extended. We would only lose some pre-checks and improved error messages.

@mllg. this is fine I HOPE, IF IF IF we not implement any further stuff for this. this would mean, AIC / BIC have the "req_model" property and they simply internally call stats:logLik. if that does not work, the measure fails. and we doc that this happens.

be-marc commented 1 week ago

Moved to #1207