mlr-org / mlr3learners

Recommended learners for mlr3
https://mlr3learners.mlr-org.com
GNU Lesser General Public License v3.0

binary glmnet handling of outcome factor names, Bug? #155

Closed kkmann closed 4 years ago

kkmann commented 4 years ago

Hi,

I ran into an AUC far below 0.5 on a binary classification task with glmnet. It turned out that the positive class was labelled "bad" and the negative class "good". Since "bad" sorts before "good" alphabetically, glmnet treats "good" as the target class (as documented), so the positive and negative classes get mixed up.

It seems that setting positive = "bad" on the task does not change that. Only after manually renaming the labels to "0" = "good" and "1" = "bad" did I get reasonable results.

I guess the problem is that this quirk of glmnet is not handled; the outcome factor is just passed in as-is:

https://github.com/mlr-org/mlr3learners/blob/9107004cd1bb7d9eaddbdd665d09c7e65aa9c4d6/R/LearnerRegrCVGlmnet.R#L101
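To illustrate the level-order issue described above, here is a minimal base R sketch. It assumes, as the glmnet documentation states, that the last factor level is treated as the target class. The names `y`, `positive`, and `y_reordered` are made up for illustration, and this is not necessarily how the learner itself handles the outcome.

```r
# Hypothetical outcome vector; levels default to alphabetical order.
y <- factor(c("bad", "good", "good", "bad"))
levels(y)  # "bad" comes first, so "good" is the last level

# Assumed positive class, mirroring positive = "bad" on the task.
positive <- "bad"

# Reorder the levels so that the positive class comes last.
# The observations themselves are unchanged; only the level order differs.
y_reordered <- factor(y, levels = c(setdiff(levels(y), positive), positive))
levels(y_reordered)
```

Reordering levels this way keeps the original labels, in contrast to the manual renaming to "0" and "1".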

mllg commented 4 years ago

Confirmed bug for glmnet + cv_glmnet.

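  library(mlr3)
  library(mlr3learners)

  task = tgen("2dnormals")$generate(100)
  learner = lrn("classif.glmnet", predict_type = "prob")

  # Train and predict on the same task; an AUC well below 0.5 points to swapped classes.
  p = learner$train(task)$predict(task)
  p$score(msr("classif.auc"))

mllg commented 4 years ago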

Thanks for reporting, fixed in master.