mlr-org / mlr3learners

Recommended learners for mlr3
https://mlr3learners.mlr-org.com
GNU Lesser General Public License v3.0

[BUGLRN] Bugs in learner lda: variable appears to be constant within groups #230

Closed MislavSag closed 2 years ago

MislavSag commented 2 years ago

Expected Behaviour

classif.lda works as expected.

Actual Behaviour

It returns an error:

Error in lda.default(x, grouping, ...) : 
  variable 44 appears to be constant within groups

Reprex

library(mlr3)
library(mlr3learners)

task_ = tsk("german_credit")
task_$task_type
learner = lrn("classif.lda")
learner$predict_sets = c("train", "test")
learner$predict_type = "prob"
fm = learner$train(task_)  # fails with the error shown above

sebffischer commented 2 years ago

This does not really seem like an mlr3 bug to me @mllg

MislavSag commented 2 years ago

Is this a bug in the original R package or a problem with the data?

mllg commented 2 years ago

This yields the same error:

library(mlr3)  # provides tsk()
library(MASS)
data = as.data.frame(tsk("german_credit")$data())
lda(credit_risk ~ ., data = data)

There is a problem with the data after converting the data to a matrix and dummy encoding the factors. From MASS:::lda.formula():

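# Adapted so it runs standalone on `data` from above
m = model.frame(credit_risk ~ ., data = data)
Terms <- attr(m, "terms")
grouping <- model.response(m)
# Dummy-encode the factors into a numeric design matrix
x <- model.matrix(Terms, m)
# Drop the intercept column, as lda.formula() does
xint <- match("(Intercept)", colnames(x), nomatch = 0L)
if (xint > 0L) 
  x <- x[, -xint, drop = FALSE]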

If you now look into the 44th column, grouped by credit risk, you see that there is no observation labeled with "1":

ftable(x[, 44] ~ data$credit_risk)

                 x[, 44]   0
data$credit_risk
good                     700
bad                      300
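
One common way such an all-zero dummy column arises is a factor level that is declared but never observed. Whether that is the cause for column 44 here is an assumption, not something confirmed in this thread. A minimal sketch on hypothetical toy data (the level "vacation" and the column values are made up for illustration):

```r
# Toy data (hypothetical): the level "vacation" is declared but never observed
df <- data.frame(
  credit_risk = factor(rep(c("good", "bad"), each = 3)),
  purpose = factor(
    c("car", "car", "repairs", "car", "repairs", "repairs"),
    levels = c("car", "repairs", "vacation")
  )
)

# Dummy encoding keeps a column for the unobserved level; it is 0 everywhere
x <- model.matrix(credit_risk ~ ., data = df)
print(colSums(x))

# Dropping unused levels before encoding removes that column
x_fixed <- model.matrix(credit_risk ~ ., data = droplevels(df))
print(colnames(x_fixed))
```

On the real data, colnames(x)[44] would show which dummy variable is affected.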

You can control the tolerance for the singular matrix detection via the parameter tol, but a constant feature will always result in an error if you try to fit an LDA.
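
To list every offending column at once rather than one per error message, the within-group spread can be checked by hand. The following is only a sketch on hypothetical toy data: it approximates the kind of check lda() performs and is not a copy of the MASS code. The threshold 1e-4 mirrors the documented default of tol, and the column names are made up.

```r
# Toy design matrix (hypothetical): `always_zero` mimics a constant dummy column
set.seed(1)
x <- cbind(
  informative = rnorm(10),
  always_zero = 0
)
grouping <- factor(rep(c("good", "bad"), each = 5))

# Center each column by its group mean, then take the standard deviation
centered <- apply(x, 2, function(col) col - ave(col, grouping))
within_sd <- apply(centered, 2, sd)

# Columns whose within-group spread falls below the tolerance
tol <- 1e-4
constant_cols <- names(within_sd)[within_sd < tol]
print(constant_cols)
```

Columns flagged this way can be dropped before fitting, or handled automatically by a preprocessing pipeline.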

mllg commented 2 years ago

FWIW, you can "repair" the learner via the robustify pipeline:

library(mlr3)
library(mlr3learners)
library(mlr3pipelines)

task = tsk("german_credit")
learner = as_learner(ppl("robustify") %>>% lrn("classif.lda"))
learner$param_set$values$encode.method = "treatment" # otherwise we get collinear features
learner$train(task)
learner$predict(task)

MislavSag commented 2 years ago

Thanks for the solution!