Closed FlorianPargent closed 3 months ago
It seems that https://github.com/mlr-org/mlr3learners/issues/177 is the correct link to where this was discussed and changed, referencing https://stats.stackexchange.com/a/386913
I agree with the OP - observation weights are very commonly used with logistic regression, including when downsampling a large EHR dataset for ML.
In version 0.5.4 of mlr3learners you removed the "weights" property of the "classif.log_reg" learner based on the reasoning in this thread mlr-org/mlr3pipelines#177. I would strongly argue that the reasoning in that thread is false and that the "weights" argument in the glm function works exactly as expected and should be restored for the "classif.log_reg" learner in mlr3learners as fast as possible!
One way to see that the glm weights work as expected is this blog article. As you can see in the blog, estimating a logistic regression model with unaggregated data (the usual case) gives exactly the same model (see the coefficient estimates and p-values) as estimating the model with aggregated data and using the number of observations per condition as weights. Note that in glm there is an important difference between "prior weights" (the weight argument in glm) and working weights (the weights value of a fitted glm object), as explained in the documentation of glm function. In case of mlr3learners, we are only concerned with the prior weights, which work exactly as expected.
Another example which shows why restoring the weights functionality is important is my own preprint on using cost-sensitive learning with mlr3. The benchmark in Table 3 was computed with an older version of mlr3learners when the weights property in classif.log was still available. We can clearly see that logistic regression with theoretical weighting (TW in Table 3: weights are theoretically chosen based on the cost matrix) gives about the same predictive performance as with empirical weighting (EW in Table 3: weights are empirically tuned). The similarity in predictive performance again demonstrates that the weights work as expected: I computed the theoretical weights in the usual way and empirical weighting leads to almost the same weights.
As demonstrated by our article (which is now accepted by a journal), the weights property for classif.log_reg is extremely valueable and should be restored immediately! We noticed the removal of the weights argument because we are currently preparing our manuscript for publication. Currently we are forced to use an older version of mlr3learners, which is extremely inconvenient for any readers of our tutorial-style paper. We would greatly appreciate if you could change this as soon as possible. If anybody would run our tutorial code with the current version of mlr3learners and mlr3pipelines, they would get highly misleading results! See also our related issue in the repository of mlr3pipelines: mlr-org/mlr3#931.