mwelz / GenericML

R implementation of Generic Machine Learning Inference (Chernozhukov, Demirer, Duflo and Fernández-Val, 2020).
GNU General Public License v3.0

Labels for learners #26

Open aalfons opened 2 years ago

aalfons commented 2 years ago

The mlr3 specification string of a learner can be cumbersome when it is used as a label in the output, for example:

> library("GenericML")
Loading required package: ggplot2
Loading required package: mlr3
Loading required package: mlr3learners
> ## generate data
> set.seed(1)
> n  <- 150                                  # number of observations
> p  <- 5                                    # number of covariates
> D  <- rbinom(n, 1, 0.5)                    # random treatment assignment
> Z  <- matrix(runif(n*p), n, p)             # design matrix
> Y0 <- as.numeric(Z %*% rexp(p) + rnorm(n)) # potential outcome without treatment
> Y1 <- 2 + Y0                               # potential outcome under treatment
> Y  <- ifelse(D == 1, Y1, Y0)               # observed outcome
> 
> ## column names of Z
> colnames(Z) <- paste0("V", 1:p)
> 
> ## specify learners
> learners <- c("tree", "mlr3::lrn('ranger', num.trees = 10)")
> 
> ## perform generic ML inference
> # small number of splits to keep computation time low
> x <- GenericML(Z, D, Y, learners, num_splits = 2,
+                parallel = FALSE)
> 
> ## access best learner
> get_best(x)
                                     lambda lambda.bar
tree                                0.01096      5.512
mlr3::lrn('ranger', num.trees = 10) 0.11812      5.303
---
The best learner for BLP is mlr3::lrn('ranger', num.trees = 10) with lambda = 0.1181. 
The best learner for GATES and CLAN is mlr3::lrn('ranger', num.trees = 10) with lambda.bar = 5.5124.
>

We could allow cleaner labels by using the names of the vector that specifies the learners, for example:

> learners <- c("tree", forest = "mlr3::lrn('ranger', num.trees = 10)")
> names(learners)
[1] ""       "forest"

If an element of the vector has an empty name (""), we use the vector element itself as the label in the output. If a learner has a non-empty name ("forest"), we use that name in the output instead of the vector element ("mlr3::lrn('ranger', num.trees = 10)").
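
The rule described above could be sketched in base R as follows. This is only an illustration of the proposal, not code from GenericML: the helper name learner_labels is hypothetical, and treating a missing or NA name like an empty name is an assumption.

```r
# Hypothetical helper (not part of GenericML): derive output labels
# from a character vector of learner specifications.
learner_labels <- function(learners) {
  nms <- names(learners)
  # An entirely unnamed vector has NULL names; treat every name as empty
  if (is.null(nms)) nms <- rep("", length(learners))
  # Assumption: an NA name is treated like an empty name
  nms[is.na(nms)] <- ""
  # Use the name where it is non-empty, otherwise the specification itself
  ifelse(nzchar(nms), nms, unname(learners))
}

learners <- c("tree", forest = "mlr3::lrn('ranger', num.trees = 10)")
labels <- learner_labels(learners)
print(labels)
# [1] "tree"   "forest"
```

A fully unnamed vector passes through unchanged, so existing calls would keep their current labels. The sketch does not check that the resulting labels are unique, which an actual implementation would probably want to do.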

In the example above, the output would then look like:

> get_best(x)
        lambda lambda.bar
tree   0.01096      5.512
forest 0.11812      5.303
---
The best learner for BLP is forest with lambda = 0.1181. 
The best learner for GATES and CLAN is forest with lambda.bar = 5.5124.

aalfons commented 2 years ago

There is no hurry for this, but it would be nice to have this for the paper.