mlr-org / mlr3proba

Probabilistic Learning for mlr3
https://mlr3proba.mlr-org.com
GNU Lesser General Public License v3.0

Implement Integrated Calibration Index (ICI) as a performance measure and construct smoothed calibration curves post benchmark procedures #378

Open · Lee-xli opened 2 months ago

Lee-xli commented 2 months ago

Hi mlr3proba team,

Some recent recommendations on the evaluation/validation of survival prediction models call for estimating individuals' observed outcome probabilities alongside the model's predicted probabilities at given time points, e.g. via smoothed calibration curves and the integrated calibration index (ICI), the mean absolute difference between the two (Austin et al., 2020; Riley et al., 2024).

I have successfully computed the calibration index and constructed the plots after training models in mlr3. Going forward, however, I would like to implement ICI as a custom measure for benchmark procedures and construct calibration curves directly after benchmarking. In my attempts to implement this, I ran into difficulties extracting the relevant predictions from the outer loops of nested-CV benchmark procedures, and I have opened a question on StackOverflow (https://stackoverflow.com/questions/78364286/how-to-extract-predictions-say-of-survival-probability-of-the-outer-loop-sampl). I wonder whether mlr3proba may already have something similar in the pipeline.

I would very much appreciate any advice and guidance from mlr3proba team directly.

References:
Austin PC, Harrell FE Jr, van Klaveren D (2020). Graphical calibration curves and the integrated calibration index (ICI) for survival models. Stat Med 39(21):2714-2742. doi: 10.1002/sim.8570. PMID: 32548928; PMCID: PMC7497089.
Riley RD, Archer L, Snell KIE, Ensor J, Dhiman P, Martin GP, Bonnett LJ, Collins GS (2024). Evaluation of clinical prediction models (part 2): how to undertake an external validation study. BMJ 384:e074820. doi: 10.1136/bmj-2023-074820.

bblodfon commented 2 months ago

Hi @Lee-xli, thanks for sharing all of this! We would definitely like some new (calibration or other) survival measures based on recent literature. I will take a look at the papers and see what we can do; if you have some sample code, that would be excellent. I will create a new issue for this.

Now, for the StackOverflow question: the problem we face is that people don't use reprex::reprex() and don't include library versions, i.e. it could be that the mlr3, mlr3proba, or mlr3extralearners version you used is a bit old. I suggest you post it here with a reprex plus library versions and I will definitely check it out.
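For reference, a reprex that appends the library versions can be generated with something like:

reprex::reprex(session_info = TRUE)  # render the code on the clipboard and append session info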

Lee-xli commented 2 months ago

Thank you very much @bblodfon ! I will work on these and get back to you soon!

Lee-xli commented 2 months ago

Attachment: custom measure ICI mlr3.R.zip (UPDATED analysis)

Hi John,

Thank you very much once again! Below is a simple worked example implementing ICI and a smoothed calibration plot using hazard regression (the hare function from the polspline package). The alternative method (per Austin et al., 2020) using restricted cubic splines could also be implemented easily, but my impression is that its additional proportional-hazards assumption and arbitrary choice of knots are key drawbacks. The code below is largely based on the Austin, Harrell & van Klaveren reference.

rm(list = ls(all = TRUE))  # clear the workspace
library(distr6)
#> 
#> Attaching package: 'distr6'
#> The following object is masked from 'package:stats':
#> 
#>     qqplot
#> The following object is masked from 'package:base':
#> 
#>     truncate
library(skimr)
library(mlr3)
library(mlr3learners)
library(mlr3extralearners)
library(mlr3pipelines)
library(mlr3tuning)
#> Loading required package: paradox
library(mlr3proba)
library(mlr3misc)
#> 
#> Attaching package: 'mlr3misc'
#> The following objects are masked from 'package:Hmisc':
#> 
#>     %nin%, capitalize
library(data.table)
library(ggplot2)  # needed for the calibration plot below
library(polspline)
library(pec)
#> Loading required package: prodlim

# set up data
task_lung = tsk('lung')
d = task_lung$data()
d$time = ceiling(d$time/30.44)
task_lung = as_task_surv(d, time = 'time', event = 'status', id = 'lung')
po_encode = po('encode', method = 'treatment')
po_impute = po('imputelearner', lrn('regr.rpart'))
pre = po_encode %>>% po_impute
task = pre$train(task_lung)[[1]]
dt=task$data()

#### learners ----
cph=lrn("surv.coxph")
lasso_cv=as_learner(po("encode") %>>% lrn("surv.cv_glmnet", alpha=1))

# get baseline hazard estimates
comp.lasso = as_learner(ppl(  
  "distrcompositor",
  learner = lasso_cv,
  estimator = "kaplan",
  form = "ph",
  overwrite=T
))

# Benchmark 
set.seed(1234)
BM1 = benchmark(benchmark_grid(task,
                               list(cph),
                               rsmp('cv', folds=4)),
                store_models =T)
#> INFO  [00:19:46.464] [mlr3] Running benchmark with 4 resampling iterations
#> INFO  [00:19:46.569] [mlr3] Applying learner 'surv.coxph' on task 'lung' (iter 1/4)
#> INFO  [00:19:46.638] [mlr3] Applying learner 'surv.coxph' on task 'lung' (iter 2/4)
#> INFO  [00:19:46.689] [mlr3] Applying learner 'surv.coxph' on task 'lung' (iter 3/4)
#> INFO  [00:19:46.738] [mlr3] Applying learner 'surv.coxph' on task 'lung' (iter 4/4)
#> INFO  [00:19:46.788] [mlr3] Finished benchmark

### predictions ----
data = as.data.table(BM1)  # one row per outer resampling iteration
nrow(data)                 # number of outer test sets
#> [1] 4
pred = t(data$prediction[[1]]$distr$cdf(12))  # predicted event probability at 12 months

test_1 = dt[c(data$prediction[[1]]$row_ids)]   # test set of the first fold
train_1 = dt[-c(data$prediction[[1]]$row_ids)] # corresponding training set

### hare ----
# add the predicted probability and its complementary log-log (cll) to the test set
test_1[, pred.prob := pred][, pred.prob.cll := log(-log(1 - pred.prob))]
cali.cox <- hare(data=test_1$time, delta = test_1$status, 
                 cov = as.matrix(test_1$pred.prob.cll))

### plot ----
# use above hare model to get observed prob (over a range of predicted probability based on the cox model)
pred.grid = seq(quantile(test_1$pred.prob, probs = 0.01),
                quantile(test_1$pred.prob, probs = 0.99),
                length = 100)
pred.grid.cll=log(-log(1-pred.grid))

pred.cali.cox=phare(12, pred.grid.cll, cali.cox) # predict observed event prob from HARE model

# plot data
cali = data.frame(predicted = pred.grid, observed = pred.cali.cox)  # plot data
p <- ggplot(cali, aes(x=predicted, y=observed))+
  geom_line() +
  geom_abline(slope = 1, intercept=0, alpha=0.5, linetype='dashed') +
  ylab('Observed prob of 1-year mortality') +
  scale_y_continuous(limits=c(0,1)) +
  xlab('Predicted prob of 1-year mortality') +
  scale_x_continuous(limits = c(0,1)) +
  theme_bw() + ggtitle('Lung - coxph (mlr3 benchmark)') +
  theme(legend.position = 'top')
p


### ICI ---- 
# ICI is the mean absolute difference between the model-predicted probabilities
# and the observed probabilities derived from the smoothed calibration curve
test_1[, ob.hare := phare(12, test_1$pred.prob.cll, cali.cox)][, abs.diff := abs(ob.hare - pred.prob)]
ICI_cox = mean(test_1$abs.diff); ICI_cox
#> [1] 0.07991534
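For completeness, Austin et al. (2020) also summarise the same absolute differences with E50 and E90 (their median and 90th percentile); given the objects above, these are one-liners:

E50_cox = median(test_1$abs.diff)                 # median absolute difference
E90_cox = quantile(test_1$abs.diff, probs = 0.9)  # 90th percentile of absolute differences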

Created on 2024-05-10 by the reprex package (v2.0.1)

Session info (condensed): R 4.1.1 on macOS Big Sur 10.16; key attached packages: mlr3 0.19.0, mlr3proba 0.6.1, mlr3extralearners 0.7.1, mlr3learners 0.6.0, mlr3pipelines 0.5.2, mlr3tuning 0.20.0, mlr3misc 0.15.0, distr6 1.8.4, polspline 1.1.19, pec 2020.11.17, data.table 1.15.4, ggplot2 3.5.1, reprex 2.0.1.

The above covers only the first outer fold; the entire analysis can be implemented with a locally written custom measure (attached, and sourced in the script below).
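The attached R6 file is not inlined in this thread, so the following is for orientation only: the class and measure names (MeasureSurvICI, 'surv.ICI') match their usage below, but the body is an assumption reconstructed from the hare code above, not the attached implementation.

library(R6)
library(mlr3)
library(mlr3proba)
library(paradox)
library(polspline)

# Sketch of a custom ICI measure; the attached file may differ.
MeasureSurvICI = R6Class("MeasureSurvICI",
  inherit = mlr3proba::MeasureSurv,
  public = list(
    initialize = function() {
      super$initialize(
        id = "surv.ICI",
        param_set = ps(
          tm = p_dbl(lower = 0),         # evaluation time point
          plot = p_lgl(default = FALSE)  # plotting branch omitted in this sketch
        ),
        range = c(0, 1),
        minimize = TRUE,
        predict_type = "distr"
      )
    }
  ),
  private = list(
    .score = function(prediction, ...) {
      tm = self$param_set$values$tm
      p = as.numeric(prediction$distr$cdf(tm))  # predicted event probability at tm
      cll = log(-log(1 - p))                    # complementary log-log transform
      fit = polspline::hare(data = prediction$truth[, 1L],   # observed times
                            delta = prediction$truth[, 2L],  # event indicator
                            cov = as.matrix(cll))
      observed = polspline::phare(tm, cll, fit) # smoothed "observed" probabilities
      mean(abs(observed - p))                   # ICI
    }
  )
)

mlr3::mlr_measures$add("surv.ICI", MeasureSurvICI)

With the class registered, msr('surv.ICI', tm = 12) becomes available to $score() and $aggregate() as used in the script below.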

rm(list = ls(all = TRUE))  # clear the workspace
library(mlr3)
library(mlr3learners)
library(mlr3extralearners)
library(mlr3pipelines)
library(mlr3tuning)
#> Loading required package: paradox
library(mlr3proba)
library(mlr3misc)
library(data.table)
library(polspline)
library(R6)
source('~/Flinders/Michael Sorich - Clinical trial analysis/Flinders/Michael Sorich - Tutorial/_indevelopment/calibration/custom measure ICI mlr3.R')

# set up data
task_lung = tsk('lung')
d = task_lung$data()
d$time = ceiling(d$time/30.44)
task_lung = as_task_surv(d, time = 'time', event = 'status', id = 'lung')
po_encode = po('encode', method = 'treatment')
po_impute = po('imputelearner', lrn('regr.rpart'))
pre = po_encode %>>% po_impute
task = pre$train(task_lung)[[1]]
dt=task$data()

#### learners ----
cph=lrn("surv.coxph")
lasso_cv=as_learner(po("encode") %>>% lrn("surv.cv_glmnet", alpha=1))

# get baseline hazard estimates
comp.lasso = as_learner(ppl(  
  "distrcompositor",
  learner = lasso_cv,
  estimator = "kaplan",
  form = "ph",
  overwrite=T
))

set.seed(1234)
BM1 = benchmark(benchmark_grid(task,
                               list(cph, comp.lasso),
                               rsmp('cv', folds=4)),
                store_models =T)
#> INFO  [00:35:24.750] [mlr3] Running benchmark with 8 resampling iterations
#> INFO  [00:35:24.827] [mlr3] Applying learner 'surv.coxph' on task 'lung' (iter 1/4)
#> INFO  [00:35:24.900] [mlr3] Applying learner 'surv.coxph' on task 'lung' (iter 2/4)
#> INFO  [00:35:24.951] [mlr3] Applying learner 'surv.coxph' on task 'lung' (iter 3/4)
#> INFO  [00:35:25.000] [mlr3] Applying learner 'surv.coxph' on task 'lung' (iter 4/4)
#> INFO  [00:35:25.060] [mlr3] Applying learner 'distrcompositor.kaplan.encode.surv.cv_glmnet.distrcompose' on task 'lung' (iter 1/4)
#> INFO  [00:35:25.473] [mlr3] Applying learner 'distrcompositor.kaplan.encode.surv.cv_glmnet.distrcompose' on task 'lung' (iter 2/4)
#> INFO  [00:35:25.794] [mlr3] Applying learner 'distrcompositor.kaplan.encode.surv.cv_glmnet.distrcompose' on task 'lung' (iter 3/4)
#> INFO  [00:35:26.090] [mlr3] Applying learner 'distrcompositor.kaplan.encode.surv.cv_glmnet.distrcompose' on task 'lung' (iter 4/4)
#> INFO  [00:35:26.379] [mlr3] Finished benchmark

# MeasureSurvICI$debug(".score")
# MeasureSurvICI$undebug(".score")
test = BM1$score(msr('surv.ICI', tm = 12))  # by default returns the ICI
test
#>       nr task_id                                                learner_id
#>    <int>  <char>                                                    <char>
#> 1:     1    lung                                                surv.coxph
#> 2:     1    lung                                                surv.coxph
#> 3:     1    lung                                                surv.coxph
#> 4:     1    lung                                                surv.coxph
#> 5:     2    lung distrcompositor.kaplan.encode.surv.cv_glmnet.distrcompose
#> 6:     2    lung distrcompositor.kaplan.encode.surv.cv_glmnet.distrcompose
#> 7:     2    lung distrcompositor.kaplan.encode.surv.cv_glmnet.distrcompose
#> 8:     2    lung distrcompositor.kaplan.encode.surv.cv_glmnet.distrcompose
#>    resampling_id iteration   surv.ICI
#>           <char>     <int>      <num>
#> 1:            cv         1 0.07991534
#> 2:            cv         2 0.08754933
#> 3:            cv         3 0.12298290
#> 4:            cv         4 0.20094110
#> 5:            cv         1 0.02420689
#> 6:            cv         2 0.06097302
#> 7:            cv         3 0.10857322
#> 8:            cv         4 0.03784889
#> Hidden columns: uhash, task, learner, resampling, prediction
test.plot = BM1$score(msr('surv.ICI', tm = 12, plot = T))  # plot = T is meant to return the plots as well
#> Error in ggplot(cali, aes(x = predicted, y = observed)): could not find function "ggplot"
test.plot
#> Error in eval(expr, envir, enclos): object 'test.plot' not found
BM1$aggregate(msr('surv.ICI')) 
#> Error in t.default(prediction$distr$cdf(tm)): argument is not a matrix

Created on 2024-05-10 by the reprex package (v2.0.1)

Session info (condensed): R 4.1.1 on macOS Big Sur 10.16; key attached packages: mlr3 0.19.0, mlr3proba 0.6.1, mlr3extralearners 0.7.1, mlr3learners 0.6.0, mlr3pipelines 0.5.2, mlr3tuning 0.20.0, mlr3misc 0.15.0, data.table 1.15.4, polspline 1.1.19, R6 2.5.1, reprex 2.0.1.

Please see the R6 function attached.

I am an absolute novice in object-oriented programming and achieved the above by imitating examples. I am sure the R6 function has lots of room for improvement, and on top of that there are a few other issues, as shown in the reprex output above:

  1. $aggregate() is not working, for reasons I haven't identified.
  2. The plots do get drawn when the code executes, despite the error message. I think it would be useful to output both well-labelled plots and the underlying plot data, neither of which I managed. For the plots, my main issue is that I don't know how to extract the index of the subsample (i.e. the 1st, 2nd, etc. fold of the "cph" learner) inside the measure; once the fold index and learner information are available, a proper graph title can be added. For the plot data, I don't know how to return a data frame: it seems a measure may only return a single numeric value.
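On point 1, one plausible cause (an assumption, not verified against the attached file) is that BM1$aggregate(msr('surv.ICI')) constructs the measure without tm, so prediction$distr$cdf(NULL) fails inside .score. If so, a fallback default in .score would fix it:

tm = self$param_set$values$tm %??% 12  # %??% from mlr3misc: fall back to 12 months when tm is unset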

I will trial the custom measure on our research datasets in the coming week.

Looking forward to hearing back from you.

bblodfon commented 2 months ago

@Lee-xli I answered the StackOverflow question. May I ask you to revise the post above with some of the suggestions from my answer? That would clear the clutter a bit and let us see which things don't work and which things can help us implement these calibration-related metrics and plots in mlr3proba. It would help me a lot!

Lee-xli commented 2 months ago

@bblodfon Thank you very much! I have posted an answer to the SO question. I can now construct the plot and ICI manually after benchmarking. I will next work on an example using cph and lasso, start on a custom measure, and get back to you.

Incidentally, I am running into errors when benchmarking random forest and mboost learners. The error message from the random forest says:

Error in finalizeData(fnames, newdata, na.action) : 
  no records in the NA-processed data: consider using 'na.action=na.impute'

and from mboost:

Error in solve.default(XtX, crossprod(X, y)) : 
  Lapack routine dgesv: system is exactly singular: U[4,4] = 0
This happened PipeOp surv.mboost.tuned's $train()
In addition: Warning messages:
1: In df2lambda(X, df = args$df, lambda = args$lambda, dmat = K, weights = w,  :
  ‘df’ too large:
  Degrees of freedom cannot be larger than the rank of the design matrix.
  Unpenalized base-learner with df = 3 used. Re-consider model specification.
This happened PipeOp surv.mboost.tuned's $train()
2: In df2lambda(X, df = args$df, lambda = args$lambda, dmat = K, weights = w,  :
  ‘df’ too large:
  Degrees of freedom cannot be larger than the rank of the design matrix.
  Unpenalized base-learner with df = 2 used. Re-consider model specification.
This happened PipeOp surv.mboost.tuned's $train()

I understand this is potentially a different issue, so I haven't included any reprex() output. Would you like me to start a new issue or an SO question?

bblodfon commented 2 months ago

Hi Lee,

Thanks for writing the new answer in the StackOverflow question; I added a comment for clarification. I would suggest accepting my answer and adding the "solution" code to your original post for future reference.

I will now try to work on an example using cph and lasso, and start on a custom measure and get back to you.

Super, just edit the above post. Smaller and cleaner reprex examples are very helpful.

Incidentally, I am running into errors during benchmarking random forest and mboost

Separate issue, yes! Since these learners come from mlr3extralearners, please post it there and mention me. It might again be a version thing, or we may need to update something. For random forests please use ranger; for mboost it depends on the learner you use. The glmboost one was the kindest in my experience (less buggy in general and with a better memory footprint) versus blackboost and gamboost.
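For example (a sketch):

rf = lrn("surv.ranger")    # ranger-based random survival forest from mlr3extralearners
gb = lrn("surv.glmboost")  # the most forgiving of the mboost family in my experience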

Lee-xli commented 1 month ago

Hi John, please see the updated analysis above; I hope it is concise and clear to follow. Please let me know if anything needs clarification. I will revisit my issues with the other learners in the coming weeks and post separate issues, as you suggested, if still needed. Thanks again for your assistance and advice; much appreciated.

bblodfon commented 1 month ago

Thanks Lee, I will take a look at the code and the papers; I'm busy at the moment with other things. Just in case you don't know, we also have some calibration plots; I guess some of your code could go there, see https://mlr3proba.mlr-org.com/reference/autoplot.PredictionSurv.html
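For instance, something along these lines (a sketch; the exact arguments may differ, see the reference page):

library(mlr3viz)  # provides the autoplot methods
pred = BM1$resample_result(1)$prediction()   # pooled predictions of the first learner
autoplot(pred, type = "calib", task = task)  # Kaplan-Meier vs average predicted survival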

Lee-xli commented 1 month ago

No worries, and thank you John for pointing me to the existing calibration plots. I will have a look at the code there and see whether I can imitate and improve the current R6 function.