mlr-org / mlr3proba

Probabilistic Learning for mlr3
https://mlr3proba.mlr-org.com
GNU Lesser General Public License v3.0

Implement Integrated Calibration Index (ICI) as a performance measure and construct smoothed calibration curves post benchmark procedures #378

Open · Lee-xli opened 2 months ago

Lee-xli commented 2 months ago

Hi mlr3proba team,

Some recent recommendations on the evaluation/validation of survival prediction models call for estimating individuals' observed outcome probabilities alongside the model's predicted probabilities at given time points, e.g. via smoothed calibration curves and the integrated calibration index (ICI), the mean absolute difference between the two (Austin et al., 2020; Riley et al., 2024).

I have successfully computed the calibration index and constructed the plots after training models in mlr3. Going forward, however, I would like to implement ICI as a custom measure for benchmark procedures and construct calibration curves directly after benchmarking. In my attempts to implement this, I ran into difficulties extracting the relevant predictions from the outer loops of nested-CV benchmark procedures, and I have opened a question on StackOverflow (https://stackoverflow.com/questions/78364286/how-to-extract-predictions-say-of-survival-probability-of-the-outer-loop-sampl). I wonder whether mlr3proba may already have something similar in the pipeline.

I would very much appreciate any advice and guidance from mlr3proba team directly.

References:
Austin PC, Harrell FE Jr, van Klaveren D (2020). Graphical calibration curves and the integrated calibration index (ICI) for survival models. Stat Med 39(21):2714-2742. doi: 10.1002/sim.8570. PMID: 32548928; PMCID: PMC7497089.
Riley RD, Archer L, Snell KIE, Ensor J, Dhiman P, Martin GP, Bonnett LJ, Collins GS (2024). Evaluation of clinical prediction models (part 2): how to undertake an external validation study. BMJ 384:e074820. doi: 10.1136/bmj-2023-074820.

bblodfon commented 2 months ago

Hi @Lee-xli, thanks for sharing all of this! We would definitely like some new (calibration or other) survival measures based on recent literature. I will take a look at the papers and see what we can do; if you have some sample code, that would be excellent. I will create a new issue for this.

Now, for the StackOverflow question: the problem we face is that people don't use reprex::reprex() and don't include library versions, i.e. it could be that the mlr3, mlr3proba, or mlr3extralearners version you used is a bit old. I suggest you post it here with a reprex plus library versions and I will definitely check it out.
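For reference, a reprex that appends the library versions can be generated with something like:

reprex::reprex(session_info = TRUE)  # render the code on the clipboard and append session info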

Lee-xli commented 2 months ago

Thank you very much @bblodfon ! I will work on these and get back to you soon!

Lee-xli commented 2 months ago

Attachment: custom measure ICI mlr3.R.zip (UPDATED analysis)

Hi John,

Thank you very much once again! Below is a simple worked example implementing ICI and a smoothed calibration plot using hazard regression (the hare function from the polspline package). The alternative method (per Austin et al., 2020) using restricted cubic splines could also be implemented easily, but my impression is that its additional proportional-hazards assumption and arbitrary choice of knots are key drawbacks. The code below is largely based on the Austin, Harrell & van Klaveren reference.

rm(list = ls(all = TRUE))  # clear the workspace
library(distr6)
#> 
#> Attaching package: 'distr6'
#> The following object is masked from 'package:stats':
#> 
#>     qqplot
#> The following object is masked from 'package:base':
#> 
#>     truncate
library(skimr)
library(mlr3)
library(mlr3learners)
library(mlr3extralearners)
library(mlr3pipelines)
library(mlr3tuning)
#> Loading required package: paradox
library(mlr3proba)
library(mlr3misc)
#> 
#> Attaching package: 'mlr3misc'
#> The following objects are masked from 'package:Hmisc':
#> 
#>     %nin%, capitalize
library(data.table)
library(ggplot2)  # needed for the calibration plot below
library(polspline)
library(pec)
#> Loading required package: prodlim

# set up data
task_lung = tsk('lung')
d = task_lung$data()
d$time = ceiling(d$time/30.44)
task_lung = as_task_surv(d, time = 'time', event = 'status', id = 'lung')
po_encode = po('encode', method = 'treatment')
po_impute = po('imputelearner', lrn('regr.rpart'))
pre = po_encode %>>% po_impute
task = pre$train(task_lung)[[1]]
dt=task$data()

#### learners ----
cph=lrn("surv.coxph")
lasso_cv=as_learner(po("encode") %>>% lrn("surv.cv_glmnet", alpha=1))

# get baseline hazard estimates
comp.lasso = as_learner(ppl(  
  "distrcompositor",
  learner = lasso_cv,
  estimator = "kaplan",
  form = "ph",
  overwrite=T
))

# Benchmark 
set.seed(1234)
BM1 = benchmark(benchmark_grid(task,
                               list(cph),
                               rsmp('cv', folds=4)),
                store_models =T)
#> INFO  [00:19:46.464] [mlr3] Running benchmark with 4 resampling iterations
#> INFO  [00:19:46.569] [mlr3] Applying learner 'surv.coxph' on task 'lung' (iter 1/4)
#> INFO  [00:19:46.638] [mlr3] Applying learner 'surv.coxph' on task 'lung' (iter 2/4)
#> INFO  [00:19:46.689] [mlr3] Applying learner 'surv.coxph' on task 'lung' (iter 3/4)
#> INFO  [00:19:46.738] [mlr3] Applying learner 'surv.coxph' on task 'lung' (iter 4/4)
#> INFO  [00:19:46.788] [mlr3] Finished benchmark

### predictions ----
data = as.data.table(BM1)  # one row per outer resampling iteration
nrow(data)                 # number of outer test sets
#> [1] 4
pred = t(data$prediction[[1]]$distr$cdf(12))  # predicted event probability at 12 months

test_1 = dt[c(data$prediction[[1]]$row_ids)]   # test set of the first fold
train_1 = dt[-c(data$prediction[[1]]$row_ids)] # corresponding training set

### hare ----
# add the predicted probability and its complementary log-log (cll) to the test set
test_1[, pred.prob := pred][, pred.prob.cll := log(-log(1 - pred.prob))]
cali.cox <- hare(data=test_1$time, delta = test_1$status, 
                 cov = as.matrix(test_1$pred.prob.cll))

### plot ----
# use above hare model to get observed prob (over a range of predicted probability based on the cox model)
pred.grid = seq(quantile(test_1$pred.prob, probs = 0.01),
                quantile(test_1$pred.prob, probs = 0.99),
                length = 100)
pred.grid.cll=log(-log(1-pred.grid))

pred.cali.cox=phare(12, pred.grid.cll, cali.cox) # predict observed event prob from HARE model

# plot data
cali = data.frame(predicted = pred.grid, observed = pred.cali.cox)  # plot data
p <- ggplot(cali, aes(x=predicted, y=observed))+
  geom_line() +
  geom_abline(slope = 1, intercept=0, alpha=0.5, linetype='dashed') +
  ylab('Observed prob of 1-year mortality') +
  scale_y_continuous(limits=c(0,1)) +
  xlab('Predicted prob of 1-year mortality') +
  scale_x_continuous(limits = c(0,1)) +
  theme_bw() + ggtitle('Lung - coxph (mlr3 benchmark)') +
  theme(legend.position = 'top')
p


### ICI ---- 
# ICI is the mean absolute difference between the model-predicted probabilities
# and the observed probabilities derived from the smoothed calibration curve
test_1[, ob.hare := phare(12, test_1$pred.prob.cll, cali.cox)][, abs.diff := abs(ob.hare - pred.prob)]
ICI_cox = mean(test_1$abs.diff); ICI_cox
#> [1] 0.07991534
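For completeness, Austin et al. (2020) also summarise the same absolute differences with E50 and E90 (their median and 90th percentile); given the objects above, these are one-liners:

E50_cox = median(test_1$abs.diff)                 # median absolute difference
E90_cox = quantile(test_1$abs.diff, probs = 0.9)  # 90th percentile of absolute differences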

Created on 2024-05-10 by the reprex package (v2.0.1)

Session info (condensed): R 4.1.1 on macOS Big Sur 10.16; key attached packages: mlr3 0.19.0, mlr3proba 0.6.1, mlr3extralearners 0.7.1, mlr3learners 0.6.0, mlr3pipelines 0.5.2, mlr3tuning 0.20.0, mlr3misc 0.15.0, distr6 1.8.4, polspline 1.1.19, pec 2020.11.17, data.table 1.15.4, ggplot2 3.5.1, reprex 2.0.1.

The above covers only the first outer fold; the entire analysis can be implemented with a locally written custom measure (attached, and sourced in the script below).
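The attached R6 file is not inlined in this thread, so the following is for orientation only: the class and measure names (MeasureSurvICI, 'surv.ICI') match their usage below, but the body is an assumption reconstructed from the hare code above, not the attached implementation.

library(R6)
library(mlr3)
library(mlr3proba)
library(paradox)
library(polspline)

# Sketch of a custom ICI measure; the attached file may differ.
MeasureSurvICI = R6Class("MeasureSurvICI",
  inherit = mlr3proba::MeasureSurv,
  public = list(
    initialize = function() {
      super$initialize(
        id = "surv.ICI",
        param_set = ps(
          tm = p_dbl(lower = 0),         # evaluation time point
          plot = p_lgl(default = FALSE)  # plotting branch omitted in this sketch
        ),
        range = c(0, 1),
        minimize = TRUE,
        predict_type = "distr"
      )
    }
  ),
  private = list(
    .score = function(prediction, ...) {
      tm = self$param_set$values$tm
      p = as.numeric(prediction$distr$cdf(tm))  # predicted event probability at tm
      cll = log(-log(1 - p))                    # complementary log-log transform
      fit = polspline::hare(data = prediction$truth[, 1L],   # observed times
                            delta = prediction$truth[, 2L],  # event indicator
                            cov = as.matrix(cll))
      observed = polspline::phare(tm, cll, fit) # smoothed "observed" probabilities
      mean(abs(observed - p))                   # ICI
    }
  )
)

mlr3::mlr_measures$add("surv.ICI", MeasureSurvICI)

With the class registered, msr('surv.ICI', tm = 12) becomes available to $score() and $aggregate() as used in the script below.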

rm(list = ls(all = TRUE))  # clear the workspace
library(mlr3)
library(mlr3learners)
library(mlr3extralearners)
library(mlr3pipelines)
library(mlr3tuning)
#> Loading required package: paradox
library(mlr3proba)
library(mlr3misc)
library(data.table)
library(polspline)
library(R6)
source('~/Flinders/Michael Sorich - Clinical trial analysis/Flinders/Michael Sorich - Tutorial/_indevelopment/calibration/custom measure ICI mlr3.R')

# set up data
task_lung = tsk('lung')
d = task_lung$data()
d$time = ceiling(d$time/30.44)
task_lung = as_task_surv(d, time = 'time', event = 'status', id = 'lung')
po_encode = po('encode', method = 'treatment')
po_impute = po('imputelearner', lrn('regr.rpart'))
pre = po_encode %>>% po_impute
task = pre$train(task_lung)[[1]]
dt=task$data()

#### learners ----
cph=lrn("surv.coxph")
lasso_cv=as_learner(po("encode") %>>% lrn("surv.cv_glmnet", alpha=1))

# get baseline hazard estimates
comp.lasso = as_learner(ppl(  
  "distrcompositor",
  learner = lasso_cv,
  estimator = "kaplan",
  form = "ph",
  overwrite=T
))

set.seed(1234)
BM1 = benchmark(benchmark_grid(task,
                               list(cph, comp.lasso),
                               rsmp('cv', folds=4)),
                store_models =T)
#> INFO  [00:35:24.750] [mlr3] Running benchmark with 8 resampling iterations
#> INFO  [00:35:24.827] [mlr3] Applying learner 'surv.coxph' on task 'lung' (iter 1/4)
#> INFO  [00:35:24.900] [mlr3] Applying learner 'surv.coxph' on task 'lung' (iter 2/4)
#> INFO  [00:35:24.951] [mlr3] Applying learner 'surv.coxph' on task 'lung' (iter 3/4)
#> INFO  [00:35:25.000] [mlr3] Applying learner 'surv.coxph' on task 'lung' (iter 4/4)
#> INFO  [00:35:25.060] [mlr3] Applying learner 'distrcompositor.kaplan.encode.surv.cv_glmnet.distrcompose' on task 'lung' (iter 1/4)
#> INFO  [00:35:25.473] [mlr3] Applying learner 'distrcompositor.kaplan.encode.surv.cv_glmnet.distrcompose' on task 'lung' (iter 2/4)
#> INFO  [00:35:25.794] [mlr3] Applying learner 'distrcompositor.kaplan.encode.surv.cv_glmnet.distrcompose' on task 'lung' (iter 3/4)
#> INFO  [00:35:26.090] [mlr3] Applying learner 'distrcompositor.kaplan.encode.surv.cv_glmnet.distrcompose' on task 'lung' (iter 4/4)
#> INFO  [00:35:26.379] [mlr3] Finished benchmark

# MeasureSurvICI$debug(".score")
# MeasureSurvICI$undebug(".score")
test = BM1$score(msr('surv.ICI', tm = 12))  # by default returns the ICI
test
#>       nr task_id                                                learner_id
#>    <int>  <char>                                                    <char>
#> 1:     1    lung                                                surv.coxph
#> 2:     1    lung                                                surv.coxph
#> 3:     1    lung                                                surv.coxph
#> 4:     1    lung                                                surv.coxph
#> 5:     2    lung distrcompositor.kaplan.encode.surv.cv_glmnet.distrcompose
#> 6:     2    lung distrcompositor.kaplan.encode.surv.cv_glmnet.distrcompose
#> 7:     2    lung distrcompositor.kaplan.encode.surv.cv_glmnet.distrcompose
#> 8:     2    lung distrcompositor.kaplan.encode.surv.cv_glmnet.distrcompose
#>    resampling_id iteration   surv.ICI
#>           <char>     <int>      <num>
#> 1:            cv         1 0.07991534
#> 2:            cv         2 0.08754933
#> 3:            cv         3 0.12298290
#> 4:            cv         4 0.20094110
#> 5:            cv         1 0.02420689
#> 6:            cv         2 0.06097302
#> 7:            cv         3 0.10857322
#> 8:            cv         4 0.03784889
#> Hidden columns: uhash, task, learner, resampling, prediction
test.plot = BM1$score(msr('surv.ICI', tm = 12, plot = T))  # plot = T is meant to return the plots as well
#> Error in ggplot(cali, aes(x = predicted, y = observed)): could not find function "ggplot"
test.plot
#> Error in eval(expr, envir, enclos): object 'test.plot' not found
BM1$aggregate(msr('surv.ICI')) 
#> Error in t.default(prediction$distr$cdf(tm)): argument is not a matrix

Created on 2024-05-10 by the reprex package (v2.0.1)

Session info (condensed): R 4.1.1 on macOS Big Sur 10.16; key attached packages: mlr3 0.19.0, mlr3proba 0.6.1, mlr3extralearners 0.7.1, mlr3learners 0.6.0, mlr3pipelines 0.5.2, mlr3tuning 0.20.0, mlr3misc 0.15.0, data.table 1.15.4, polspline 1.1.19, R6 2.5.1, reprex 2.0.1.

Please see the R6 function attached.

I am an absolute novice in object-oriented programming and achieved the above by imitating examples. I am sure the R6 function has lots of room for improvement, and on top of that there are a few other issues, as shown in the reprex output above:

  1. $aggregate() is not working, for reasons I haven't identified.
  2. The plots do get drawn when the code executes, despite the error message. I think it would be useful to output both well-labelled plots and the underlying plot data, neither of which I managed. For the plots, my main issue is that I don't know how to extract the index of the subsample (i.e. the 1st, 2nd, etc. fold of the "cph" learner) inside the measure; once the fold index and learner information are available, a proper graph title can be added. For the plot data, I don't know how to return a data frame: it seems a measure may only return a single numeric value.
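On point 1, one plausible cause (an assumption, not verified against the attached file) is that BM1$aggregate(msr('surv.ICI')) constructs the measure without tm, so prediction$distr$cdf(NULL) fails inside .score. If so, a fallback default in .score would fix it:

tm = self$param_set$values$tm %??% 12  # %??% from mlr3misc: fall back to 12 months when tm is unset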

I will trial the custom measure on our research datasets in the coming week.

Looking forward to hearing back from you.

bblodfon commented 2 months ago

@Lee-xli I answered the StackOverflow question. May I ask you to revise the post above with some of the suggestions from my answer? That would clear the clutter a bit and let us see which things don't work and which things can help us implement these calibration-related metrics and plots in mlr3proba. It would help me a lot!

Lee-xli commented 2 months ago

@bblodfon Thank you very much! I have posted an answer to the SO question. I can now construct the plot and ICI manually after benchmarking. I will next work on an example using cph and lasso, start on a custom measure, and get back to you.

Incidentally, I am running into errors when benchmarking random forest and mboost learners. The error message from the random forest says:

Error in finalizeData(fnames, newdata, na.action) : 
  no records in the NA-processed data: consider using 'na.action=na.impute'

and from mboost:

Error in solve.default(XtX, crossprod(X, y)) : 
  Lapack routine dgesv: system is exactly singular: U[4,4] = 0
This happened PipeOp surv.mboost.tuned's $train()
In addition: Warning messages:
1: In df2lambda(X, df = args$df, lambda = args$lambda, dmat = K, weights = w,  :
  ‘df’ too large:
  Degrees of freedom cannot be larger than the rank of the design matrix.
  Unpenalized base-learner with df = 3 used. Re-consider model specification.
This happened PipeOp surv.mboost.tuned's $train()
2: In df2lambda(X, df = args$df, lambda = args$lambda, dmat = K, weights = w,  :
  ‘df’ too large:
  Degrees of freedom cannot be larger than the rank of the design matrix.
  Unpenalized base-learner with df = 2 used. Re-consider model specification.
This happened PipeOp surv.mboost.tuned's $train()

I understand this is potentially a different issue, so I haven't included any reprex() output. Would you like me to start a new issue or an SO question?

bblodfon commented 2 months ago

Hi Lee,

Thanks for writing the new answer in the StackOverflow question; I added a comment for clarification. I would suggest accepting my answer and adding the "solution" code to your original post for future reference.

I will now try to work on an example using cph and lasso, and start on a custom measure and get back to you.

Super, just edit the above post. Smaller and cleaner reprex examples are very helpful.

Incidentally, I am running into errors during benchmarking random forest and mboost

Separate issue, yes! Since these learners come from mlr3extralearners, please post it there and mention me. It might again be a version thing, or we may need to update something. For random forests please use ranger; for mboost it depends on the learner you use. The glmboost one was the kindest in my experience (less buggy in general and with a better memory footprint) versus blackboost and gamboost.
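For example (a sketch):

rf = lrn("surv.ranger")    # ranger-based random survival forest from mlr3extralearners
gb = lrn("surv.glmboost")  # the most forgiving of the mboost family in my experience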

Lee-xli commented 1 month ago

Hi John, please see the updated analysis above; I hope it is concise and clear to follow. Please let me know if anything needs clarification. I will revisit my issues with the other learners in the coming weeks and post separate issues, as you suggested, if still needed. Thanks again for your assistance and advice; much appreciated.

bblodfon commented 1 month ago

Thanks Lee, I will take a look at the code and the papers; I'm busy at the moment with other things. Just in case you don't know, we also have some calibration plots; I guess some of your code could go there, see https://mlr3proba.mlr-org.com/reference/autoplot.PredictionSurv.html
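For instance, something along these lines (a sketch; the exact arguments may differ, see the reference page):

library(mlr3viz)  # provides the autoplot methods
pred = BM1$resample_result(1)$prediction()   # pooled predictions of the first learner
autoplot(pred, type = "calib", task = task)  # Kaplan-Meier vs average predicted survival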

Lee-xli commented 1 month ago

No worries, and thank you John for pointing me to the existing calibration plots. I will have a look at the code there and see whether I can imitate and improve the current R6 function.