Open - Lee-xli opened this issue 7 months ago
Hi @Lee-xli, thanks for sharing all of these! We would definitely like some new (calibration or other) survival measures based on recent literature. I will take a look at the papers and see what we can do, and if you have some sample code that would be excellent. I will create a new issue for this.
Now for the stackoverflow question, the problem we face is that people don't use reprex::reprex() and don't output library versions, i.e. it could be that the mlr3, mlr3proba, or mlr3extralearners version you used is a bit old. I suggest you post it up here with a reprex + library versions and I will definitely check it out.
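For reference, a minimal sketch of producing such a reprex (the session_info argument, available in reprex >= 1.0, appends the library versions to the output):

# run with the problematic code on the clipboard (or passed via `input`)
reprex::reprex(session_info = TRUE)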
Thank you very much @bblodfon! I will work on these and get back to you soon!
Attachment: custom measure ICI mlr3.R.zip (UPDATED analysis below)
Hi John,
Thank you very much once again! Below is a simple worked example of implementing the ICI and the smoothed calibration plot using hazard regression (the hare() function from the polspline package). The alternative method (per Austin et al., 2020) using restricted cubic splines could also be implemented very easily, but my impression is that its additional proportional hazards assumption and the arbitrary choice of knots are key drawbacks. The code below is largely based on the Austin, Harrell & van Klaveren reference.
rm(list = ls(all = TRUE))
library(distr6)
#>
#> Attaching package: 'distr6'
#> The following object is masked from 'package:stats':
#>
#> qqplot
#> The following object is masked from 'package:base':
#>
#> truncate
library(skimr)
library(mlr3)
library(mlr3learners)
library(mlr3extralearners)
library(mlr3pipelines)
library(mlr3tuning)
#> Loading required package: paradox
library(mlr3proba)
library(mlr3misc)
#>
#> Attaching package: 'mlr3misc'
#> The following objects are masked from 'package:Hmisc':
#>
#> %nin%, capitalize
library(data.table)
library(polspline)
library(pec)
#> Loading required package: prodlim
library(ggplot2) # needed for the calibration plot below
# set up data
task_lung = tsk('lung')
d = task_lung$data()
d$time = ceiling(d$time / 30.44) # convert days to months
task_lung = as_task_surv(d, time = 'time', event = 'status', id = 'lung')
po_encode = po('encode', method = 'treatment')
po_impute = po('imputelearner', lrn('regr.rpart'))
pre = po_encode %>>% po_impute
task = pre$train(task_lung)[[1]]
dt = task$data()
#### learners ----
cph = lrn("surv.coxph")
lasso_cv = as_learner(po("encode") %>>% lrn("surv.cv_glmnet", alpha = 1))
# compose a survival distribution from the glmnet risk predictions
# (Kaplan-Meier baseline estimator, PH form)
comp.lasso = as_learner(ppl(
  "distrcompositor",
  learner = lasso_cv,
  estimator = "kaplan",
  form = "ph",
  overwrite = TRUE
))
# Benchmark
set.seed(1234)
BM1 = benchmark(benchmark_grid(task,
                               list(cph),
                               rsmp('cv', folds = 4)),
                store_models = TRUE)
#> INFO [00:19:46.464] [mlr3] Running benchmark with 4 resampling iterations
#> INFO [00:19:46.569] [mlr3] Applying learner 'surv.coxph' on task 'lung' (iter 1/4)
#> INFO [00:19:46.638] [mlr3] Applying learner 'surv.coxph' on task 'lung' (iter 2/4)
#> INFO [00:19:46.689] [mlr3] Applying learner 'surv.coxph' on task 'lung' (iter 3/4)
#> INFO [00:19:46.738] [mlr3] Applying learner 'surv.coxph' on task 'lung' (iter 4/4)
#> INFO [00:19:46.788] [mlr3] Finished benchmark
### predictions ----
data = as.data.table(BM1)
nrow(data) # number of outer resampling iterations (test sets)
#> [1] 4
pred = t(data$prediction[[1]]$distr$cdf(12)) # predicted event probability at 12 months
test_1 = dt[c(data$prediction[[1]]$row_ids)] # outer-layer data (test set)
train_1 = dt[-c(data$prediction[[1]]$row_ids)] # inner-layer data (training set)
### hare ----
# add the predicted probability and its complementary log-log (cll) transform to the test set
test_1[, pred.prob := pred][, pred.prob.cll := log(-log(1 - pred.prob))]
# flexible hazard regression of the observed outcome on the cll of the predicted probability
cali.cox <- hare(data = test_1$time, delta = test_1$status,
                 cov = as.matrix(test_1$pred.prob.cll))
### plot ----
# use the hare model above to get observed probabilities over a range of
# predicted probabilities from the cox model
pred.grid = seq(quantile(test_1$pred.prob, probs = 0.01),
                quantile(test_1$pred.prob, probs = 0.99),
                length = 100)
pred.grid.cll = log(-log(1 - pred.grid))
pred.cali.cox = phare(12, pred.grid.cll, cali.cox) # observed event probability from the HARE model
# plot data
cali = data.frame(predicted = pred.grid, observed = pred.cali.cox)
p <- ggplot(cali, aes(x = predicted, y = observed)) +
  geom_line() +
  geom_abline(slope = 1, intercept = 0, alpha = 0.5, linetype = 'dashed') +
  ylab('Observed prob of 1-year mortality') +
  scale_y_continuous(limits = c(0, 1)) +
  xlab('Predicted prob of 1-year mortality') +
  scale_x_continuous(limits = c(0, 1)) +
  theme_bw() + ggtitle('Lung - coxph (mlr3 benchmark)') +
  theme(legend.position = 'top')
p
### ICI ----
# the ICI is the MEAN absolute difference between the model-predicted probabilities
# and the observed probabilities derived from the smoothed calibration curve
test_1[, ob.hare := phare(12, test_1$pred.prob.cll, cali.cox)][, abs.diff := abs(ob.hare - pred.prob)]
ICI_cox = mean(test_1$abs.diff); ICI_cox
#> [1] 0.07991534
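As an aside, the same absolute differences also yield the E50 and E90 summaries from Austin et al. (2020), i.e. the median and 90th percentile of the absolute differences; a sketch (not part of the reprex run above):

E50_cox = median(test_1$abs.diff)                # median absolute difference (E50)
E90_cox = quantile(test_1$abs.diff, probs = 0.9) # 90th percentile (E90)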
Created on 2024-05-10 by the reprex package (v2.0.1)
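For completeness, a minimal, untested sketch of the alternative restricted-cubic-splines method mentioned above, via the rms package (the choice of 3 knots is arbitrary and purely illustrative):

library(rms)
library(survival)
# Cox model of the outcome on an RCS of the cll-transformed predicted probability;
# this is where the extra proportional hazards assumption comes in
cal_rcs = cph(Surv(time, status) ~ rcs(pred.prob.cll, 3),
              data = test_1, x = TRUE, y = TRUE, surv = TRUE)
obs_rcs = 1 - survest(cal_rcs, newdata = test_1, times = 12)$surv
ICI_rcs = mean(abs(obs_rcs - test_1$pred.prob))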
The above covers just the first outer loop; the entire analysis can be implemented with a locally written custom measure:
rm(list = ls(all = TRUE))
library(mlr3)
library(mlr3learners)
library(mlr3extralearners)
library(mlr3pipelines)
library(mlr3tuning)
#> Loading required package: paradox
library(mlr3proba)
library(mlr3misc)
library(data.table)
library(polspline)
library(R6)
source('~/Flinders/Michael Sorich - Clinical trial analysis/Flinders/Michael Sorich - Tutorial/_indevelopment/calibration/custom measure ICI mlr3.R')
# set up data
task_lung = tsk('lung')
d = task_lung$data()
d$time = ceiling(d$time / 30.44) # convert days to months
task_lung = as_task_surv(d, time = 'time', event = 'status', id = 'lung')
po_encode = po('encode', method = 'treatment')
po_impute = po('imputelearner', lrn('regr.rpart'))
pre = po_encode %>>% po_impute
task = pre$train(task_lung)[[1]]
dt = task$data()
#### learners ----
cph = lrn("surv.coxph")
lasso_cv = as_learner(po("encode") %>>% lrn("surv.cv_glmnet", alpha = 1))
# compose a survival distribution from the glmnet risk predictions
# (Kaplan-Meier baseline estimator, PH form)
comp.lasso = as_learner(ppl(
  "distrcompositor",
  learner = lasso_cv,
  estimator = "kaplan",
  form = "ph",
  overwrite = TRUE
))
set.seed(1234)
BM1 = benchmark(benchmark_grid(task,
                               list(cph, comp.lasso),
                               rsmp('cv', folds = 4)),
                store_models = TRUE)
#> INFO [00:35:24.750] [mlr3] Running benchmark with 8 resampling iterations
#> INFO [00:35:24.827] [mlr3] Applying learner 'surv.coxph' on task 'lung' (iter 1/4)
#> INFO [00:35:24.900] [mlr3] Applying learner 'surv.coxph' on task 'lung' (iter 2/4)
#> INFO [00:35:24.951] [mlr3] Applying learner 'surv.coxph' on task 'lung' (iter 3/4)
#> INFO [00:35:25.000] [mlr3] Applying learner 'surv.coxph' on task 'lung' (iter 4/4)
#> INFO [00:35:25.060] [mlr3] Applying learner 'distrcompositor.kaplan.encode.surv.cv_glmnet.distrcompose' on task 'lung' (iter 1/4)
#> INFO [00:35:25.473] [mlr3] Applying learner 'distrcompositor.kaplan.encode.surv.cv_glmnet.distrcompose' on task 'lung' (iter 2/4)
#> INFO [00:35:25.794] [mlr3] Applying learner 'distrcompositor.kaplan.encode.surv.cv_glmnet.distrcompose' on task 'lung' (iter 3/4)
#> INFO [00:35:26.090] [mlr3] Applying learner 'distrcompositor.kaplan.encode.surv.cv_glmnet.distrcompose' on task 'lung' (iter 4/4)
#> INFO [00:35:26.379] [mlr3] Finished benchmark
# MeasureSurvICI$debug(".score")
# MeasureSurvICI$undebug(".score")
test = BM1$score(msr('surv.ICI', tm = 12)) # returns the ICI by default
test
#> nr task_id learner_id
#> <int> <char> <char>
#> 1: 1 lung surv.coxph
#> 2: 1 lung surv.coxph
#> 3: 1 lung surv.coxph
#> 4: 1 lung surv.coxph
#> 5: 2 lung distrcompositor.kaplan.encode.surv.cv_glmnet.distrcompose
#> 6: 2 lung distrcompositor.kaplan.encode.surv.cv_glmnet.distrcompose
#> 7: 2 lung distrcompositor.kaplan.encode.surv.cv_glmnet.distrcompose
#> 8: 2 lung distrcompositor.kaplan.encode.surv.cv_glmnet.distrcompose
#> resampling_id iteration surv.ICI
#> <char> <int> <num>
#> 1: cv 1 0.07991534
#> 2: cv 2 0.08754933
#> 3: cv 3 0.12298290
#> 4: cv 4 0.20094110
#> 5: cv 1 0.02420689
#> 6: cv 2 0.06097302
#> 7: cv 3 0.10857322
#> 8: cv 4 0.03784889
#> Hidden columns: uhash, task, learner, resampling, prediction
test.plot = BM1$score(msr('surv.ICI', tm = 12, plot = T)) # plot = T returns plots
#> Error in ggplot(cali, aes(x = predicted, y = observed)): could not find function "ggplot"
test.plot
#> Error in eval(expr, envir, enclos): object 'test.plot' not found
BM1$aggregate(msr('surv.ICI'))
#> Error in t.default(prediction$distr$cdf(tm)): argument is not a matrix
Created on 2024-05-10 by the reprex package (v2.0.1)
Please see the R6 function attached.
I am an absolute novice in object-oriented programming and achieved the above through parrot-like learning. I am sure the R6 function has lots of room for improvement, and on top of that there are a few other issues, as shown in the reprex() output above: the plot = T option fails because ggplot2 is not found inside the measure, and BM1$aggregate(msr('surv.ICI')) errors when no tm is supplied.
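For context, a minimal, untested sketch of how such a measure might be structured (the class and parameter names mirror the reprex above; the actual implementation is in the attached zip, and the exact MeasureSurv constructor arguments may differ across mlr3proba versions). Declaring ggplot2 under packages and falling back to a default tm would address the two errors above:

MeasureSurvICI = R6::R6Class("MeasureSurvICI",
  inherit = mlr3proba::MeasureSurv,
  public = list(
    initialize = function() {
      super$initialize(
        id = "surv.ICI",
        param_set = paradox::ps(
          tm = paradox::p_dbl(lower = 0),    # evaluation time point
          plot = paradox::p_lgl(default = FALSE)
        ),
        range = c(0, 1),
        minimize = TRUE,
        predict_type = "distr",
        packages = c("polspline", "ggplot2") # declared so they are available when scoring
      )
    }
  ),
  private = list(
    .score = function(prediction, ...) {
      tm = self$param_set$values$tm
      if (is.null(tm)) tm = 12 # fall back to a default so $aggregate() also works
      pred.prob = as.numeric(prediction$distr$cdf(tm)) # avoids t() on a non-matrix
      cll = log(-log(1 - pred.prob))
      fit = polspline::hare(data = prediction$truth[, 1],
                            delta = prediction$truth[, 2],
                            cov = as.matrix(cll))
      ob = polspline::phare(tm, cll, fit)
      mean(abs(ob - pred.prob))
    }
  )
)
# register it so msr('surv.ICI') can find it
mlr3::mlr_measures$add("surv.ICI", MeasureSurvICI)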
I will trial the custom measure on our research datasets in the coming week.
Looking forward to hearing back from you.
@Lee-xli I answered the stackoverflow question. May I ask that you revise the post above with some of the suggestions in that post? That way we clear the clutter a bit and can see which things don't work and which things can help us implement these calibration-related metrics and plots in mlr3proba. It would help me a lot!
@bblodfon Thank you very much! I have posted an answer to the SO question. I should now be able to construct the plot and the ICI manually after benchmarking. I will try to work on an example using cph and lasso, start on a custom measure, and get back to you.
Incidentally, I am running into errors during benchmarking random forest and mboost. The error message from rf says:
Error in finalizeData(fnames, newdata, na.action) :
no records in the NA-processed data: consider using 'na.action=na.impute'
and from mboost:
Error in solve.default(XtX, crossprod(X, y)) :
Lapack routine dgesv: system is exactly singular: U[4,4] = 0
This happened PipeOp surv.mboost.tuned's $train()
In addition: Warning messages:
1: In df2lambda(X, df = args$df, lambda = args$lambda, dmat = K, weights = w, :
‘df’ too large:
Degrees of freedom cannot be larger than the rank of the design matrix.
Unpenalized base-learner with df = 3 used. Re-consider model specification.
This happened PipeOp surv.mboost.tuned's $train()
2: In df2lambda(X, df = args$df, lambda = args$lambda, dmat = K, weights = w, :
‘df’ too large:
Degrees of freedom cannot be larger than the rank of the design matrix.
Unpenalized base-learner with df = 2 used. Re-consider model specification.
This happened PipeOp surv.mboost.tuned's $train()
I understand that this is potentially a different issue, thus I haven't included any reprex() output. Would you like me to start a new issue or SO question?
Hi Lee,
Thanks for writing the new answer post in the stackoverflow question; I added a comment for clarification. I would suggest accepting my answer and adding the "solution" code to your original post for future reference.
I will now try to work on an example using cph and lasso, and start on a custom measure and get back to you.
Super, just edit the above post. Smaller and cleaner reprex examples are very helpful.
Incidentally, I am running into errors during benchmarking random forest and mboost

Separate issue, yes! Since these learners are available from mlr3extralearners, please post it there and mention me. It might again be a version thing, or we may need to update something. For random forests please use ranger; for mboost it depends on which learner you use, but glmboost was the kindest in my experience (less buggy in general, with a better memory footprint than blackboost and gamboost).
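For reference, a quick sketch of those suggested alternatives (learner ids as provided by mlr3extralearners at the time; the rfsrc error above also hints at its own workaround, the na.action parameter from randomForestSRC):

rf_ranger = lrn("surv.ranger")   # ranger-based random survival forest
gboost = lrn("surv.glmboost")    # the glmboost variant of mboost
rf_fix = lrn("surv.rfsrc", na.action = "na.impute") # per the rfsrc error message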
Hi John, please see the updated analysis above; I hope it is concise and clear to follow. Please let me know if anything needs clarification. I will revisit my issues with the other learners in the coming weeks and post separate issues, as you mentioned, if still indicated. Thanks again for your assistance and advice, much appreciated.
Thanks Lee, I will take a look at the code and the papers, busy atm with other things. Just in case you don't know, we also have some calibration plots; I guess some of your code can go there, see https://mlr3proba.mlr-org.com/reference/autoplot.PredictionSurv.html
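For reference, a minimal usage sketch of that autoplot method (interface as documented at the link above; the in-sample prediction is purely to illustrate the call, reusing the cph learner and task from the code above):

library(ggplot2) # provides the autoplot() generic; mlr3proba registers the method
pred = cph$train(task)$predict(task) # in-sample fit, for illustration only
autoplot(pred, type = "calib", task = task) # average predicted survival vs Kaplan-Meier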
No worries, and thank you John for pointing me to the existing calibration plots. I will have a look at the code there and see if I can imitate it to improve the current R6 function.
Hi @Lee-xli, we've been working on some code restructuring and implementing other things in mlr3proba. To speed this up a bit, can you do a PR with the new metric so that I can check the code, see what is wrong, and help you implement it more efficiently? I guess we need a new MeasureSurvCalibrationICI.R + some unit tests to see if the results make sense.
Hi @bblodfon, thank you very much. Please excuse my ignorance, but could you give me some instructions on what a PR involves? I uploaded a zip file in May with the custom ICI function included; I suspect a PR involves more than just uploading a file?
Search a bit online for more details, but the gist is: PR => Pull Request. Using the git command line, you fork the repository, create your own branch, add the new files, and push them to GitHub; online you can then open a "PR" and I will see it under Pull Requests.
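For an R-centric route, a minimal sketch using the usethis helpers (the repo spec is real; the branch name "ici-measure" is just an illustrative choice):

library(usethis)
create_from_github("mlr-org/mlr3proba", fork = TRUE) # fork the repo and clone it locally
pr_init(branch = "ici-measure")                      # start a feature branch for the PR
# ... add MeasureSurvCalibrationICI.R plus tests, commit with git ...
pr_push()                                            # push the branch and open the PR page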
Thanks @bblodfon, will give it a go next week.
Hi @bblodfon, sorry about the delay in getting the PR done. I encountered an issue that I am not sure how to solve; please see the attached screenshot of the code in the terminal. I can't seem to pass the GitHub authentication in the pushing step. I have created a keychain pass in GitHub and entered that on my Mac. Any suggestions on what I may try next are very much appreciated. (NB: none of my colleagues have done PRs in the past, so I don't have any other source of wisdom but you!)
From what I am seeing you don't get denied access, but rather an invalid username or password. Maybe you didn't type the password correctly? (Which nowadays is a whole access-token thing that you probably need to cache so that you don't type it every time you git push/pull.)
Thank you John, I deleted my old personal access token, created a new one, and it worked. A pull request has just been submitted. Thanks again for guiding me through this. :)
Hi mlr3proba team,
Some recent recommendations on the evaluation/validation of survival prediction models call for comparing individuals' observed outcome probabilities with the model-predicted probabilities (at given time points), for example via smoothed calibration curves and the integrated calibration index (ICI) (Austin et al., 2020; Riley et al., 2024).
I have successfully computed the calibration index and constructed the plots after training models in mlr3. Going forward, however, I would like to implement the ICI as a custom measure during benchmark procedures and be able to construct the calibration curve directly afterwards. In my attempts to implement these, I have encountered some difficulties extracting the relevant predictions from the outer loops of nested-CV benchmark procedures, and I have opened a question on StackOverflow (https://stackoverflow.com/questions/78364286/how-to-extract-predictions-say-of-survival-probability-of-the-outer-loop-sampl). I also wonder if mlr3proba may already have something similar in the pipeline for implementation.
I would very much appreciate any advice and guidance from mlr3proba team directly.
References:
Austin PC, Harrell FE Jr, van Klaveren D. Graphical calibration curves and the integrated calibration index (ICI) for survival models. Stat Med. 2020;39(21):2714-2742. doi: 10.1002/sim.8570. PMID: 32548928; PMCID: PMC7497089.
Riley RD, Archer L, Snell KIE, Ensor J, Dhiman P, Martin GP, Bonnett LJ, Collins GS. Evaluation of clinical prediction models (part 2): how to undertake an external validation study. BMJ. 2024;384:e074820. doi: 10.1136/bmj-2023-074820.