mlr-org / mlr3proba

Probabilistic Learning for mlr3
https://mlr3proba.mlr-org.com/
GNU Lesser General Public License v3.0

Error using `p_max` parameter of the Graf score #382

Closed by vlegoff 6 months ago

vlegoff commented 6 months ago

Hello,

I am currently running a survival benchmark using mlr3proba. When computing the Graf score with p_max=0.9, I get the following error on the smallest of my datasets (35 individuals):

Error in if (any(mtc==0)) { : missing value where TRUE/FALSE needed

From my exploration, the error appears once p_max exceeds a certain threshold (close to 0.75 in my case). I also observed it on larger datasets, at higher thresholds (0.99 with n=419). As a side effect, this prevents using p_max=1, which I believe should run even if it is a somewhat absurd choice.
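For readers unfamiliar with this error message, here is a minimal base-R sketch of how it can arise in general. It does not reproduce mlr3proba's internals: the vector name `mtc` is borrowed from the error message only, its values are made up, and the report does not establish why a missing value appears inside the package.

```r
# Stand-in for a vector that ended up containing a missing value.
# The name `mtc` comes from the error message; the values are invented.
mtc <- c(0.8, 0.5, NA)

# No element equals 0 and one comparison is NA, so any() returns NA.
cond <- any(mtc == 0)
print(cond)

# Using that NA as an `if` condition raises the error quoted above.
msg <- tryCatch(
  {
    if (any(mtc == 0)) "has zero" else "no zero"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# An NA-tolerant version of the same check does not error.
safe <- if (isTRUE(any(mtc == 0, na.rm = TRUE))) "has zero" else "no zero"
print(safe)
```

Whether an NA-tolerant check is the right fix depends on why the missing value occurs in the first place, which this sketch does not address.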

I replicated the error using the following code:

library(data.table)
library(mlr3verse)
library(mlr3proba)

# Error on smallest dataset
dat = fread("toy_example.csv")
tsk = as_task_surv(dat, event="status", time="time")
tsk$add_strata("status")

learner = lrn("surv.coxph")
cv = rsmp("repeated_cv", folds=5, repeats=10)
set.seed(42)
rr = resample(tsk, learner, cv)
rr$score(msr("surv.graf", p_max=.75))

# Error on larger dataset
# Read the attached larger dataset (do not overwrite it with the small one)
dat2 = fread("toy_example_larger.csv")
tsk2 = as_task_surv(dat2, event="status", time="time")
tsk2$add_strata("status")

cv2 = rsmp("repeated_cv", folds=5, repeats=10)
set.seed(42)
rr2 = resample(tsk2, learner, cv2)

rr2$score(msr("surv.graf", p_max=.99))

toy_example.csv toy_example_larger.csv

Thanks for reading this. I hope I haven't misunderstood the documentation while trying to use p_max.

Edit: I am using R 4.3.2 and mlr3proba 0.6.0

vlegoff commented 6 months ago

While checking the versions I used, I found the source of the problem: I was using the previous version of mlr3proba (0.6.0), and the bug is fixed in the current version. Sorry for opening this issue.
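For anyone landing here with the same error, a quick way to check whether the installed mlr3proba is newer than the affected 0.6.0 release could look like the sketch below. The helper `is_newer` is hypothetical and defined only for this illustration; the thread does not state which exact release contains the fix, so the check only tells you whether you are past 0.6.0.

```r
# Version reported in this issue as affected.
affected <- "0.6.0"

# Hypothetical helper: TRUE when `installed` is strictly newer than `reference`.
# Versions are compared component-wise, so "0.10.0" counts as newer than "0.6.0".
is_newer <- function(installed, reference) {
  package_version(as.character(installed)) > package_version(as.character(reference))
}

if (requireNamespace("mlr3proba", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("mlr3proba"))
  if (is_newer(installed, affected)) {
    message("mlr3proba ", installed, " is newer than ", affected)
  } else {
    message("mlr3proba ", installed, " is not newer than ", affected, "; consider updating")
  }
} else {
  message("mlr3proba is not installed")
}
```

Comparing with `package_version()` rather than plain strings avoids mis-ordering versions such as 0.10.0 and 0.6.0.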