kogalur / randomForestSRC

DOCUMENTATION:
https://www.randomforestsrc.org/
GNU General Public License v3.0

subscript out of bounds error with get.brier.survival() #262

Open mao223 opened 2 years ago

mao223 commented 2 years ago

Good day,

I was wondering if I could get help with an error from one of the randomForestSRC functions. I am trying to plot the Brier error over time as specified in the tutorial. When I specify cens.model = "rfsrc" in get.brier.survival(), I get the error:

Error in cens.dist[, i] : subscript out of bounds

To determine whether the problem was with my input data, I tried a different dataset: the pbc data that ships with the package. I was able to reproduce the error above.

## load packages and data
library(randomForestSRC)
library(dplyr)  # provides %>% and filter()
data("pbc", package = "randomForestSRC")

## Create the trial and test data sets
pbc.trial = pbc %>% filter(!is.na(treatment))
pbc.test = pbc %>% filter(is.na(treatment))

## Train model
set.seed(1)
rfsrc_pbc = rfsrc(Surv(days, status) ~ .,
                   data = pbc.trial,
                   nsplit = 10,
                   na.action = "na.impute")

This seems to have run correctly:

                         Sample size: 312
                    Number of deaths: 125
                    Was data imputed: yes
                     Number of trees: 500
           Forest terminal node size: 15
       Average no. of terminal nodes: 15.352
No. of variables tried at each split: 5
              Total no. of variables: 17
       Resampling used to grow trees: swor
    Resample size used to grow trees: 197
                            Analysis: RSF
                              Family: surv
                      Splitting rule: logrank *random*
       Number of random split points: 10
                          (OOB) CRPS: 0.12251812
   (OOB) Requested performance error: 0.1715

Extracting the Brier scores:

## get Brier scores under each censoring model
km.brier = get.brier.survival(rfsrc_pbc, cens.model = "km")$brier.score
rfsrc.brier = get.brier.survival(rfsrc_pbc, cens.model = "rfsrc")$brier.score

Using the Kaplan-Meier censoring distribution (km.brier) worked, returning:

head(km.brier, n = 20)
   time brier.score
1    41 0.003291957
2    51 0.006223498
3    71 0.009362284
4    77 0.011880171
5   110 0.014697370
6   130 0.016798135
7   131 0.018919906
8   140 0.021190616
9   179 0.022736619
10  186 0.025818265
11  191 0.027465872
12  198 0.030855710
13  207 0.033368961
14  216 0.035316923
15  223 0.036228441
16  264 0.038315335
17  304 0.040009279
18  321 0.042411043
19  326 0.044632263
20  334 0.044713808
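
For reference, this is roughly how the KM-based scores can be plotted over time as a step function. It is only a minimal sketch: it rebuilds the trial subset with base R rather than dplyr, the names fit and bs.km are placeholders, and it assumes the brier.score element has the time and brier.score columns shown in the output above.

```r
library(randomForestSRC)

## same trial subset as above, built with base R instead of dplyr
data("pbc", package = "randomForestSRC")
pbc.trial <- pbc[!is.na(pbc$treatment), ]

## same forest settings as in the post
set.seed(1)
fit <- rfsrc(Surv(days, status) ~ .,
             data = pbc.trial,
             nsplit = 10,
             na.action = "na.impute")

## Brier score using the Kaplan-Meier censoring distribution
bs.km <- get.brier.survival(fit, cens.model = "km")$brier.score

## step plot of the Brier score against time
## (column names assumed from the printed output above)
plot(bs.km[, "time"], bs.km[, "brier.score"], type = "s",
     xlab = "Time (days)", ylab = "Brier score (KM censoring)")
```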

Using the random forest censoring distribution (rfsrc.brier) did not; it returned the error:

Error in cens.dist[, i] : subscript out of bounds

May I get some guidance on how to fix the error when a random forest censoring distribution is used? Thank you very much for taking the time to read my message.

ishwaran commented 2 years ago

Hi, thanks for your post. This has alerted us to a bug in the code related to missing values.

We will provide a fix in the next CRAN release. In the meantime, there are two workarounds:

  1. Do not use na.action="na.impute" when using cens.model="rfsrc"
  2. Add the following after your rfsrc_pbc call: rfsrc_pbc$forest$xvar[rfsrc_pbc$imputed.indv, ] <- rfsrc_pbc$imputed.data[, -(1:2)]
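
Applied to the pbc example from the original post, the two workarounds might look like the sketch below. It assumes the trial subset is built with base R instead of dplyr, and the object names (fit.omit, fit.imp, bs.omit, bs.imp) are placeholders. Workaround 1 relies on the default na.action = "na.omit", which drops rows with missing values, so that forest is grown on fewer observations than the imputed one.

```r
library(randomForestSRC)

data("pbc", package = "randomForestSRC")
pbc.trial <- pbc[!is.na(pbc$treatment), ]

## Workaround 1: leave na.action at its default ("na.omit"),
## so no imputation takes place
set.seed(1)
fit.omit <- rfsrc(Surv(days, status) ~ .,
                  data = pbc.trial,
                  nsplit = 10)
bs.omit <- get.brier.survival(fit.omit, cens.model = "rfsrc")$brier.score

## Workaround 2: keep the imputation, then overwrite the stored
## x-variables for the imputed rows with their imputed values.
## Columns 1:2 of imputed.data are the response (days, status).
set.seed(1)
fit.imp <- rfsrc(Surv(days, status) ~ .,
                 data = pbc.trial,
                 nsplit = 10,
                 na.action = "na.impute")
fit.imp$forest$xvar[fit.imp$imputed.indv, ] <- fit.imp$imputed.data[, -(1:2)]
bs.imp <- get.brier.survival(fit.imp, cens.model = "rfsrc")$brier.score
```

The `-(1:2)` index is specific to survival forests, where the response takes up two columns; a different outcome type would need a different offset.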

mao223 commented 2 years ago

Thank you very much!