kogalur / randomForestSRC

DOCUMENTATION:
https://www.randomforestsrc.org/
GNU General Public License v3.0

Request: make subsample use parallel processing #407

Status: Open (opened by erikerhardt 6 months ago)

erikerhardt commented 6 months ago

Subsampling can take a long time. Would you be willing to try a parallel implementation that replaces lapply() with parallel::mclapply()?

In https://github.com/kogalur/randomForestSRC/blob/master/src/main/resources/cran/R/subsample.rfsrc.R (B = 100 by default), the subsampling loop is:

## subsampling loop for calculating VIMP confidence regions
vmpS <- lapply(1:B, function(b) { ... })
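For illustration only, here is a minimal base-R sketch of the swap being requested. The function one_iteration() is a hypothetical stand-in for one subsampling iteration, written in pure R; it is not the package's code. The pattern works here only because nothing inside the worker uses OpenMP. The maintainer's reply explains why wrapping randomForestSRC's OpenMP-backed code this way is not safe.

```r
library(parallel)

# Hypothetical stand-in for one subsampling iteration (NOT randomForestSRC code):
# draw a subsample of size subratio * n and compute a cheap statistic on it.
one_iteration <- function(b, x, subratio) {
  set.seed(b)  # per-iteration seed so serial and forked runs give identical results
  idx <- sample(length(x), size = floor(subratio * length(x)))
  mean(x[idx])
}

x <- rnorm(1000)
B <- 100

# Serial version, mirroring the lapply(1:B, ...) loop quoted above.
serial <- lapply(1:B, one_iteration, x = x, subratio = 0.2)

# Forked version: mclapply() forks the R process; mc.cores > 1 is not supported on Windows.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
forked <- mclapply(1:B, one_iteration, x = x, subratio = 0.2, mc.cores = n_cores)

identical(serial, forked)
```

Seeding inside each iteration keeps the two runs comparable; whether forking pays off depends on how expensive each iteration is compared with the cost of starting the worker processes.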

I love randomForestSRC! It has made a world of difference in my predictive work. Thank you!

ishwaran commented 6 months ago

Doing this will cause forking issues. The underlying code already uses OpenMP for parallel processing, so if you wrap it in mclapply() (which forks the R process), the threads will conflict and the code will segfault.

Subsampling is typically very fast. Try reducing the subsample size and/or the number of iterations (B).
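A minimal sketch of that advice, assuming the B (number of subsamples) and subratio (subsample size as a fraction of the sample size) arguments of subsample(). Check ?subsample for your installed version. The iris data set and the values B = 25 and subratio = 0.2 are illustrative choices, not recommendations from the thread.

```r
library(randomForestSRC)

# Grow a forest on a built-in data set (iris is used here only for illustration).
o <- rfsrc(Species ~ ., data = iris)

# Fewer subsampling iterations (B) and a smaller subsample fraction (subratio)
# both reduce run time. Values are illustrative; B defaults to 100.
smp <- subsample(o, B = 25, subratio = 0.2, verbose = FALSE)
```

This keeps the package's own OpenMP parallelism intact while doing less total work, rather than adding a second layer of process-level parallelism on top.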