philchalmers / SimDesign

Structure for organizing Monte Carlo simulations in R
http://philchalmers.github.io/SimDesign/

Random number seed management #31

Closed: mronkko closed this issue 5 months ago

mronkko commented 5 months ago

The vignette about parallel and cluster computation should discuss random number seed management more explicitly.

https://cran.r-project.org/web/packages/SimDesign/vignettes/Parallel-computing.html

For example, section "3 Poor man's cluster computing for independent nodes" suggests running R=200 and R=300 replications on two computers. This can easily lead to running with the same random number seeds, which would make the replications non-independent. It would be useful to provide some recommendations on how seeds should be managed in this scenario.
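To make the concern concrete, here is a minimal base-R sketch of what happens if both computers start from the same seed. The helpers `one_replication()` and `run_node()` are hypothetical stand-ins for a real simulation, not part of SimDesign's API, and the seed 1234 is arbitrary.

```r
# Hypothetical stand-in for one replication: draw a sample, return its mean.
one_replication <- function(N) mean(rnorm(N))

# Hypothetical helper: seed the RNG, then run R replications on one "computer".
run_node <- function(R, seed, N = 10) {
  set.seed(seed)
  vapply(seq_len(R), function(r) one_replication(N), numeric(1))
}

# Both computers accidentally use the same seed
node1 <- run_node(R = 200, seed = 1234)
node2 <- run_node(R = 300, seed = 1234)

# Node 2 replays node 1's random stream, so its first 200 replications
# are exact copies of node 1's results
identical(node1, node2[1:200])   # TRUE
length(unique(c(node1, node2)))  # distinct results among the 500 pooled
```

Because the second computer re-uses the first one's stream, only 300 of the 500 pooled replications carry independent information.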

philchalmers commented 5 months ago

"easily lead to running with the same random number seeds" is a bit of an overstatement, at least with respect to how likely the seeds are to coincide across instances, but point taken. I've exposed the gen_seeds() function so that users can generate their own sets of unique seeds for this type of application; you'll find this documented in the same vignette. Below is the idea:

design <- SimDesign::createDesign(N=c(10,20,30), var=c(1,2,3,4,5))
seeds <- SimDesign::gen_seeds(design, nsets = 2L)
head(seeds)  # first column for computer 1, second for computer 2

# print a message if the two seeds in a row generate identical numbers
# (nothing should be printed)
for(i in seq_len(nrow(design))){
    set.seed(seeds[i,1])
    v1 <- runif(100)
    set.seed(seeds[i,2])
    v2 <- runif(100)
    if(identical(v1, v2))
        print("Same numbers generated")
}
set.seed(NULL)  # re-initialize the RNG as if no seed had been set
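Distinct seeds make identical streams very unlikely, but with the default Mersenne-Twister generator they do not strictly guarantee non-overlapping streams. A different technique, which is not what gen_seeds() does, is to derive one L'Ecuyer-CMRG stream per computer from a single master seed using base R's parallel package, so the streams are spaced far apart by construction. This is a minimal sketch: the master seed 2023, `n_computers`, and the helper `use_stream()` are illustrative assumptions, and how such streams would be wired into SimDesign's own functions is not shown here.

```r
library(parallel)  # ships with R; provides nextRNGStream()

# Switch to the L'Ecuyer-CMRG generator, which supports independent streams
RNGkind("L'Ecuyer-CMRG")
set.seed(2023)  # one master seed shared by all computers (arbitrary choice)

# Derive one stream per computer; each nextRNGStream() call advances the
# generator state 2^127 steps, so the streams do not overlap in practice
n_computers <- 2L
streams <- vector("list", n_computers)
streams[[1]] <- .Random.seed
for (k in seq_len(n_computers)[-1])
  streams[[k]] <- nextRNGStream(streams[[k - 1]])

# Hypothetical helper: install stream k as the current RNG state
use_stream <- function(k) assign(".Random.seed", streams[[k]], envir = .GlobalEnv)

# Computer 1 and computer 2 each draw from their own stream
use_stream(1); v1 <- runif(100)
use_stream(2); v2 <- runif(100)

# Restore the default generator and re-randomize the state
RNGkind("default")
set.seed(NULL)
```

In practice computer k would call `use_stream(k)` once before running its share of the replications; only the master seed and the stream index need to be recorded to reproduce the results.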