mikemeredith / BEST

Bayesian Estimation Supersedes t-Test
32 stars 4 forks source link

BESTmcm rnd.seed sideeffects #7

Closed assignUser closed 7 years ago

assignUser commented 7 years ago

Using the rnd.seed option in the BESTmcmc function also sets the seed for the Global enviroment. Is this expected behavior? When chaining or looping BEST and other functions using the rnd e.g. sample() this causes side effects(namely all samples after n>2 being the same, i have attached some sample code that reproduces the issue) .

pop1<- rnorm(100000,0,1)
N <- 5
x[1] <- sample(pop1, N, replace = FALSE)
y[1] <- sample(pop1, N, replace = FALSE)
for( i in 2:10){
  x[i] <- sample(pop1, N, replace = FALSE)
  y[i] <- sample(pop1, N, replace = FALSE)
  BESTmcmc(x[i],y[i],rnd.seed = 1234,parallel = TRUE)
  print(all.equal(x[i],x[i-1]))
  print(all.equal(y[i],y[i-1]))
}

Thanks for your work on this Package!

mikemeredith commented 7 years ago

Thanks for pointing out that bug! Functions should always clean up and leave things as they found them. However, that's going to be tricky to implement within the framework of CRAN policies.

First off, your example code doesn't actually work. Here's code that does:

pop1<- rnorm(100000,0,1)
N <- 5
x <- y <- matrix(NA, N, 10)
x[, 1] <- sample(pop1, N, replace = FALSE)
y[, 1] <- sample(pop1, N, replace = FALSE)
for( i in 2:10){
  x[, i] <- sample(pop1, N, replace = FALSE)
  y[, i] <- sample(pop1, N, replace = FALSE)
  BESTmcmc(x[, i],y[, i], rnd.seed = 1234, parallel = TRUE)
  print(all.equal(x[, i],x[, i-1]))
  print(all.equal(y[, i],y[, i-1]))
}

Optimal fix

BESTmcmc should grab the current state of .Random.seed and restore it on exit:

better <- function(...) {
  oldseed <- .Random.seed ; on.exit(assign(".Random.seed", oldseed, pos=1) )
  BESTmcmc(...)
}

With this function, and a call to set.seed before the whole block, you will get the same set of samples each time:

set.seed(2017)
pop1<- rnorm(100000,0,1)
N <- 5
x <- y <- matrix(NA, N, 10)
x[, 1] <- sample(pop1, N, replace = FALSE)
y[, 1] <- sample(pop1, N, replace = FALSE)
for( i in 2:10){
  x[, i] <- sample(pop1, N, replace = FALSE)
  y[, i] <- sample(pop1, N, replace = FALSE)
  better(x[, i],y[, i], rnd.seed = 1234, parallel = TRUE)
  print(all.equal(x[, i],x[, i-1]))
  print(all.equal(y[, i],y[, i-1]))
}

The x and y matrices will be the same every time, and will still be the same if you comment-out the call to better.

The problem is that the call to assign(..., pos=1) contravenes CRAN policy:

  • Packages should not modify the global environment (user’s workspace).

The paradox is that every call to set.seed modifies .Random.seed, which is in the global environment, but no one get jailed for that. Maybe we can argue for an exception to the policy to enable it to be restored.

Acceptable fix

(Acceptable to CRAN that is.)

notSoGood <- function(...) {
  on.exit(set.seed(NULL))
  BESTmcmc(...)
}

Resetting the random number generator like this means that the x and y matrices will be different each time.

Workarounds

I'll try to post a patched version here shortly, but in the mean time:

  1. Restore .Random.seed after calling BESTmcmc in your loop:
    ...
    oldSeed <- .Random.seed
    BESTmcmc(x[, i],y[, i], rnd.seed = 1234, parallel = TRUE)
    .Random.seed <- oldSeed
    ...
  2. Reinitiallise the random number generator after the call to BESTmcmc:
    ...
    BESTmcmc(x[, i],y[, i], rnd.seed = 1234, parallel = TRUE)
    set.seed(NULL)
    ...
  3. Set a new seed before each sample draw, ie, at the start of the loop. I like this solution if the individual data sets are huge, because it means I can recreate any of the data sets which give interesting results, I don't have to save them all:
    for( i in 2:10){
    set.seed(i + 2017)
    x[, i] <- sample(pop1, N, replace = FALSE)
    y[, i] <- sample(pop1, N, replace = FALSE)
    BESTmcmc(x[, i],y[, i], rnd.seed = 1234, parallel = TRUE)
    print(all.equal(x[, i],x[, i-1]))
    print(all.equal(y[, i],y[, i-1]))
    }
  4. Generate all your samples before starting the analysis:
    set.seed(2017)
    pop1<- rnorm(100000,0,1)
    N <- 5
    x <- y <- matrix(NA, N, 10)
    for(i in 1:10) {
    x[, i] <- sample(pop1, N, replace = FALSE)
    y[, i] <- sample(pop1, N, replace = FALSE)
    }
    for( i in 2:10){
    BESTmcmc(x[, i],y[, i], rnd.seed = 1234, parallel = TRUE)
    print(all.equal(x[, i],x[, i-1]))
    print(all.equal(y[, i],y[, i-1]))
    }
mikemeredith commented 7 years ago

The issue is caused by a call to set.seed in jagsUI::jags.basic called from BESTmcmc. This is needed if random starting values are used, but BESTmcmc doesn't do that. I also noticed that the code there over-rides the random number generator type specified by BESTmcmc. I think the long-term solution is to use rjags directly.

mikemeredith commented 7 years ago

Just pushed a patch to Github with 'optimal fix', v.0.4.0.9002. You can install with githubinstall::githubinstall("BEST")

assignUser commented 7 years ago

What a quick response! Thank you very much. I have yet to update, I stopped using the seed and everything looked dandy. ~But just now I noticed that after 70 unique pair of samples I get the same 4 sequential pairs of samples over and over again. Not sure why, does not happen when I use the same loop with the BESTmcmc commented out. I will first try the updated version maybe that already fixes the error :)~ Can not replicate the issue. Can confirm that the optimal fix works great! Thanks again

mikemeredith commented 7 years ago

Resolved in CRAN version 0.5.0.