pveber / morse

Companion R package for MOSAIC website
7 stars 5 forks source link

survFit is ignoring the random state #268

Closed konkam closed 6 years ago

konkam commented 6 years ago

It does not seem possible to obtain reproducible results by setting the seed in advance:

library(morse)
data("propiconazole")
dat = survData(propiconazole)
set.seed(0)
fit_cstSD <- survFit(dat, quiet = TRUE, model_type = "SD")
summary(fit_cstSD)
## Summary: 
## 
## Priors of the parameters (quantiles) (select with '$Qpriors'):
## 
##  parameters    median      Q2.5     Q97.5
##          kd 4.157e-02 2.771e-04 6.236e+00
##          hb 1.317e-02 2.708e-04 6.403e-01
##           z 1.700e+01 8.171e+00 3.539e+01
##          kk 5.727e-03 1.021e-05 3.212e+00
## 
## Posteriors of the parameters (quantiles) (select with '$Qposteriors'):
## 
##  parameters    median      Q2.5     Q97.5
##          kd 2.193e+00 1.591e+00 3.445e+00
##          hb 2.670e-02 1.285e-02 4.893e-02
##           z 1.690e+01 1.548e+01 1.863e+01
##          kk 1.231e-01 7.841e-02 1.885e-01
set.seed(0)
fit_cstSD <- survFit(dat, quiet = TRUE, model_type = "SD")
summary(fit_cstSD)
## Summary: 
## 
## Priors of the parameters (quantiles) (select with '$Qpriors'):
## 
##  parameters    median      Q2.5     Q97.5
##          kd 4.157e-02 2.771e-04 6.236e+00
##          hb 1.317e-02 2.708e-04 6.403e-01
##           z 1.700e+01 8.171e+00 3.539e+01
##          kk 5.727e-03 1.021e-05 3.212e+00
## 
## Posteriors of the parameters (quantiles) (select with '$Qposteriors'):
## 
##  parameters    median      Q2.5     Q97.5
##          kd 2.194e+00 1.591e+00 3.508e+00
##          hb 2.663e-02 1.276e-02 4.891e-02
##           z 1.690e+01 1.546e+01 1.864e+01
##          kk 1.233e-01 7.799e-02 1.864e-01

If you look at the quantiles of the posteriors, you can see that they do not coincide in spite of starting from the same random state.

At the moment I have not been able to find an explanation for this behaviour.

Results obtained on the CRAN version of morse Version: 3.0.0 with R version 3.4.3 (2017-11-30)

pveber commented 6 years ago

I wonder if that's even possible, since the sampling is left to jags. We'd have to check whether jags is passed a random state on the command line. Any idea @virgile-baudrot ?

konkam commented 6 years ago

Indeed you're right, set.seed() does not even produce reproducible results with jags for an arbitrary model.

Sorry this is the first thing I should have tried.

So there is no relation to morse here.

konkam commented 6 years ago

Just in case you are ever interested in implementing the behaviour I was trying to obtain, ie. getting reproducible results by using a command such as set.seed(), here is a solution :

Set up of an example normal model:

N <- 1000
x <- rnorm(N, 0, 5)

m_code = "model {
  for (i in 1:N) {
    x[i] ~ dnorm(mu, tau)
  }
  mu ~ dnorm(0, .0001)
  tau <- pow(sigma, -2)
  sigma ~ dunif(0, 100)
}"

library('rjags')
## Loading required package: coda

## Linked to JAGS 4.2.0

## Loaded modules: basemod,bugs

Basic setting of set.seed(0) does not work

set.seed(0)

jags <- jags.model(textConnection(m_code),
                   data = list('x' = x,
                               'N' = N),
                   n.chains = 4,
                   n.adapt = 100)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 1000
##    Unobserved stochastic nodes: 2
##    Total graph size: 1010
## 
## Initializing model
update(jags, 1000)

res = coda.samples(jags,
             c('mu', 'tau'),
             1000)

summary(res)
## 
## Iterations = 1101:2100
## Thinning interval = 1 
## Number of chains = 4 
## Sample size per chain = 1000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##         Mean       SD  Naive SE Time-series SE
## mu  -0.07976 0.159429 2.521e-03      2.453e-03
## tau  0.03940 0.001766 2.792e-05      3.301e-05
## 
## 2. Quantiles for each variable:
## 
##         2.5%      25%      50%     75%   97.5%
## mu  -0.38881 -0.18923 -0.07980 0.02996 0.22174
## tau  0.03596  0.03818  0.03938 0.04064 0.04287
set.seed(0)

jags <- jags.model(textConnection(m_code),
                   data = list('x' = x,
                               'N' = N),
                   n.chains = 4,
                   n.adapt = 100)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 1000
##    Unobserved stochastic nodes: 2
##    Total graph size: 1010
## 
## Initializing model
update(jags, 1000)

res = coda.samples(jags,
                   c('mu', 'tau'),
                   1000)

summary(res)
## 
## Iterations = 1101:2100
## Thinning interval = 1 
## Number of chains = 4 
## Sample size per chain = 1000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##         Mean       SD  Naive SE Time-series SE
## mu  -0.08041 0.161120 2.548e-03      0.0025841
## tau  0.03938 0.001773 2.804e-05      0.0000319
## 
## 2. Quantiles for each variable:
## 
##         2.5%      25%      50%     75%   97.5%
## mu  -0.40332 -0.18723 -0.08021 0.02578 0.24127
## tau  0.03598  0.03819  0.03938 0.04052 0.04288

But giving an inits argument which depends on the current random state works:

set.seed(0)

jags <- jags.model(textConnection(m_code),
                   data = list('x' = x,
                               'N' = N),
                   n.chains = 4,
                   n.adapt = 100, inits = list(.RNG.name="base::Super-Duper", .RNG.seed=sample.int(1000, 1)))
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 1000
##    Unobserved stochastic nodes: 2
##    Total graph size: 1010
## 
## Initializing model
update(jags, 1000)

res = coda.samples(jags,
                   c('mu', 'tau'),
                   1000)

summary(res)
## 
## Iterations = 1101:2100
## Thinning interval = 1 
## Number of chains = 4 
## Sample size per chain = 1000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##         Mean       SD  Naive SE Time-series SE
## mu  -0.08972 0.157833 2.496e-03      2.372e-03
## tau  0.03935 0.001808 2.858e-05      3.132e-05
## 
## 2. Quantiles for each variable:
## 
##         2.5%      25%      50%     75%   97.5%
## mu  -0.40112 -0.19520 -0.08924 0.01676 0.22294
## tau  0.03596  0.03818  0.03933 0.04050 0.04304
set.seed(0)

jags <- jags.model(textConnection(m_code),
                   data = list('x' = x,
                               'N' = N),
                   n.chains = 4,
                   n.adapt = 100, inits = list(.RNG.name="base::Super-Duper", .RNG.seed=sample.int(1000, 1)))
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 1000
##    Unobserved stochastic nodes: 2
##    Total graph size: 1010
## 
## Initializing model
update(jags, 1000)

res = coda.samples(jags,
                   c('mu', 'tau'),
                   1000)

summary(res)
## 
## Iterations = 1101:2100
## Thinning interval = 1 
## Number of chains = 4 
## Sample size per chain = 1000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##         Mean       SD  Naive SE Time-series SE
## mu  -0.08972 0.157833 2.496e-03      2.372e-03
## tau  0.03935 0.001808 2.858e-05      3.132e-05
## 
## 2. Quantiles for each variable:
## 
##         2.5%      25%      50%     75%   97.5%
## mu  -0.40112 -0.19520 -0.08924 0.01676 0.22294
## tau  0.03596  0.03818  0.03933 0.04050 0.04304

Note that I had to select one specific random number generator, which I did not investigate.

virgile-baudrot commented 6 years ago

Fantastic. Should we add an argument seed for this option?

For instance:

fit_cstSD <- survFit(dat, quiet = TRUE, model_type = "SD", seed = 123)