setting random seed does not result in reproducibility

jeff788 commented 8 years ago

Below I have a simple linear regression example that does not result in a reproducible trace. I have ran the code repeatedly in both a linux virtual machine and on windows. I am using PyMC version 2.3.6, python version 3.5.1, and numpy 1.10.4. On my linux virtual machine I get the same result most of the time, but about 1:10 times I get a different result. The same thing occurs in windows, but the result osculated between 4-5 different values with less repeatability than in the virtual machine. I verified that in each case numpy was generating the same synthetic data and the initial values for the stochastics are all the same in every case. I also tried both the Metropolis and AdaptiveMetropolis samplers and the result was the same.

import pymc
import numpy as np

np.random.seed(0)
N=100
x_data = np.linspace(0,1,N)
noise = np.random.normal(loc=0.0,scale=1.0,size=N)
breal  = 2.3
mreal = 7.2
y_data = mreal*x_data+breal+noise
pymc.numpy.random.seed(0)
def MCMC_linregress(x_data,y_data,mmin=-20.0,mmax=20.0,bmin=-20.0,bmax=20.0):
    pymc.numpy.random.seed(0)
    m = pymc.Uniform("m",lower=mmin,upper=mmax)#,value=0.5*(mmin+mmax))
    print('initial m= ',m.value)
    pymc.numpy.random.seed(0)
    b = pymc.Uniform("b",lower=bmin,upper=bmax)#,value=0.5*(bmin+bmax))
    print('initial b= ',b.value)
    x = pymc.Uniform("x",np.min(x_data),np.max(x_data),value=x_data,observed=True)
    pymc.numpy.random.seed(0)
    err = pymc.Uniform("err",lower=-10,upper=10)#,value=1.0)
    print('initial err= ',err.value)
    @pymc.deterministic
    def y_pred(m=m,b=b,x=x):
        return m*x+b
    y = pymc.Normal("y",mu=y_pred,tau=1.0/(err**2),value=y_data,observed=True)
    pymc.numpy.random.seed(0)
    model = pymc.MCMC(pymc.Model([y_pred,m,b,x,y,err]))
    #model.use_step_method(pymc.AdaptiveMetropolis,[m,b,err])
    model.use_step_method(pymc.Metropolis,m)
    model.use_step_method(pymc.Metropolis,b)
    model.use_step_method(pymc.Metropolis,err)
    model.sample(iter=100,burn=0)
    return model.trace('m')[:],model.trace('b')[:],model.trace('err')[:]
m,b,err=MCMC_linregress(x_data,y_data,mmin=-20,mmax=20,bmin=-20.0,bmax=20.0)
print('m[-1]= ',m[-1])

emad2 commented 8 years ago

I have the same issue. Does anybody know how to reproduce the pymc samples?

fonnesbeck commented 8 years ago

PyMC Stochastic objects have rseed arguments that should do the trick. For example,

err = pymc.Uniform("err", lower=-10, upper=10, rseed=42)

kou1okada commented 4 years ago

Is there any way to get same results using a random seed? I can't believe that the above trick is working.

I saved the script in https://github.com/pymc-devs/pymc/issues/129#issue-167741999 to mcmc_orig.py. And I modified it according to the advice of https://github.com/pymc-devs/pymc/issues/129#issuecomment-244791294. Now I have three scripts of mcmc_orig.py, mcmc_trick.py and mcmc_trick_all.py. Below is results of diff.

$ diff mcmc_orig.py mcmc_trick.py
21c21
<     err = pymc.Uniform("err",lower=-10,upper=10)#,value=1.0)
---
>     err = pymc.Uniform("err",lower=-10,upper=10,rseed=42)#,value=1.0)

$ diff mcmc_orig.py mcmc_trick_all.py
14c14
<     m = pymc.Uniform("m",lower=mmin,upper=mmax)#,value=0.5*(mmin+mmax))
---
>     m = pymc.Uniform("m",lower=mmin,upper=mmax,rseed=42)#,value=0.5*(mmin+mmax))
17c17
<     b = pymc.Uniform("b",lower=bmin,upper=bmax)#,value=0.5*(bmin+bmax))
---
>     b = pymc.Uniform("b",lower=bmin,upper=bmax,rseed=42)#,value=0.5*(bmin+bmax))
21c21
<     err = pymc.Uniform("err",lower=-10,upper=10)#,value=1.0)
---
>     err = pymc.Uniform("err",lower=-10,upper=10,rseed=42)#,value=1.0)
26c26
<     y = pymc.Normal("y",mu=y_pred,tau=1.0/(err**2),value=y_data,observed=True)
---
>     y = pymc.Normal("y",mu=y_pred,tau=1.0/(err**2),value=y_data,observed=True,rseed=42)

And I got results as following:

$ for i in `seq 10`; do python3 mcmc_orig.py; done
initial m=  1.9525401570929901
initial b=  1.9525401570929901
initial err=  0.9762700785464951
[-----------------100%-----------------] 100 of 100 complete in 0.0 sec
m[-1]=  7.127655974085887
...

In these results, initial m, b and err were stable. However, m[-1] was unstable in any of the three scripts as following:

$ for suffix in orig trick trick_all; do echo $suffix; for i in `seq 10`; do python3 mcmc_$suffix.py; done|grep "^m"; done
orig
m[-1]=  7.127655974085887
m[-1]=  7.127655974085887
m[-1]=  7.127655974085887
m[-1]=  7.127655974085887
m[-1]=  6.492402422449748
m[-1]=  7.127655974085887
m[-1]=  6.492402422449748
m[-1]=  7.127655974085887
m[-1]=  7.127655974085887
m[-1]=  7.127655974085887
trick
m[-1]=  7.127655974085887
m[-1]=  7.482716594849148
m[-1]=  7.127655974085887
m[-1]=  7.127655974085887
m[-1]=  7.127655974085887
m[-1]=  7.127655974085887
m[-1]=  7.127655974085887
m[-1]=  7.127655974085887
m[-1]=  7.127655974085887
m[-1]=  7.127655974085887
trick_all
m[-1]=  7.482716594849148
m[-1]=  7.127655974085887
m[-1]=  7.482716594849148
m[-1]=  7.482716594849148
m[-1]=  7.482716594849148
m[-1]=  7.127655974085887
m[-1]=  7.482716594849148
m[-1]=  7.482716594849148
m[-1]=  7.482716594849148
m[-1]=  7.482716594849148

Also, I tried to use Model.seed() method as following:

$ diff -c mcmc_trick_all.py mcmc_trick_all+seed.py
*** mcmc_trick_all.py   2020-02-05 18:48:56.160439700 +0900
--- mcmc_trick_all+seed.py      2020-02-05 19:16:47.338333600 +0900
***************
*** 30,35 ****
--- 30,36 ----
      model.use_step_method(pymc.Metropolis,m)
      model.use_step_method(pymc.Metropolis,b)
      model.use_step_method(pymc.Metropolis,err)
+     model.seed()
      model.sample(iter=100,burn=0)
      return model.trace('m')[:],model.trace('b')[:],model.trace('err')[:]
  m,b,err=MCMC_linregress(x_data,y_data,mmin=-20,mmax=20,bmin=-20.0,bmax=20.0)

But it had not effect.

$ for i in `seq 10`; do python3 mcmc_trick_all+seed.py; done|grep "^m"
m[-1]=  7.482716594849148
m[-1]=  7.482716594849148
m[-1]=  7.482716594849148
m[-1]=  7.127655974085887
m[-1]=  7.482716594849148
m[-1]=  7.482716594849148
m[-1]=  7.482716594849148
m[-1]=  7.127655974085887
m[-1]=  6.911749182676079
m[-1]=  7.482716594849148

I have confirmed these behaviors at Ubuntu, Cygwin and Anaconda (on win10). These are not same values, but they seem not random. These behaviors are similar to a bug in C with uninitialized variables. Can anyone explain this reason?

fonnesbeck commented 4 years ago

Please use PyMC3 (https://github.com/pymc-devs/pymc3) unless you have a good reason not to. You should have no problem reproducing samples there.

kou1okada commented 4 years ago

Thank you for an advice. I see, I'll try it.

pymc-devs / pymc2

setting random seed does not result in reproducibility #129