pymc-devs / pymc2

THIS IS THE **OLD** PYMC PROJECT (VERSION 2). PLEASE USE PYMC INSTEAD:
http://pymc-devs.github.com/pymc/

ipyparallel string/unicode problems #104

Closed: noahaskell closed this issue 8 years ago

noahaskell commented 8 years ago

I'm trying to fit pymc2 models in parallel using ipyparallel on a Mac Pro, and I'm getting weird errors related to strings. Here's my system information:

python -c "import IPython; print(IPython.sys_info())"
{'commit_hash': u'b573435',
 'commit_source': 'installation',
 'default_encoding': 'UTF-8',
 'ipython_path': '/Users/noahsilbert/anaconda/envs/py2/lib/python2.7/site-packages/IPython',
 'ipython_version': '4.1.2',
 'os_name': 'posix',
 'platform': 'Darwin-15.4.0-x86_64-i386-64bit',
 'sys_executable': '/Users/noahsilbert/anaconda/envs/py2/bin/python',
 'sys_platform': 'darwin',
 'sys_version': '2.7.11 |Anaconda 4.0.0 (x86_64)| (default, Dec  6 2015, 18:57:58) \n[GCC 4.2.1 (Apple Inc. build 5577)]'}

I'm using pymc version 2.3.6. Here's an example script that produces the error(s) in question:

import pymc as pm
import numpy as np
import time

N = 200
x = np.random.normal(loc=5,scale=2,size=N)
X = np.ones((N,2))
X[:,1] = x

beta = np.array([10,-3])

e = np.random.normal(scale=5,size=N) 

y = np.dot(X,beta) + e

def regression(yy,XX):

    a = pm.Normal('a',mu=0,tau=1,size=1)
    b = pm.Normal('b',mu=0,tau=1,size=1)
    t = pm.InverseGamma('t',alpha=1,beta=1,size=1)

    @pm.deterministic()
    def y_hat(Xm=XX,A=a,B=b):
        y_hat = A*Xm[:,0] + B*Xm[:,1]
        return y_hat

    Y = pm.Normal('Y', mu=y_hat, tau=t, observed=True, value=yy)
    return locals()  # hand the nodes to pm.MCMC; otherwise regression() returns None
nchain = 1
nkeep = 200
nburn = 500
nthin = 10
niter = nburn + nkeep*nthin

clock_string = str(int(time.time()*1000000))[-6:]
date_string = time.ctime().split()[1] + '_' + time.ctime().split()[2]
time_string = date_string + '_' + clock_string

db_name = 'regression_MC_' + time_string + '.hdf5'
MC = pm.MCMC(input=regression(yy=y,XX=X),db='hdf5',dbname=db_name,dbmode='w',verbose=1)
MC.use_step_method(pm.AdaptiveMetropolis,MC.stochastics)
MC.sample(iter=niter,burn=nburn,thin=nthin)
MC.db.close()

I have that saved as test_script.py, and then I try to run the following script, either from the command prompt or in IPython:

import ipyparallel as ipp
c = ipp.Client()
d = c[:]
d.block = True
model_script = open('test_script.py').read()
d.execute(model_script)

At which point I get the following Traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-cd0f32256b19> in <module>()
     38
     39 db_name = 'regression_MC_' + time_string + '.hdf5'
---> 40 MC = pm.MCMC(input=regression(yy=y,XX=X),db='hdf5',dbname=db_name,dbmode='w',verbose=1)
     41 MC.use_step_method(pm.AdaptiveMetropolis,MC.stochastics)
     42 MC.sample(iter=niter,burn=nburn,thin=nthin)
<ipython-input-1-cd0f32256b19> in regression(yy, XX)
     16 def regression(yy,XX):
     17
---> 18     a = pm.Normal('a',mu=0,tau=1,size=1)
     19     b = pm.Normal('b',mu=0,tau=1,size=1)
     20     t = pm.InverseGamma('t',alpha=1,beta=1,size=1)
/Users/noahsilbert/anaconda/envs/py2/lib/python2.7/site-packages/pymc/distributions.pyc in __init__(self, *args, **kwds)
    318                     logp_partial_gradients=logp_partial_gradients,
    319                     dtype=dtype,
--> 320                     **arg_dict_out)
    321
    322     new_class.__name__ = name
/Users/noahsilbert/anaconda/envs/py2/lib/python2.7/site-packages/pymc/PyMCObjects.pyc in __init__(self, logp, doc, name, parents, random, trace, value, dtype, rseed, observed, cache_depth, plot, verbose, isdata, check_logp, logp_partial_gradients)
    762                           dtype=dtype,
    763                           plot=plot,
--> 764                           verbose=verbose)
    765
    766         # self._logp.force_compute()
/Users/noahsilbert/anaconda/envs/py2/lib/python2.7/site-packages/pymc/Node.pyc in __init__(self, doc, name, parents, cache_depth, trace, dtype, plot, verbose)
    212         self.extended_children = set()
    213
--> 214         Node.__init__(self, doc, name, parents, cache_depth, verbose=verbose)
    215
    216         if self.dtype is None:
/Users/noahsilbert/anaconda/envs/py2/lib/python2.7/site-packages/pymc/Node.pyc in __init__(self, doc, name, parents, cache_depth, verbose)
    117             raise ValueError(
    118                 'The name argument must be a string, but received %s.' %
--> 119                 name)
    120         self.__name__ = name
    121
ValueError: The name argument must be a string, but received a.

If I replace the string in line 18 with str('a') (and similarly for the other named variables in the model), I then get this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-1-8041b7b164c1> in <module>()
     38
     39 db_name = 'regression_MC_' + time_string + '.hdf5'
---> 40 MC = pm.MCMC(input=regression(yy=y,XX=X),db='hdf5',dbname=db_name,dbmode='w',verbose=1)
     41 MC.use_step_method(pm.AdaptiveMetropolis,MC.stochastics)
     42 MC.sample(iter=niter,burn=nburn,thin=nthin)
/Users/noahsilbert/anaconda/envs/py2/lib/python2.7/site-packages/pymc/MCMC.pyc in __init__(self, input, db, name, calc_deviance, **kwds)
     80             name,
     81             calc_deviance=calc_deviance,
---> 82             **kwds)
     83
     84         self._sm_assigned = False
/Users/noahsilbert/anaconda/envs/py2/lib/python2.7/site-packages/pymc/Model.pyc in __init__(self, input, db, name, reinit_model, calc_deviance, verbose, **kwds)
    205         # Specify database backend and save its keywords
    206         self._db_args = kwds
--> 207         self._assign_database_backend(db)
    208
    209         # Flag for model state
/Users/noahsilbert/anaconda/envs/py2/lib/python2.7/site-packages/pymc/Model.pyc in _assign_database_backend(self, db)
    583             self.restore_sampler_state()
    584         else:   # What is this for? DH.
--> 585             self.db = db.Database(**self._db_args)
    586
    587     def pause(self):
AttributeError: 'unicode' object has no attribute 'Database'

If I change line 39 to db_name = str('regression_MC_' + time_string + '.hdf5'), or if I try to use str() with the individual parts of time_string, I get the same error.
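
Reading the second traceback, the object that lacks a Database attribute is the db argument (the literal 'hdf5'), not dbname, which would explain why wrapping db_name in str() changes nothing. Below is a minimal sketch of a helper that coerces such values to the interpreter's native str type. It assumes that string literals on the affected engines are being created as unicode objects; the thread does not confirm why that happens. The name to_native_str and the mcmc_kwargs dictionary are hypothetical, introduced only for illustration.

```python
# Hypothetical helper: coerce text to the interpreter's native str type.
text_type = type(u'')  # unicode on Python 2, str on Python 3


def to_native_str(value):
    """Return value as a native str (byte str on Python 2, text str on Python 3)."""
    if isinstance(value, str):
        return value                      # already the native type
    if isinstance(value, bytes):
        return value.decode('ascii')      # Python 3: bytes -> str
    if isinstance(value, text_type):
        return value.encode('ascii')      # Python 2: unicode -> str
    raise TypeError('expected a string, got %r' % type(value))


# Assumption: these are the keyword arguments that would be passed on to
# pm.MCMC(...); every string-valued one is coerced, not just dbname.
mcmc_kwargs = dict(db=to_native_str(u'hdf5'), dbmode=to_native_str(u'w'))
```

On Python 2 this is equivalent to calling str() on each ASCII value; the point is that db and dbmode need the same treatment as the node names. Whether that is sufficient on the affected machine is not established here.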

This only happens on my Mac Pro. I can get pymc2 to cooperate with ipyparallel on an iMac, for example. But given the larger number of processors in the Mac Pro, I would dearly love to get this working on that computer. Thanks.

fonnesbeck commented 8 years ago

Can you check that the quotes around 'a' are actual ASCII quotes and not Unicode quotes? This can sometimes happen if you've cut and pasted code from a PDF or a website, for example. Alternatively, you could try installing under Python 3, which I recommend anyway. It's easy to do via Anaconda.

noahaskell commented 8 years ago

I typed it all out in vim, so I'm not sure how the quotes could be unicode and not ascii. But how can I check to make sure?
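
A minimal sketch of one way to check: scan the script for any character outside the ASCII range and report where it sits. The function names non_ascii_positions and check_file are hypothetical, and the sketch assumes the file is saved as UTF-8.

```python
import io


def non_ascii_positions(text):
    """Return (line, column, character) for every non-ASCII character in text."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            if ord(char) > 127:  # anything beyond 7-bit ASCII
                hits.append((line_number, column, char))
    return hits


def check_file(path):
    """Read a file (assumed to be UTF-8) and return its non-ASCII positions."""
    with io.open(path, encoding='utf-8') as handle:
        return non_ascii_positions(handle.read())
```

Calling check_file('test_script.py') and getting an empty list back means the script contains only ASCII characters, so curly quotes can be ruled out.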

Does pymc2 work with Python 3? The model I'm actually trying to fit uses an imported multivariate normal cdf function in one of the deterministic nodes. I have a version of it implemented in pymc3 (using Python 3), but I have to use Metropolis sampling because of the cdf function (which is used in an as_op function). The added compilation time for pymc3 doesn't seem to be worth it, and pymc2 seems to be working fairly well for this model. I just want to fit it in parallel.

I can invoke multiple instances of IPython and run the script from each of those, and it works fine, even on my Mac Pro, but I don't know if that's actually running the chains in parallel in a useful way.

fonnesbeck commented 8 years ago

The version numbers of PyMC and Python are not related to one another; PyMC 2 runs under Python 3. I haven't used Python 2.7 for a couple of years now.

noahaskell commented 8 years ago

Okay, good to know. I thought I had run into problems trying to use PyMC 2 with Python 3, but maybe I'm misremembering something. I just recently upgraded to Python 3.

noahaskell commented 8 years ago

Everything works fine in Python 3. For what it's worth, installing PyMC2 broke my PyMC3 installation, but I uninstalled PyMC3 and Theano then reinstalled PyMC3 (which also reinstalled Theano, of course), and both PyMC2 and PyMC3 seem to work fine.