pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/
Other
8.68k stars 2.01k forks source link

Working with sampled pymc3 parameter values #601

Closed thouska closed 10 years ago

thouska commented 10 years ago

I've started to work with pymc2.3 to calibrate some complex extern models (with the @pm.deterministic decorator) and I can say that pymc is awesome. But as i've really a lot of parameters to calibrate i wanted to test the new NUTS sampler of pymc3. Here I'm struggeling how to work with the sampled parameter values. I think the sampler of pymc2 creates floats, which is exactly what i need for my complex model. pymc3 seems to create some 'pymc3.model.FreeRV' theano.tensor things during the sampling and I do not know how to work with them. Is it somehow possible to transform the sampled values into floats?

Some example code, where i want to have float paramters x and y to feed my complex model with:

import pymc3 as pm
import numpy as np
#import complexmodel

true_data=[1,2,3]

def complexmodel(x,y):
    print type(x)
    float_x=float(x)
    float_y=float(y)
    #simulations=complexmodel(float_x,float_y)
    return simulations

with pm.Model() as model:
    x = pm.Normal('x', mu=20., sd=10)
    y = pm.Normal('y', mu=1., sd=1)   
    z = pm.Normal('z', mu=complexmodel(x,y), sd=.075, observed=true_data)

with model:
    start = pm.find_MAP()
    step = pm.NUTS()

with model:
    trace = pm.sample(3000, step, start)

pm.traceplot(trace)

This gives me the following error:

>>> runfile(.../PYMC3settings.py', wdir=r'...')
<class 'pymc3.model.FreeRV'>
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\...\sitecustomize.py", line 540, in runfile
    execfile(filename, namespace)
  File ".../PYMC3settings.py", line 25, in <module>
    z = pm.Normal('z', mu=complexmodel(x,y), sd=.075, observed=true_data)
  File ".../PYMC3settings.py", line 17, in complexmodel
    float_x=float(x)
TypeError: float() argument must be a string or a number

Thanks for some hints!

PS.: I would also be very happy about an implementation of the DREAM sampler in pymc3 (#585). I could not bring the multichain_mcmc package to work with pymc2.3.

fonnesbeck commented 10 years ago

It looks like you are trying to cast a PyMC object to a float, which will not work. Are you trying to come up with some sort of mixture of the two stochastics? If so, you should be looking at using a Deterministic object instead.

thouska commented 10 years ago

Thank you for your fast answer! I want to run my model (a geoscience model) with the stochastics x and y. Those stochastics tell the model how to be set the parameters in the model. My model takes just float values, so I have to get get sampled floats for x and y. I guess this is not a sort of mixture oft the stochastics but the Deterministic object seems to be a good idea. In pymc2.3 i'm doing something like this to analyse my model:

import pymc2 as pm
import numpy as np

def complexmodel(x,y):
    # just an stupid example, but i actually have to check the sampled variables first, 
    # to prevent that the sampled stochastics run out of the physical boundaries of my complex model 
    # (this would result in an error, stopping the whole run)

    print float(x)    #Here it works to get float values
    if x<0:
        return [-np.inf]*3
    if y<0:
        return [-np.inf]*3
    else:
        return [x,y,3]

class model(object):
    true_data=[1,2,3]

    x = pm.Normal("x", mu=20, tau=10**-2)
    y = pm.Normal("y", mu=1, tau=1**-2)

    @pm.deterministic
    def testmodel(x=x,y=y):   
        return complexmodel(x,y)

    like = pm.Normal("like", mu=testmodel, tau=0.075**-2, value=true_data, observed=True)
    modelvars=dict(x=x,y=y,testmodel=testmodel,like=like)

M = pm.MCMC(model)
M.sample(iter=1000,burn=500)
plot(M.trace('x')[:])
plot(M.trace('y')[:])

This code works fine. And i can get float values from x and y. How do I translate this code into pymc3? If I use your proposed deterministic object in pymc3, would it give me x and y in a way that they can be transformed into floats? The the deterministic class in pymc 3 is not documented yet, so I'm not sure how to use it correct. I tried something out:

import pymc3 as pm
import numpy as np

true_data=[1,2,3]

def complexmodel(x,y):
    print float(x)
    if x<0:
        return [-np.inf,y,3]
    if y<0:
        return [x,-np.inf,3]
    else:    
        return [x,y,3]

with pm.Model() as model:
    x = pm.Normal('x', mu=20., sd=10)
    y = pm.Normal('y', mu=1., sd=1)
    testmodel = pm.Deterministic('testmodel',[x,y], model=complexmodel)
    z = pm.Normal('z', mu=testmodel, sd=0.075, observed=true_data)

with model:
    start = pm.find_MAP()
    step = pm.NUTS()

with model:
    trace = pm.sample(3000, step, start)

trace[y].shape
pm.traceplot(trace)

But this gives me an error:

testmodel = pm.Deterministic('testmodel',[x,y], model=complexmodel)
  File "...\pymc3\model.py", line 363, in Deterministic
    var.name = name
AttributeError: 'list' object has no attribute 'name'

So my main point is, how to translate the pymc2.3 example into a working pymc3 style where I can analyse my complex model with floats of the stochastics x and y and many more stochastics?

thouska commented 10 years ago

testmodel = pm.Deterministic('testmodel',complexmodel(x,y), model=None) seems to work, but I still do not know how to access the values of x and y during the sample to make a float out of them. Something like x.tag.test_value which returns in my case 20.0 would help. Reading a bit about theano, I found that x.eval() should give me the sampled value of x, but instead I get another error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "...\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 540, in runfile
    execfile(filename, namespace)
  File ".../PYMC3settings.py", line 36, in <module>
    testmodel = pm.Deterministic('testmodel',complexmodel(x,y), model=None)
  File ".../PYMC3settings.py", line 25, in complexmodel
    print x.eval()
  File "...\lib\site-packages\theano-0.6.0-py2.7.egg\theano\gof\graph.py", line 420, in eval
    self._fn = theano.function(self._fn_inputs, self)
  File "...\lib\site-packages\theano-0.6.0-py2.7.egg\theano\compile\function.py", line 223, in function
    profile=profile)
  File "...\lib\site-packages\theano-0.6.0-py2.7.egg\theano\compile\pfunc.py", line 512, in pfunc
    on_unused_input=on_unused_input)
  File "...\lib\site-packages\theano-0.6.0-py2.7.egg\theano\compile\function_module.py", line 1311, in orig_function
    on_unused_input=on_unused_input).create(
  File "...\lib\site-packages\theano-0.6.0-py2.7.egg\theano\compile\function_module.py", line 1007, in __init__
    fgraph, additional_outputs = std_fgraph(inputs, outputs, accept_inplace)
  File "...\lib\site-packages\theano-0.6.0-py2.7.egg\theano\compile\function_module.py", line 132, in std_fgraph
    fgraph = gof.fg.FunctionGraph(orig_inputs, orig_outputs)
  File "...\lib\site-packages\theano-0.6.0-py2.7.egg\theano\gof\fg.py", line 128, in __init__
    self.__import_r__(outputs, reason="init")
  File "...lib\site-packages\theano-0.6.0-py2.7.egg\theano\gof\fg.py", line 256, in __import_r__
    raise MissingInputError("Undeclared input", r)
theano.gof.fg.MissingInputError: ('Undeclared input', x)

Isn´t there any possibility to access the values of the sampled stochastics?

fonnesbeck commented 10 years ago

You aren't using Deterministic correctly. This is just a wrapper for a simple deterministic transformation of PyMC variables, such as:

x = Normal('x', 0, 1)
x2 = Deterministic('x2', x**2)

The model argument should not be used when inside of a model context. Your complexmodel function should return a PyMC (Theano) variable,

Then, it the return value should be passed to the deterministic, not the function itself:

testmodel = pm.Deterministic('testmodel', complexmodel(x,y))

Not surprised that you are confused, as we have not written up the documentation yet (but we are working towards it).

thouska commented 10 years ago

Thanks to @fonnesbeck! Now i´ve understand what this pm.deterministic does. Unfortunatelly it´s not what i searched for. My complexmodel does return a list of simlated values and not a theano variable.

What i search for is an algorithm which fits my external complexmodel with some parameters (here they are called stochastics) to observed data. PyMC2.3 does this job very good and I have the feeling that PyMC3 could do it even better. But I don´t know how to do it.

To show what i exactly want, I made my example from above more explicite:

values=[]
def complexmodel(x,y):
    #Call an external model here, e.g. (https://github.com/sahg/https://github.com/sahg/PyTOPKAPI)
    #The PyTOPKAPI model can not work with Theano variables and needs x and y floats, so   
    #something has to happen here, to transform the theano variables x and y to floats.
    Discharge = PyTOPKAPI.run('model-simulation.ini',x,y) 
    values.append(x,y) # Just to see what is going on
    return Discharge

with pm.Model() as model:
    x = pm.Normal('x', mu=20, sd=10)
    y = pm.Normal('y', mu=1, sd=1)
    z = pm.Normal('z', mu=complexmodel(x,y), sd=0.075, observed=true_data)

I would expect that my complexmodel function gets called in every step of the sampling (s.o.). Here i´m not sure if this is really happening. To check this, i store the stochastics x and y in a list values in the complexmodel function. But after the sampling, i found just one x and one y in the values list, instead of 3000. To calculate the likelihood z the function needs to be called 3000 times, doesn´t it? Why contains my list valuesonly one x and one y instance?

fonnesbeck commented 10 years ago

Check out the approach used in #507, which allows you to make arbitrary deterministics compatible with PyMC 3's Theano internals. In particular, see the @theano.compile.ops.as_op decorator.

thouska commented 10 years ago

Thanks for this great hint! My code is working now:

import pymc3 as pm
import numpy as np
import theano
import theano.tensor as t

true_data=[1,2]

values=[]

@theano.compile.ops.as_op(itypes=[t.dscalar, t.dscalar],otypes=[t.dvector])
def complexmodel(x,y):
    print float(x)
    return np.array([x,y])

with pm.Model() as model:
    x = pm.Exponential('x', lam=1)
    y = pm.Exponential('y', lam=1)
    complexmodel=complexmodel(x,y)
    z = pm.Poisson('z', complexmodel, observed=true_data)

with model:
    #start = pm.find_MAP()
    start ={'x':1,'y':2.5}
    #step =pm.NUTS()    
    step = pm.Metropolis()

with model:
    trace = pm.sample(3000, step, start)

trace[y].shape
pm.traceplot(trace)

Do you plan, to bring gradient based samplers to work with this, or maybe a more straightforward, trick? When I use for example your NUTS sampler as a step method I get an error:

> 0.693147182465
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "...\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 540, in runfile
    execfile(filename, namespace)
  File ".../PYMC3settings.py", line 137, in <module>
    step =pm.NUTS()    
  File "...\site-packages\pymc3\step_methods\nuts.py", line 59, in __init__
    scaling = guess_scaling(Point(scaling, model=model), model=model, vars = vars)
  File "...\site-packages\pymc3\tuning\scaling.py", line 79, in guess_scaling
    h = find_hessian_diag(point, vars, model=model)
  File "...\site-packages\pymc3\tuning\scaling.py", line 74, in find_hessian_diag
    H = model.fastfn(hessian_diag(model.logpt, vars))
  File "...\site-packages\pymc3\memoize.py", line 14, in memoizer
    cache[key] = obj(*args, **kwargs)
  File "...\site-packages\pymc3\theanof.py", line 94, in hessian_diag
    return -t.concatenate([hessian_diag1(f, v) for v in vars], axis=0)
  File "...\site-packages\pymc3\theanof.py", line 80, in hessian_diag1
    g = gradient1(f, v)
  File "...\site-packages\pymc3\theanof.py", line 43, in gradient1
    return t.flatten(t.grad(f, v, disconnected_inputs='warn'))
  File "...\site-packages\theano-0.6.0-py2.7.egg\theano\gradient.py", line 529, in grad
    grad_dict, wrt, cost_name)
  File "...\site-packages\theano-0.6.0-py2.7.egg\theano\gradient.py", line 1213, in _populate_grad_dict
    rval = [access_grad_cache(elem) for elem in wrt]
  File "...\site-packages\theano-0.6.0-py2.7.egg\theano\gradient.py", line 1173, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "...\site-packages\theano-0.6.0-py2.7.egg\theano\gradient.py", line 1034, in access_term_cache
    input_grads = node.op.grad(inputs, new_output_grads)
AttributeError: 'FromFunctionOp' object has no attribute 'grad'
twiecki commented 10 years ago

The only way to get the automatic gradient computation is by expressing your density in terms of theano operators. as_op creates a blackbox function for which autodiff will not work so there is no way I know of (except numerical differentiation) to make this work.