pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/
Other
8.5k stars 1.97k forks source link

Leveraging SymPy for PyMC3 #178

Closed twiecki closed 11 years ago

twiecki commented 11 years ago

SymPy (http://sympy.org/en/index.html) is a Python library for symbolic mathematics.

My initial motivation for looking at SymPy resulted from #172 and #173. Instead of recoding all probability distributions, samplers etc in Theano, maybe we could just use the ones provided by sympy.stats (http://docs.sympy.org/dev/modules/stats.html).

For this to work we needed to convert the sympy computing graph to a theano one. It seems that there is some work that shows that this is possible (https://github.com/nouiz/theano_sympy)

Looking at sympy (and sympy.stats) more closely it seems that there are potentially more areas where integrating this could help. Maybe this would give the best of both worlds: "Theano focuses more on tensor expressions than Sympy, and has more machinery for compilation. Sympy has more sophisticated algebra rules and can handle a wider variety of mathematical operations (such as series, limits, and integrals)."

There is additional discussion here: https://github.com/nouiz/theano_sympy/issues/1.

Copy pasting some chunks from @mrocklin response to move the discussion over here:

Overlap

There are some obvious points of overlap between the various projects

What is the relationship with statsmodels? They also have a home-grown internal algebraic system. My guess is that if everyone were to unite under one algebraic system there would be some pleasant efficiencies. I obviously have a bias about what that algebraic system should be :)

Derivatives

Both Theano and SymPy provide derivatives which, apparently, you need. SymPy provides analytic ones, Theano provides automatic ones. My suggestion would be to use SymPy if it works and fall back on Theano if it doesn't work. You don't need SymPy.stats for this (in case you didn't want to offload your distributions work.) SymPy.core would be just fine.

Other benefits

In general the benefits to using symbolic systems tend to be unexpected. SymPy can provide lots of general aesthetic fluff like awesome pretty printing, symbolic simplification, C/Fortran code snippet generation, etc....

jsalvatier commented 11 years ago

Very interesting. Theano_Sympy doesn't provide examples that make it clear how one could take a calculation in sympy and then convert it to a theano calculation, so it's a little hard to evaluate.

I'm not sure what he means when he says 'you have created some sort of symbolic algebra class structure (you add two pymc.Normal objects)'. Is that just

x = Normal('x', 0,1) y = Normal('y', 0, 1) z = x + y

?

mrocklin commented 11 years ago

Yes, that is what I mean.

jsalvatier commented 11 years ago

Okay that makes sense. Right now, in pymc3, theano takes care of that stuff.

The pretty printing etc does sound pretty nice.

If I wanted to calculate the probability density of a Normal distribution given some value for the mean, variance and data value using the SymPy distribution but do that calculation in Theano, how would I do that?

mrocklin commented 11 years ago

It looks like theano_sympy was designed to take Theano expressions to SymPy, simplify them, and then send them back again. This is not your application

The function that seems most useful in your case is the aptly named sympy_to_theano. It requires a sympy expression (easy to obtain) and information about the resulting theano variables (simple). Theano variables have more information than SymPy variables do. In particular they have a dtype and a broadcastable . You're being asked to provide this.

Here is an example using sympy.stats and theano

In [1]: from sympy.stats import *
In [2]: from sympy import Symbol, pprint, pi
In [3]: import theano
In [4]: X = Normal('x', 2, 3)
In [5]: x = Symbol('x')
In [6]: expr = density(X)(x)
In [7]: pprint(expr)
               2
       -(x - 2) 
       ─────────
  ___      18   
╲╱ 2 ⋅ℯ         
────────────────
        ___     
    6⋅╲╱ π      

In [8]: from graph_translation import sympy_to_theano

In [9]: var_map = {'x': ('float64', (False, False))}  # x is a rank 2 tensor of floats
In [10]: texpr = sympy_to_theano(expr, var_map)  # oops, SymPy.pi doesn't have a Theano analog
KeyError: <class 'sympy.core.numbers.Pi'>

In [12]: expr = expr.subs(pi, pi.evalf())  # replace Pi with the float equivalent in SymPy
In [13]: texpr = sympy_to_theano(expr, var_map)  # try this again

In [14]: theano.printing.debugprint(texpr)
Elemwise{mul,no_inplace} [@A] ''   
 |InplaceDimShuffle{x,x} [@B] ''   
 | |TensorConstant{0.0940315946937} [@C]
 |InplaceDimShuffle{x,x} [@D] ''   
 | |Elemwise{pow,no_inplace} [@E] ''   
 |   |TensorConstant{2} [@F]
 |   |TensorConstant{0} [@G]
 |Elemwise{exp,no_inplace} [@H] ''   
   |Elemwise{mul,no_inplace} [@I] ''   
     |InplaceDimShuffle{x,x} [@J] ''   
     | |TensorConstant{-1} [@K]
     |Elemwise{pow,no_inplace} [@L] ''   
       |Elemwise{add,no_inplace} [@M] ''   
       | |InplaceDimShuffle{x,x} [@N] ''   
       | | |TensorConstant{-2} [@O]
       | |x [@P]
       |InplaceDimShuffle{x,x} [@Q] ''   
         |TensorConstant{2} [@R]
jsalvatier commented 11 years ago

It looks like I can just say:

distribution = Normal(0,1)
sympy_to_theano(distribution.pdf(x))

I'm curious whether that interacts well with things like indexing. For example if I say

mu = Symbol('mu')
distribution = Normal(mu[1:10], 1)
sympy_to_theano(distribution.pdf(x))

Whether this will work well.

mrocklin commented 11 years ago

In your example above is mu intended to be a scalar or a vector?

In SymPy Normal returns a scalar normal random variable. There was work on multivariate normal random variables a while ago but this isn't in the main branch.

mrocklin commented 11 years ago

Notes on my example above. You would want to extend the mapping variable in the theano_to_sympy to handle the SymPy.numbers.Pi class. This is likely to happen for other classes too. For example Theano may not have a Gamma class or other such mathematical functions.

mrocklin commented 11 years ago

The example above doesn't really use SymPy for anything other than a database of distributions. The random variable formalism allows the creation of some interesting problems.

jsalvatier commented 11 years ago

I meant mu to be a vector. It's okay that Normal is the univariate distribution in the sense that it doesn't deal with covariances. Can it describe many random variables that are each normally distributed with different means and variances (specified by a vector of means and variances)?

mrocklin commented 11 years ago

For this I would use a collection of Normals. They can be manipulated as normal symbolic variables to form products, tuples, etc....

>>> from sympy import *
>>> from sympy.stats import *
>>> xs = symbols('x:10')
>>> mus = symbols('mu:10')
>>> Xs = [Normal(x, mu, 1) for x, mu in zip(xs, mus)]
>>> pprint(simplify(E(sum(X**2 for X in Xs))))
  2     2     2     2     2     2     2     2     2     2     
μ₀  + μ₁  + μ₂  + μ₃  + μ₄  + μ₅  + μ₆  + μ₇  + μ₈  + μ₉  + 10
mrocklin commented 11 years ago

Or if you wanted the full pdf of the joint probability space you could access it as follows

In [1]: from sympy.stats import *

In [2]: from sympy import *

In [4]: mus = symbols('mu:10', real=True)

In [5]: Xs = [Normal('x_%d'%i, mu, 1) for i, mu in enumerate(mus)]

In [6]: pprint(pspace(Tuple(*Xs)).pdf)
            2             2             2             2             2             2             2             2             2             2
 -(-μ₀ + x₀)   -(-μ₁ + x₁)   -(-μ₂ + x₂)   -(-μ₃ + x₃)   -(-μ₄ + x₄)   -(-μ₅ + x₅)   -(-μ₆ + x₆)   -(-μ₇ + x₇)   -(-μ₈ + x₈)   -(-μ₉ + x₉) 
 ────────────  ────────────  ────────────  ────────────  ────────────  ────────────  ────────────  ────────────  ────────────  ────────────
      2             2             2             2             2             2             2             2             2             2      
ℯ            ⋅ℯ            ⋅ℯ            ⋅ℯ            ⋅ℯ            ⋅ℯ            ⋅ℯ            ⋅ℯ            ⋅ℯ            ⋅ℯ            
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
                                                                       5                                                                   
                                                                   32⋅π                                                                    
jsalvatier commented 11 years ago

Okay, this makes pretty good sense to me.

Looks like Sympy could be good as a source of distribution math, but probably not a good replacement for the symbolic algebra part since it's not geared to be numpy like, which I think is an important feature.

So I think we'd probably use SymPy by

  1. converting theano variables to sympy symbols
  2. doing sympy calculations for distributions
  3. converting back to theano variables

What are other people's thoughts?

twiecki commented 11 years ago

@jsalvatier I don't see where we would need 1.

Couldn't the model specification consist of sympy symbols which then get translated to theano for computation?

Also, I think there is a non-trivial probability that this approach will lead to some unexpected benefits further down the line.

mrocklin commented 11 years ago

What numpy-like features do you need? SymPy has a dense Matrix type and a set of purely symbolic MatrixExpr types. Each of these are strictly rank 2 tensors. Support for higher rank tensors in SymPy exists but is poorly integrated.

SymPy.stats had a multivariate branch a while back that generated matrix expressions. I could resurrect it if there was a definite need. If you're interested I would look at this blogpost I recommend skipping over the beginning and just scroll down to where the code starts.

jsalvatier commented 11 years ago

Perhaps you're right.

I think the part that makes me nervous is that sympy wasn't designed to do computation, so there can be unknown pitfalls.

For example, does sympy have a notion of multidimensional arrays? Does it do broadcasting? How about indexing and advanced indexing into vectors? Those seem pretty important to me.

It's also adding another interface between packages, and thus more possibility for problems. Sympy computations need to convert pretty well into theano graphs.

mrocklin commented 11 years ago

Definitely you're correct that trying to represent computations with SymPy is a recipe for disappointment. Rather, SymPy is good for representing mathematical models. In my experience there is usually a clear distinction between the model and the computation. Does such a distinction exist in your application?

You're definitely also correct that adding SymPy will bring a bunch of issues with it. SymPy.stats really isn't battle-tested. I'm confident that if you go this route you'll expose bugs. I'm around to support those but it adds inertia to the development cycle.

jsalvatier commented 11 years ago

You're right that there is such a distinction, and I think the separation is pretty clean in pymc3 too.

Perhaps a better way to say it is that often things like indexing and advanced indexing come in at the model level here. For example, you will want to specify things like

group = [0,0,1,1,2, 0] x ~ Normal(0, 1, shape = (num_of_groups,) ) y ~ Normal(x[group], 1)

Meaning that the mean of the prior distribution for yi is the value of x{group_i}.

Often these are treated in a handwavy fashion in mathematics. For example: the notion of broadcasting is not a commonly discussed issue in math. So I would be somewhat surprised to find features like these to be well developed in sympy or to allow for easy implementation. I would love to be surprised though.

jsalvatier commented 11 years ago

Something along these lines does seem to exist in SymPy, but it doesn't look that well developed.

mrocklin commented 11 years ago

Yeah, I would steer clear of Indexed. I would look at Matrix if you don't need ranks greater than 2.

Your intuition is correct that SymPy would not cleanly allow that syntax. You would have to resort to list comprehensions, map, for-loops, etc....

Fancy indexing has been on our issues list for matrices for a while though. It's certainly conceivable that we would support this in the future. If you spoke up on the mailing list saying that you needed it for your project it's likely that someone would implement it relatively soon. We're in GSoC application mode right now so we have a number of aspiring contributors looking to pounce on things like this.

I wouldn't support adding a shape keyword to sympy.stats but you could wrap something fairly easily. The following SymPy code would serve as the basis for a decent recipe

In [1]: from sympy.stats import *
In [2]: groups = [0, 0, 1, 1, 2, 0]
In [3]: numgroups = len(set(groups))
In [4]: x = Matrix(1, numgroups, lambda i,j: Normal('x_%d'%j, 0, 1))
In [5]: x
Out[5]: [x₀  x₁  x₂  x₃  x₄]
In [6]: y = Normal(x[1, groups], 1)  # This fails due to lack of fancy indexing
IndexError: Invalid index a[[0, 0, 1, 1, 2, 0]]

In general though if you switch to SymPy as an modeling input language your interface will change. You'll get a huge amount of general mathematical formalism but will lose the specific features you've cooked into your own language. For some of those features you can add them to SymPy or ask the SymPy community to add them for you. I'm sure that some language features would be difficult to support though.

twiecki commented 11 years ago

Hm, the array broadcasting is a real issue.

One other idea: If we mainly just wanted to use sympy.stats for it's distributions. Would it be possible to borrow those and replace the sympy symbols with Theano ones (e.g. sympy.sin -> Theano.sin)?

mrocklin commented 11 years ago

The theano_sympy project and in particular the sympy_to_theano function was designed to handle that transformation. You may want to rewrite it but the core pieces for how to break down a sympy graph and how to construct a theano graph are there.

mrocklin commented 11 years ago

If you just wanted the distributions you might want to skip the random variable and probability space formalism. Functions like Normal provide a random variable. You probably would want the classes like NormalDistribution. They're less powerful but far simpler.

jsalvatier commented 11 years ago

With this approach we don't have to invest too much. We could even have some theano, some Sympy based distributions.

twiecki commented 11 years ago

@mrocklin I didn't find NormalDistribution, can you link me to it? Also, does Normal use NormalDistribution internally?

mrocklin commented 11 years ago

https://github.com/sympy/sympy/blob/master/sympy/stats/crv_types.py#L1632

NormalDistribution is a distribution and contains information like the pdf. It has the ability to compute expectations of expresssions, cdf, etc....

Normal is a random variable. It acts in every way like a SymPy variable and handles interaction with the rest of the SymPy language. Whenever you query a SymPy expression with random variables e.g. P(X>2*Y) then all of the distributions of all the random variabes within that expression are collected and considered collectively.

If you want SymPy to do statistical simplifications for you then use Normal. If you just want a database of distributions then use NormalDistribution.

Using Normal and SymPy gives you simplifications like the following

>>> X = Normal('x', 0, 1)
>>> Y = Normal('y', 0, 1)
>>> Z = Normal('z', 0, 1)
>>> density(X**2 + Y**2 + Z**2)
ChiSquaredDistribution(3)

But it would force you to use SymPy as your modeling language, which may not be appropriate.

twiecki commented 11 years ago

Oh, that's really neat! Although it doesn't seem to work on my install, the last call just returns a lambda with a nasty integral. Maybe a version issue (I'm running 0.7.2, this is sympy master branch I suppose?.

Do you think this could figure out conjugate priors? For example:

Bern(x | \theta) * Beta(\theta) / Integral(Bern(x | \lambda) * Beta(\lambda) d\lambda) = Beta()

Where the result is another Beta distribution with updated parameters.

There is an explicit example at https://en.wikipedia.org/wiki/Conjugate_prior

mrocklin commented 11 years ago

In principle, yes that should be solvable. In practice no, that is not yet solvable. I've never used sympy.stats with random parameters so this exposes some bugs which I'm now fixing. Obviously this is an important application though.

In general sympy.stats transforms random expressions like what you have above into integrals. Sympy.integrate is generally surprisingly good at solving integrals. Of course, there are limits. There is nothing stopping us from generating the relevant integrals; I can't speak to what will happen after that.

The statistical simplification is in a separate branch and not yet in master.

twiecki commented 11 years ago

OK, I see that it's not practical. But just to finish this line of thought. The trick with conjugate priors is that there is no integration required. Just simplifying the numerator and denominator one can see that this is again a beta distribution, but with different parameters. Not sure if that changes anything.

mrocklin commented 11 years ago

Mixing finite and continuous RVs is a bit buggy. Here is an example with strictly normal random variables.

>>> from sympy.stats import *
>>> from sympy import *
>>> mu = Normal('mu', 2, 3)
>>> x = Normal('x', mu, 1)
>>> y, z = Symbol('y'), Symbol('z')
>>> simplify(density(mu, Eq(x, y))(z))  # density of mu given x == y
            2                2           
         9⋅y          y   5⋅z    2⋅z   1 
       - ──── + y⋅z - ─ - ──── + ─── - ──
  ___     20          5    9      9    45
╲╱ 5 ⋅ℯ                                  
─────────────────────────────────────────
                     ___                 
                 3⋅╲╱ π                  

Internally it sets up and solves the following integral

In [17]: simplify(density(mu, Eq(x, y), evaluate=False)(z))  # density of mu given x == y
Out[17]: 
 oo                                                               
  /                                                               
 |                                                                
 |                      2          2                              
 |               (x - z)    (z - 2)                               
 |             - -------- - --------                              
 |                  2          18                                 
 |            e                     *DiracDelta(x - y)            
 |  ----------------------------------------------------------- dx
 |        oo  oo                                                  
 |         /   /                                                  
 |        |   |                                                   
 |        |   |            2          2                           
 |        |   |     (x - z)    (z - 2)                            
 |        |   |   - -------- - --------                           
 |        |   |        2          18                              
 |        |   |  e                     *DiracDelta(x - y)         
 |  6*pi* |   |  ---------------------------------------- dx dz   
 |        |   |                    6*pi                           
 |        |   |                                                   
 |       /   /                                                    
 |       -oo -oo                                                  
 |                                                                
/                                                                 
-oo                                                               

This is all in a branch.

mrocklin commented 11 years ago

This is supported in https://github.com/mrocklin/sympy/tree/random-parameters which is now in a PR to sympy/master

mrocklin commented 11 years ago

OK, I see that it's not practical. But just to finish this line of thought. The trick with conjugate priors is that there is no integration required. Just simplifying the numerator and denominator one can see that this is again a beta distribution, but with different parameters. Not sure if that changes anything.

Of how much computational value are things like this? They're doable.

Looking a bit more into this it seems that for maximum likelihood the denominator with the integrals isn't necessary but that simplification of the derivative of the logarithm of the numerator might yield some substantial gains. I suspect that SymPy might be useful there even without conjugate prior tricks. Have you looked at the expressions you produce at this stage?

jsalvatier commented 11 years ago

The numerator doesn't come into it for most of what we're doing. The package is focused around Markov Chain Monte Carlo, which only needs to evaluate the log of something proportional to the density for the posterior. Thus, we can ignore normalizing constants for the most part. This also applies to maximum likelihood estimation.

mrocklin commented 11 years ago

Is it correct to assume you meant to say "The denominator doesn't come into it" ? Otherwise I'm more confused than I thought :)

I suspect that the d/dtheta (log(p(x | theta) * p(theta)) term might be algebraically simplifiable. Is the cost of your computation sensitive to the complexity of this term?

jsalvatier commented 11 years ago

Heh, yep. Looks like I was confused and not you!

Yes simplifying either the log posterior (logp) or the derivative (dlogp) would be useful. My impression is that right now the dlogp takes longer to compute than logp, but not by a lot.

On Thu, Feb 21, 2013 at 4:38 PM, Matthew Rocklin notifications@github.comwrote:

Is it correct to assume you meant to say "The denominator doesn't come into it" ? Otherwise I'm more confused than I thought :)

I suspect that the d/dtheta (log(p(x | theta) * p(theta)) term might be algebraically simplifiable. Is the cost of your computation sensitive to the complexity of this term?

— Reply to this email directly or view it on GitHubhttps://github.com/pymc-devs/pymc/issues/178#issuecomment-13922311.

mrocklin commented 11 years ago

This is obviously not my field but I've been playing with this a while. Here is a problem with a Poisson distribution whose rate parameter is distributed with a Beta distribution.

In this particular case it looks like dlogp is cheaper than logp.

In [1]: from sympy import *
In [2]: from sympy.stats import *
In [3]: 
In [3]: x = Symbol('x', positive=True)
In [4]: l = Symbol('lambda', positive=True)
In [5]: 
In [5]: rate = Beta(l, 2, 3)
In [6]: X = Poisson(x, rate)
In [7]: numer = density(X, rate)(X.symbol) * density(rate)(rate.symbol)
In [8]: numer
Out[8]: 
      x         2  -λ
12⋅λ⋅λ ⋅(-λ + 1) ⋅ℯ  
─────────────────────
          x!         
In [9]: log(numer)
Out[9]: 
   ⎛      x         2  -λ⎞
   ⎜12⋅λ⋅λ ⋅(-λ + 1) ⋅ℯ  ⎟
log⎜─────────────────────⎟
   ⎝          x!         ⎠

In [10]: simplify(log(numer))
Out[10]: 
                            ⎛ 2          ⎞          
                            ⎜λ  - 2⋅λ + 1⎟          
-λ + x⋅log(λ) + log(λ) + log⎜────────────⎟ + log(12)
                            ⎝     x!     ⎠          

In [11]: simplify(log(numer).diff(l))
Out[11]: 
   2                    
- λ  + λ⋅x + 4⋅λ - x - 1
────────────────────────
       λ⋅(λ - 1)        

If this is interesting at all let me know. If you can provide a more likely example (that doesn't include array broadcasting) let me know.

mrocklin commented 11 years ago

It looks like the fully general solution to Poisson parametrized by a Beta is actually pretty simple

In [25]: rate = Beta(l, a, b)

In [26]: X = Poisson(x, rate)

In [27]: numer = density(X, rate)(X.symbol) * density(rate)(rate.symbol)

In [28]: simplify(log(numer).diff(l))
Out[28]: 
                 2                  
a⋅λ - a + b⋅λ - λ  + λ⋅x - λ - x + 1
────────────────────────────────────
             λ⋅(λ - 1)              
In [29]: simplify(log(numer).diff(l)).count_ops()
Out[29]: 14

In [30]: print ccode(simplify(log(numer).diff(l)))
(a*lambda - a + b*lambda - pow(lambda, 2) + lambda*x - lambda - x + 1)/(lambda*(lambda - 1))
twiecki commented 11 years ago

I suppose the question is which (implicit) solution Theano would come up with using autodiff. @jsalvatier is there any way to get the complexity of that for this case?

jsalvatier commented 11 years ago

I get the not very informative result of:

'Flatten{1}(Elemwise{Composite{[Composite{[Composite{[Composite{[Composite{[sub(add(i0, i1), add(i2, i3))]}(Switch(i0, i1, i2), Switch(i3, i4, i2), Switch(i0, i5, i2), Switch(i3, i6, i7))]}(i0, true_div(i1, i2), i3, i4, true_div(i5, i2), true_div(i6, i7), i8, i9)]}(i0, add(i1, i2), i3, i4, i5, i6, add(i1, i7), sub(i8, i3), i8, i9)]}(Composite{[Composite{[Composite{[Composite{[AND(AND(i0, i1), GT(i2, i3))]}(AND(i0, i1), GT(i2, i3), i4, i3)]}(AND(i0, i1), LE(i2, i0), i3, i4, i5)]}(i0, GE(i1, i2), i1, i3, i2, i4)]}(i0, i1, i2, i3, i4), i5, i3, i1, i2, Composite{[AND(i0, GT(i1, i2))]}(i0, i1, i2), i6, i4, i7, i8)]}}(TensorConstant{1}, l, TensorConstant{0}, a, b, TensorConstant{-1.0}, x, TensorConstant{1.0}, TensorConstant{0.0}))'

I'm working on graphing this

On Fri, Feb 22, 2013 at 4:39 AM, Thomas Wiecki notifications@github.comwrote:

I suppose the question is which (implicit) solution Theano would come up with using autodiff. @jsalvatier https://github.com/jsalvatier is there any way to get the complexity of that for this case?

— Reply to this email directly or view it on GitHubhttps://github.com/pymc-devs/pymc/issues/178#issuecomment-13941294.

mrocklin commented 11 years ago

Try theano.printing.pydotprint

mrocklin commented 11 years ago

Also make sure that this comes after you've run all of the optimizations. Theano does simplification too, it just tends not to be quite as algebraically focused. The syntactically easiest way to do this it to get the fgraph object after it has been made into a function.

from theano import *
x = tensor.matrix('x')
y = 3+ x**2 + x + 1
f = function([x], y)
theano.printing.debugprint(f.maker.fgraph)
mrocklin commented 11 years ago

The awesomest solution would be to use SymPy to generate the algebraic result and simplify algebraically, then print to Theano and use Theano to optimize numerically (e.g. inplace operations, constant folding). They optimize in very different ways and it'd be great to have a practical example where they work together. This was the original motivation behind the theano_sympy code sprint.

jsalvatier commented 11 years ago

That's actually what I was thinking too. I'm still having trouble with pydot :(

nouiz commented 11 years ago

You only need call pydotprint and debugprint like this:

f = theano.function(...)
theano.printing.debugprint(f)
theano.printing.pydotprint(f)

Also, to disable the fusion of elemwise to see more clearly the operation done, you can use this theano flags:

optimizer_excluding=local_elemwise_fusion

mrocklin commented 11 years ago

Any development on this? If you provide comparison pymc code I can handle the Theano analysis side.

jsalvatier commented 11 years ago

Hey, I've been super busy, so I haven't done work on this, but I do have an example.

from pymc import * 
from pymc.model import Variable 

m = Model()
a = Variable('a', (), 'float64', 1)
b = Variable('b', (), 'float64', 1) 
m.vars.append(a)
m.vars.append(b)

l = m.Var('l', Beta(a,b), ())
x = m.Var('x', Poisson(l), (), testval = 1)

lp0 = logp(m)
dlp0 = dlogp([l])(m)

lp1 = m.logp().f
dlp1 = m.dlogp([l]).f

from theano.printing import pydotprint 
from theano import * 
pydotprint(dlp1, 'dlogp.png')

pp(dlp1.maker.fgraph.outputs[0])

Gives

dlogp

and

'Flatten{1}(Elemwise{Composite{[Composite{[Composite{[Composite{[Composite{[sub(add(i0, i1), add(i2, i3))]}(Switch(i0, i1, i2), Switch(i3, i4, i2), Switch(i0, i5, i6), Switch(i3, i7, i2))]}(i0, true_div(i1, i2), i3, i4, true_div(i5, i2), i6, i7, true_div(i8, i9))]}(i0, i1, i2, i3, i4, add(i5, i6), i7, i8, add(i5, i9), sub(i7, i2))]}(Composite{[AND(i0, GT(i1, i2))]}(i0, i1, i2), i3, i1, i2, Composite{[Composite{[Composite{[Composite{[AND(AND(i0, i1), GT(i2, i3))]}(AND(i0, i1), GT(i2, i3), i4, i3)]}(AND(i0, i1), LE(i2, i0), i3, i4, i5)]}(i0, GE(i1, i2), i1, i3, i2, i4)]}(i0, i1, i2, i4, i5), i6, i4, i7, i8, i5)]}}(TensorConstant{1}, l, TensorConstant{0}, x, a, b, TensorConstant{-1.0}, TensorConstant{1.0}, TensorConstant{0.0}))'

I can translate these into Theano if that's better.

I'm thinking that this is complex enough that we probably wouldn't want to put this work into pymc3. If it's worth pursuing, I think it would be better to put it into Theano as better optimizations. That way other people can benefit anyway.

mrocklin commented 11 years ago

I'm thinking again about the Theano-SymPy-Theano crossover, particularly about what's possible and at what level of generality it should be done. I think that there is value here but negotiating the best crossover point is complex.

It's clear that, at least for this problem, a fair amount can be gained by involving sympy. The Theano result isn't very intuitive and appears to be a more complex solution. I assume that this complexity affects runtime.

One option is to take the result you present above, translate it to SymPy, see what it can do, and then translate back. This is good because it's general, applies to all of Theano, and doesn't require any legwork by the pymc folks. It's bad though because the form presented here is fairly low-level, potentially limiting SymPy effectiveness. In general Theano representations are mathematically slightly lower-level than SymPy. It feels weird to travel up the abstraction stack rather than down.

For this particular project I'm curious if I can start from the Normal, Beta, etc... layer of random variables. I wonder if it is reasonable to implement these as standard Theano Ops. That way there would be a clear high-level transition over to SymPy. It would also mean that there was a single point of truth for statistical information.

pymc presents a good example problem for me; it has clear issues and clear potential benefits. I was looking for a more complex example that involved array indexing (which is where I expect SymPy to break down). I tried downloading your repository and running your example but ran into a few issues. Probably I'm just doing something silly.

pymc$ ipython
In [1]: run examples/ARM5-4.py
/home/mrocklin/workspace/pymc/pymc/step_methods/hmc.py in <module>()
      5 ''' 
      6 from numpy import floor
----> 7 from quadpotential import *
      8 from utils import *
      9 from ..core import *

AttributeError: 'module' object has no attribute 'QuadPotential_SparseInv'

Ah, well, maybe it's just not installed properly

mrocklin@notebook:~/workspace/pymc$ python setup.py install
/home/mrocklin/Software/epd-7.3-2-rh5-x86_64/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'install_requires'
  warnings.warn(msg)
running install
running build
running build_py
error: package directory 'pymc/history' does not exist

Any tips?

jsalvatier commented 11 years ago

I'm not sure why quadpotential is giving you a problem. I'll check that out.

The second problems aren't related to the first I don't think.

jsalvatier commented 11 years ago

I've solved the second problem I think, can you update and try again?

jsalvatier commented 11 years ago

For the first problem, it's related to not having https://pypi.python.org/pypi/scikits.sparse . You could install that, or I'll fix it in an hour or two.

mrocklin commented 11 years ago
mrocklin@notebook:~/workspace/pymc$ git pull 
 setup.py |    4 ++--

pymc$ ipython
In [2]: import pymc
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/home/mrocklin/workspace/pymc/<ipython-input-2-5f262cfcb99b> in <module>()
----> 1 import pymc

/home/mrocklin/workspace/pymc/pymc/__init__.py in <module>()
      5 from trace import *
      6 from sample import *
----> 7 from step_methods import *
      8 from tuning import *
      9 

/home/mrocklin/workspace/pymc/pymc/step_methods/__init__.py in <module>()
      1 from compound import *
----> 2 from hmc import *
      3 from metropolis import *
      4 from gibbs import *

/home/mrocklin/workspace/pymc/pymc/step_methods/hmc.py in <module>()
      5 ''' 
      6 from numpy import floor
----> 7 from quadpotential import *
      8 from utils import *
      9 from ..core import *

AttributeError: 'module' object has no attribute 'QuadPotential_SparseInv'

sudo easy_install scikits.sparse gives a gcc compilation error.

scikits/sparse/cholmod.c:278:33: fatal error: suitesparse/cholmod.h: No such file or directory