pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

Leveraging SymPy for PyMC3 #178

Closed twiecki closed 11 years ago

twiecki commented 11 years ago

SymPy (http://sympy.org/en/index.html) is a Python library for symbolic mathematics.

My initial motivation for looking at SymPy came from #172 and #173. Instead of recoding all probability distributions, samplers, etc. in Theano, maybe we could just use the ones provided by sympy.stats (http://docs.sympy.org/dev/modules/stats.html).

For this to work we would need to convert the SymPy computation graph to a Theano one. There is some existing work suggesting that this is possible (https://github.com/nouiz/theano_sympy).

Looking at SymPy (and sympy.stats) more closely, there seem to be more areas where integrating it could help. Maybe this would give the best of both worlds: "Theano focuses more on tensor expressions than Sympy, and has more machinery for compilation. Sympy has more sophisticated algebra rules and can handle a wider variety of mathematical operations (such as series, limits, and integrals)."

There is additional discussion here: https://github.com/nouiz/theano_sympy/issues/1.

Copy-pasting some chunks from @mrocklin's response to move the discussion over here:

Overlap

There are some obvious points of overlap between the various projects.

What is the relationship with statsmodels? They also have a home-grown internal algebraic system. My guess is that if everyone were to unite under one algebraic system there would be some pleasant efficiencies. I obviously have a bias about what that algebraic system should be :)

Derivatives

Both Theano and SymPy provide the derivatives that you apparently need. SymPy provides analytic ones; Theano provides automatic ones. My suggestion would be to use SymPy if it works and fall back on Theano if it doesn't. You don't need sympy.stats for this (in case you don't want to offload your distributions work); SymPy's core would be just fine.
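As a rough illustration of the analytic derivatives mentioned above, here is a minimal sketch that uses only SymPy's core. The log-density is written out by hand for an exponential distribution, and the symbol names are chosen for illustration; none of this is PyMC code, and the Theano fallback is not shown.

```python
import sympy as sp

# Illustrative symbols: x is the observed value, lam is the rate parameter.
x = sp.Symbol('x', real=True)
lam = sp.Symbol('lam', positive=True)

# Log-density of an exponential distribution, written by hand:
# log(lam * exp(-lam * x)) = log(lam) - lam * x
logp = sp.log(lam) - lam * x

# Analytic derivatives are exact symbolic expressions.
dlogp_dlam = sp.diff(logp, lam)        # derivative with respect to the rate
dlogp_dx = sp.diff(logp, x)            # derivative with respect to the value
d2logp_dlam2 = sp.diff(logp, lam, 2)   # second derivative with respect to the rate

print(dlogp_dlam)
print(dlogp_dx)
print(d2logp_dlam2)
```

Because the results are ordinary SymPy expressions, they can be simplified or inspected before being handed to any numerical backend.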

Other benefits

In general the benefits of using symbolic systems tend to be unexpected. SymPy can provide lots of general aesthetic fluff like awesome pretty printing, symbolic simplification, C/Fortran code snippet generation, etc.
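To make these side benefits concrete, the following sketch runs pretty printing, simplification, and C/Fortran snippet generation on a small expression. The expression and variable names are assumptions picked for illustration; the calls used (`pretty`, `simplify`, `ccode`, `fcode`) are standard SymPy functions.

```python
import sympy as sp

# Illustrative expression: an exponential density lam * exp(-lam * x).
x = sp.Symbol('x', real=True)
lam = sp.Symbol('lam', positive=True)
expr = lam * sp.exp(-lam * x)

# Human-readable, multi-line rendering of the expression.
pretty_text = sp.pretty(expr)

# Symbolic simplification of a textbook identity.
simplified = sp.simplify(sp.sin(x)**2 + sp.cos(x)**2)

# Source-code snippets for the same expression, returned as strings.
c_snippet = sp.ccode(expr)
f_snippet = sp.fcode(expr)

print(pretty_text)
print(simplified)
print(c_snippet)
print(f_snippet)
```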

jsalvatier commented 11 years ago

I think I've fixed the problem with importing (it did something conditional on scikits.sparse being available).

mrocklin commented 11 years ago
In [1]: import pymc
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
/home/mrocklin/workspace/pymc/<ipython-input-1-5f262cfcb99b> in <module>()
----> 1 import pymc

/home/mrocklin/workspace/pymc/pymc/__init__.py in <module>()
      6 from sample import *
      7 from step_methods import *
----> 8 from tuning import *
      9 
     10 from debug import *

/home/mrocklin/workspace/pymc/pymc/tuning/__init__.py in <module>()
      1 from starting import find_MAP
----> 2 from scaling import approx_hess

/home/mrocklin/workspace/pymc/pymc/tuning/scaling.py in <module>()
      4 @author: johnsalvatier
      5 '''
----> 6 import numdifftools as nd
      7 import numpy as np
      8 from ..core import *

ImportError: No module named numdifftools

By the way I'm running on Ubuntu 12.04 with the Enthought Python Distribution 7.3.2

mrocklin commented 11 years ago

If I pip install numdifftools on my own I can import pymc cleanly. I rarely use setup.py. Does it handle dependencies?

jsalvatier commented 11 years ago

Yes it does, and it's already listed there. I'm not sure what would cause that.

requires=['theano','numpy','scipy','numdifftools']

(Maybe I should also list pandas since the examples that load external data require it)

jsalvatier commented 11 years ago

I had been using distutils to do the install, but I switched to using setuptools. Then if you tell pip to install from source, it seems to work better:

sudo pip install pymc3/

mrocklin commented 11 years ago

SymPy now has a fairly natural Theano printer which supports dimensionality.

In [1]: from sympy.stats.crv_types import ExponentialDistribution
In [2]: from sympy.printing.theanocode import theano_code, theano
In [3]: from sympy import *
In [4]: rate = Symbol('lambda', positive=True)
In [5]: x = Symbol('x', real=True)
In [6]: ExponentialDistribution(rate) # This is a SymPy object
Out[6]: ExponentialDistribution(lambda)
In [7]: ExponentialDistribution(rate)(x) # This is a SymPy expression
Out[7]: 
   -λ⋅x
λ⋅ℯ    
In [8]: theano_code(ExponentialDistribution(rate)(x))  # This is a Theano var
Out[8]: Elemwise{mul,no_inplace}.0

In [10]: theano_code(ExponentialDistribution(rate)(x), broadcastables={x: (False,), rate: (True,)})  # This is a Theano tensor var
Out[10]: Elemwise{mul,no_inplace}.0

In [11]: theano.printing.debugprint(_)
Elemwise{mul,no_inplace} [@A] ''   
 |lambda [@B]
 |Elemwise{exp,no_inplace} [@C] ''   
   |Elemwise{mul,no_inplace} [@D] ''   
     |InplaceDimShuffle{x} [@E] ''   
     | |TensorConstant{-1} [@F]
     |x [@G]
     |lambda [@B]

I'm not sure if this is of any use to you all (I suspect that you've moved beyond this idea). I'm not sure how this would be integrated into your system but it does supply a clean transition and opens up the possibility to use SymPy's simplification (both stats specific and general algebraic simplification).

In [20]: simplify(log(ExponentialDistribution(rate)(x)))
Out[20]: -λ⋅x + log(λ)

In [28]: theano.printing.debugprint(theano_code(_, broadcastables={x: (False,), rate: (True,)}))
Elemwise{add,no_inplace} [@A] ''   
 |Elemwise{mul,no_inplace} [@B] ''   
 | |InplaceDimShuffle{x} [@C] ''   
 | | |TensorConstant{-1} [@D]
 | |x [@E]
 | |lambda [@F]
 |Elemwise{log,no_inplace} [@G] ''   
   |lambda [@F]
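For readers without a working Theano install, the same simplify-then-compile idea can be sketched with `sympy.lambdify` targeting NumPy. This is a stand-in for the Theano printer shown above, not the method used in this thread; the symbol name `lam` and the test values are assumptions for illustration.

```python
import numpy as np
import sympy as sp
from sympy.stats import Exponential, density

# Illustrative symbols: x is the evaluation point, lam is the rate.
x = sp.Symbol('x', real=True)
lam = sp.Symbol('lam', positive=True)

# Density of an exponential random variable, evaluated at x.
pdf = density(Exponential('X', lam))(x)

# Expand the log symbolically; the positivity assumption on lam
# is what lets SymPy split the log of the product.
logp = sp.expand_log(sp.log(pdf))

# Compile the symbolic log-density into a NumPy-backed function.
logp_fn = sp.lambdify((x, lam), logp, modules='numpy')

# Evaluate on an array of points with rate 2.0.
points = np.array([0.5, 1.0, 2.0])
values = logp_fn(points, 2.0)
print(logp)
print(values)
```

The compiled function takes an array for `x` and a scalar for the rate, which loosely mirrors the broadcast pattern given to `theano_code` in the session above.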

I recently wrote about the benefits of SymPy and Theano integration here: http://matthewrocklin.com/blog/work/2013/03/19/SymPy-Theano-part-1/ and http://matthewrocklin.com/blog/work/2013/03/28/SymPy-Theano-part-2/

If we can find a motivating use case I'd be happy to support it from the SymPy and Theano ends.

jsalvatier commented 11 years ago

This is pretty cool Matthew. I think we won't use this right now, but it might come in handy in the near future.

twiecki commented 11 years ago

I agree. It also seems like Theano is working on the Theano -> SymPy conversion which would be easier to use in pymc3 for e.g. better simplification (as your blog post clearly shows).

It would be easy enough to tie in SymPy on a per-need basis now that one can do the SymPy -> Theano conversion if, for example, someone requires a distribution that's only present there. Maybe I can cook up an example.

nouiz commented 11 years ago

I don't know where you got the idea that we are working on this. I just opened a ticket so we don't forget it. I haven't heard anyone say they will work on it, and I have other things to do first.

twiecki commented 11 years ago

My bad. Let me restate that there is a chance this will be possible in Theano in the unspecified future.