nouiz / theano_sympy

Function to transform theano graph <-> sympy graph.

Current status? #1

Closed twiecki closed 11 years ago

twiecki commented 11 years ago

This looks very interesting. Is this still maintained?

Also, how close is this to being usable for transforming more complex computational graphs between the two packages?

What are future steps?

EDIT: The SymPy -> Theano conversion is implemented and merged into SymPy proper under sympy.printing.theanocode

See the following pull request and blogposts for details on use

sympy/sympy#1905
http://matthewrocklin.com/blog/work/2013/03/19/SymPy-Theano-part-1/
http://matthewrocklin.com/blog/work/2013/03/28/SymPy-Theano-part-2/

mrocklin commented 11 years ago

Also, how close is this to being usable for transforming more complex computational graphs between the two packages?

There are a few fundamental issues for complex graphs

  1. SymPy's basic data structure is a tree, not a DAG
  2. Not all node types coexist in both languages. E.g. SymPy has an Integral type, Theano has a GEMM type.
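
The tree-versus-DAG point can be illustrated with a minimal sketch. It uses plain nested tuples as a hypothetical expression format (not SymPy's or Theano's actual data structures) and counts nodes both ways: once as a tree, where a repeated subexpression is visited at every occurrence, and once as a DAG, where structurally equal subexpressions are shared.

```python
# Illustrative only: expressions are nested tuples ("op", arg, ...) or leaf
# strings. None of these names come from SymPy or Theano.

def tree_size(expr):
    """Count nodes when every occurrence is a separate tree node."""
    if isinstance(expr, str):
        return 1
    return 1 + sum(tree_size(arg) for arg in expr[1:])

def dag_size(expr, seen=None):
    """Count distinct nodes when structurally equal subexpressions are shared."""
    seen = set() if seen is None else seen
    if expr in seen:
        return 0  # already counted via another parent
    seen.add(expr)
    if isinstance(expr, str):
        return 1
    return 1 + sum(dag_size(arg, seen) for arg in expr[1:])

shared = ("exp", ("add", "x", "y"))       # exp(x + y)
expr = ("mul", shared, ("sin", shared))   # exp(x + y) * sin(exp(x + y))

print(tree_size(expr))  # 10: the shared subexpression is counted twice
print(dag_size(expr))   # 6: the shared subexpression is counted once
```

The gap between the two counts grows with the amount of reuse, which is one reason a tree-based representation can be awkward for larger computational graphs.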

My recollection was that this could convert trees with only shared types relatively well. The goal was for it to provide mathematical simplifications like sin(x)**2 + cos(x)**2 -> 1 for Theano. I believe that this was achieved.
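
As a rough sketch of what "trees with only shared types" means in practice, the snippet below converts a nested-tuple expression through a lookup table and fails loudly on any node type without a counterpart. The table `SHARED_TYPES`, the function `convert`, and the tuple format are all hypothetical and chosen for illustration; they are not the project's actual mapping or API.

```python
# Hypothetical node-type table: SymPy-side name -> Theano-side name.
# The entries are illustrative, not the project's actual mapping.
SHARED_TYPES = {"Add": "add", "Mul": "mul", "Pow": "pow",
                "sin": "sin", "cos": "cos"}

def convert(expr):
    """Convert a nested-tuple tree, failing on node types with no counterpart."""
    if isinstance(expr, (str, int, float)):
        return expr  # leaves (symbols, constants) pass through unchanged
    op, *args = expr
    if op not in SHARED_TYPES:
        raise NotImplementedError(f"no counterpart for node type {op!r}")
    return (SHARED_TYPES[op], *[convert(a) for a in args])

# sin(x)**2 + cos(x)**2 uses only shared types, so it converts cleanly.
expr = ("Add", ("Pow", ("sin", "x"), 2), ("Pow", ("cos", "x"), 2))
converted = convert(expr)

# An Integral node has no entry in the table, so conversion is refused.
try:
    convert(("Integral", ("sin", "x"), "x"))
    integral_ok = True
except NotImplementedError:
    integral_ok = False
```

Refusing the whole tree on the first unknown node keeps the behaviour predictable; a partial mapping would need a policy for what to do with the unconverted pieces.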

The primary future step would be to actually integrate it into Theano. @nouiz would know more about the status and requirements of this.

I suspect that this is stalled due to lack of interest and a definite application. If you can supply either then I'd be happy to support from the SymPy side.

twiecki commented 11 years ago

@mrocklin Just the guy I wanted to talk to :).

We are currently working on pymc3 which improves on pymc2 by using Theano and implements Hamiltonian MC (which requires the gradient). Theano is great for most of that. However, it is not really designed with statistics in mind. For example, there are no likelihood functions.

So one idea would be to leverage sympy.stats to provide these functions and then convert them to Theano.

Another idea would be to have sympy do all of the work in figuring out the gradients and only convert the final product to Theano and use it only as a computation engine.

While I have you, maybe other things could be done with sympy.stats as well. E.g. automatic derivation of conditional distributions for Gibbs sampling. Automatically integrating out certain variables (e.g. a collapsed Gibbs sampler for GMMs). Would something like that be possible?

mrocklin commented 11 years ago

Looking at pymc it feels like you have a definite application for the SymPy -> Theano bridge. To me it looks like you've implemented parts of SymPy and parts of Theano in pymc2 and are looking to offload this work onto other packages while developing pymc3.

The benefits to using Theano are clear. It sounds like you're querying about the benefits to using SymPy. I'm happy to serve as a sounding board for that.

Some thoughts:

Overlap

There are some obvious points of overlap between the various projects

  1. PyMC has distributions and SymPy has distributions. SymPy doesn't currently have infinite discrete random variables like Poisson though. This could be fixed but is a current failing. SymPy's support for analytic solution of infinite sums is poor so this was a low priority. It seems like you're not really looking for that though.
  2. PyMC has implemented some special functions that could be in Theano. I would encourage you to push these upstream. They'll probably get some useful attention from the Theano crowd.
  3. Looking at the pymc2 readme it appears that you have created some sort of symbolic algebra class structure (you add two pymc.Normal objects). Presumably SymPy.core might be of use here.

What is the relationship with statsmodels? They also have a home-grown internal algebraic system. My guess is that if everyone were to unite under one algebraic system there would be some pleasant efficiencies. I obviously have a bias about what that algebraic system should be :)

Derivatives

Both Theano and SymPy provide derivatives which, apparently, you need. SymPy provides analytic ones, Theano provides automatic ones. My suggestion would be to use SymPy if it works and fall back on Theano if it doesn't work. You don't need SymPy.stats for this (in case you didn't want to offload your distributions work.) SymPy.core would be just fine.
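
The "use the analytic derivative if it works, otherwise fall back" suggestion can be sketched in plain Python. Everything here is a stand-in: the `ANALYTIC` table plays the role of SymPy's symbolic differentiation, and a central finite difference is swapped in for Theano's automatic gradient purely to keep the example dependency-free. Neither reflects how either library computes derivatives.

```python
import math

# Hypothetical table of closed-form derivatives, standing in for SymPy.
ANALYTIC = {
    math.sin: math.cos,
    math.exp: math.exp,
}

def numeric_derivative(f, h=1e-6):
    """Central finite difference; a stand-in for Theano's automatic gradient."""
    return lambda x: (f(x + h) - f(x - h)) / (2 * h)

def derivative(f):
    """Prefer the analytic derivative; fall back when none is known."""
    try:
        return ANALYTIC[f]
    except KeyError:
        return numeric_derivative(f)

d_sin = derivative(math.sin)    # analytic path: returns math.cos itself
d_tanh = derivative(math.tanh)  # fallback path: no table entry for tanh
```

The caller sees a single `derivative` function either way, so the choice of backend stays an implementation detail.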

Other benefits

In general the benefits to using symbolic systems tend to be unexpected. SymPy can provide lots of general aesthetic fluff like awesome pretty printing, symbolic simplification, C/Fortran code snippet generation, etc.

Automatic conditional distributions

You ask about these. To be honest I'm not very familiar with sampling, I'm an amateur statistician at best. SymPy.stats can create integrals for conditional distributions. SymPy.core can sometimes integrate them successfully. Here is a stock example. If you're able to describe what you're looking for in small simple words I can probably be of more help.

https://gist.github.com/mrocklin/4981811

I would also be curious about how you're using BLAS and LAPACK (I see that these source files are in your repo). Perhaps that conversation should move elsewhere though.

twiecki commented 11 years ago

@mrocklin Thanks for the insightful response! I created a pymc ticket, maybe we can move the discussion over there?

https://github.com/pymc-devs/pymc/issues/178

nouiz commented 11 years ago

Hi,

I'm interested in collaboration between SymPy and Theano, but for some time now I haven't been able to spend time coding this. I'd be very happy to guide, discuss with, or help someone continuing this effort. Our lab does not need this integration right now, which is why other priorities prevent me from working on it.

About the special functions: gammaln and psi are now in Theano, but not the FactLn opt. A PR to include it in Theano would be welcome :)

I'd love to see Theano as a code generator for SymPy. I'll help as my time permits. I'm watching the ticket pymc-devs/pymc#178, so if something pops up, I'll see it.

Also, SymPy is getting more "tensor" support. I don't know the details, but they now have vector/matrix support and were discussing tensor support. As @mrocklin said, the current framework here works well if the full graph can be converted from SymPy to/from Theano. If that is not the case, the simplest fix would be to add the missing Op to the project. We briefly tried doing a partial mapping from one to the other and hit some difficult corner cases.

mrocklin commented 11 years ago

Regarding the original question about current status: the SymPy -> Theano conversion is implemented and merged into SymPy proper under sympy.printing.theanocode

See the following pull request and blogposts for details on use

https://github.com/sympy/sympy/pull/1905
http://matthewrocklin.com/blog/work/2013/03/19/SymPy-Theano-part-1/
http://matthewrocklin.com/blog/work/2013/03/28/SymPy-Theano-part-2/

nouiz commented 11 years ago

I'm closing this issue, as it is now integrated in SymPy.

mrocklin commented 11 years ago

The reverse transformation is not yet complete. Theano should be able to convert purely elemwise graphs into sympy expressions.

On Mar 29, 2013 7:24 AM, "nouiz" wrote:

I'm closing this issue, as it is now integrated in SymPy.

nouiz commented 11 years ago

I agree; this ticket was about the status of the project. That is why I closed it. I made an issue in Theano about it:

https://github.com/Theano/Theano/issues/1313