pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

Variational inference with AD? #708

Closed: datnamer closed this issue 8 years ago

datnamer commented 9 years ago

Can the Theano infrastructure handle this?

https://github.com/stan-dev/stan/pull/1421 http://andrewgelman.com/2015/02/18/vb-stan-black-box-black-box-variational-bayes/

datnamer commented 8 years ago

Are those feature constraints, coding constraints, or both? My dream is to have AD for a combination of Numba and DyND, the new dynamic array library positioned as a NumPy successor, with missing data, user-defined types, etc.

Numba is already working well with DyND, and more support is planned, IIRC.

I think if we give it a big push in PyData, this will be big for the ecosystem.

I opened an issue here: https://github.com/HIPS/autograd/issues/51

Any help to get this some momentum would be awesome.

fonnesbeck commented 8 years ago

Sort of both, but I was thinking primarily of semantic constraints, such as the inability to write loops. Theano is an additional layer that PyMC3 users have to deal with in order to build models.

datnamer commented 8 years ago

I think CGT is looking at allowing loops.

twiecki commented 8 years ago

While these are interesting propositions, I think that would be more for PyMC4; we should focus on getting PyMC3 out the door.

mjwillson commented 8 years ago

@datnamer Wow I wasn't aware of autograd. That's like dark magic :)

I guess it's probably not as fast as Theano (doesn't seem to do any symbolic graph-rewriting optimisations, won't compile new kernels for you, ...) but maybe it's worth it for the simplicity, and it does seem to have support for gpuarray.
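
For anyone who hasn't tried it, here is a minimal sketch of what autograd makes possible: differentiating an ordinary Python loop with no symbolic graph construction. (The Taylor-series toy function is invented for illustration; only `autograd.grad` is real API.)

```python
from autograd import grad

def taylor_exp(x, n_terms=10):
    # Plain Python loop -- autograd tapes the operations as they run,
    # so no symbolic scan construct is needed
    total, term = 1.0, 1.0
    for k in range(1, n_terms):
        term = term * x / k
        total = total + term
    return total

d_exp = grad(taylor_exp)
print(d_exp(1.0))  # ~2.718, since d/dx exp(x) = exp(x)
```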

I wasn't aware of CGT (http://rll.berkeley.edu/cgt/) either; it looks very neat as a better Theano. Is anyone using it, or does anyone have a feel for how mature it is?

(Sorry getting slightly off-topic)

datnamer commented 8 years ago

@twiecki if CGT is supposed to be almost a drop-in Theano replacement... maybe it could go in PyMC3?

twiecki commented 8 years ago

The problem I see with autograd is that it will be very slow, as it's just using NumPy. I hope they explore Numba to speed things up.

The problem I see with CGT is that it's still a young project. Last time I checked, it lacked features that would be needed to make it a drop-in replacement, and I expect the cost of changing the backend to be quite high (even if it doesn't appear too bad at first sight). And it doesn't really solve any problems -- users would just have to learn CGT instead of Theano.

At this point, we are really close to having something quite powerful and usable built on Theano. Putting on the finishing touches will be a much better ROI.

akucukelbir commented 8 years ago

A bit late to the discussion, but I think all of this is very exciting stuff.

datnamer commented 8 years ago

@twiecki that makes sense. There is also this package that compiles NumPy code to Theano: https://github.com/LowinData/pyautodiff

But I don't see that it can handle loops (I'll ask).

@akucukelbir I'm happy that you are involved and following!

syclik commented 8 years ago

+1 to @akucukelbir

@datnamer, it looks like pyautodiff is a misnomer. One of the reasons it's going to have trouble with loops is that Theano does symbolic differentiation, not automatic differentiation. (With enough restrictions, loops are fine, but in general it's going to be difficult to symbolically differentiate.)

datnamer commented 8 years ago

Makes sense. I wonder if there is a way to compile loops to Theano's scan.
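
For reference, a rough sketch of what such a translation would have to emit -- Theano's scan is the symbolic counterpart of a loop. This cumulative-product example is invented for illustration:

```python
import theano
import theano.tensor as tt

x = tt.vector("x")
# scan threads an accumulator through the sequence -- the symbolic
# equivalent of `for x_t in x: acc *= x_t`
results, _ = theano.scan(
    fn=lambda x_t, acc: acc * x_t,
    sequences=x,
    outputs_info=tt.ones(()),
)
cumprod = theano.function([x], results)
print(cumprod([1.0, 2.0, 3.0]))  # [1. 2. 6.]
```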

@syclik that seems like a significant drawback. Are there any benefits to symbolic diff vs. autodiff?

mjwillson commented 8 years ago

@datnamer From what I can tell, the benefit of autodiff is that the expression graph is constructed on the fly, meaning you can use loops and control flow without having to express them symbolically.

That could make data-dependent control flow a lot more natural, but it could still have pitfalls if you make control-flow decisions based on parameters (it can't magically backpropagate the error past non-differentiable control-flow operations -- and in practice it would only know about the one code path that the forward evaluation went down).

Autodiff is probably more limited in terms of graph optimisations too, since you don't have the expression graph up front, although some clever just-in-time stuff might be possible.
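
A toy illustration of that pitfall, assuming autograd (the kinked function is invented): each evaluation tapes only the branch actually taken, so the non-differentiable point between the branches is invisible to the gradient.

```python
from autograd import grad

def f(x):
    # Control-flow decision depends on the parameter itself
    if x > 0:
        return x ** 2
    return -x

# Each call differentiates only the code path that forward evaluation took:
print(grad(f)(3.0))   # 6.0, the derivative of x**2
print(grad(f)(-3.0))  # -1.0, the derivative of -x
# Neither call "sees" the kink at x = 0
```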

syclik commented 8 years ago

@mjwillson, that's exactly right. Since the expression graph is constructed for each evaluation, autodiffing an algorithm that has different branching behavior from run to run is possible.

Regarding differentiating past non-differentiable operations, that just doesn't work (for math... the rest follows).

Regarding graph optimizations: that's correct. With symbolic differentiation operating on a static expression graph, you can do some really neat optimization. This limits what you can express in symbolic differentiation, but I'd buy the argument that maybe you can restructure what you need into a static expression. With automatic differentiation, you're not guaranteed that the expression graph that's generated for a particular execution is static from run to run. Of course, in most circumstances, it is, so someone really clever could do something just-in-time. If you wanted to restrict the expressiveness of autodiff to guarantee a static expression graph, then you should just use symbolic differentiation.
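
To make the contrast concrete, a minimal sketch of the symbolic side in Theano (the toy expression is illustrative): the entire graph exists before any evaluation, so it can be differentiated, optimized, and compiled exactly once.

```python
import theano
import theano.tensor as tt

x = tt.dscalar("x")
y = x ** 2 + tt.sin(x)        # the whole expression graph exists up front
dy = tt.grad(y, x)            # symbolic derivative of the static graph
f = theano.function([x], dy)  # graph optimized and compiled once, reused per call
print(f(3.0))                 # 2*3 + cos(3) ~= 5.01
```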

twiecki commented 8 years ago

This is implemented now.
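
For readers landing here later, a minimal sketch of the feature as exposed by PyMC3's later variational API (pm.fit with ADVI; the toy model and data are illustrative):

```python
import numpy as np
import pymc3 as pm

data = np.random.randn(100)

with pm.Model():
    mu = pm.Normal("mu", mu=0.0, sd=10.0)
    sigma = pm.HalfNormal("sigma", sd=1.0)
    pm.Normal("obs", mu=mu, sd=sigma, observed=data)
    # Mean-field ADVI: variational inference driven by automatic
    # differentiation on the Theano graph discussed in this thread
    approx = pm.fit(n=10000, method="advi")

trace = approx.sample(1000)  # draw samples from the fitted approximation
```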