matsengrp / gctree

GCtree: phylogenetic inference of genotype-collapsed trees
https://matsengrp.github.io/gctree
GNU General Public License v3.0

automatic differentiation #9

Closed wsdewitt closed 7 years ago

wsdewitt commented 7 years ago

TensorFlow, or Python's `ad` module, can be used to define gradients of our likelihood automatically. This will make it easier to interchange models, since we won't have to supply (error-prone) analytic gradients.

How will this work with our memoized likelihood functions? We have a memo dict for the values, not the gradients.
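One way to square memoization with autodiff is to cache the derivative alongside the value. A minimal sketch with forward-mode dual numbers on a toy recursive likelihood (not the gctree likelihood, and not the `ad` module's API):

```python
# Sketch: forward-mode AD via dual numbers, memoizing (value, derivative)
# pairs together so the cache stays gradient-aware.

class Dual:
    """Number carrying a value and its derivative w.r.t. one parameter."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

memo = {}

def f(n, p):
    """Toy recursive likelihood p**n; the memo key includes the parameter
    value so stale derivatives aren't reused across optimizer steps."""
    key = (n, p.val)
    if key not in memo:
        memo[key] = Dual(1.0, 0.0) if n == 0 else p * f(n - 1, p)
    return memo[key]

p = Dual(0.5, 1.0)   # seed derivative dp/dp = 1
like = f(3, p)
# like.val == 0.5**3 == 0.125, like.der == 3 * 0.5**2 == 0.75
```

The point is that the cached object must carry whatever the optimizer needs; memoizing bare floats throws the gradient away.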

matsen commented 7 years ago

Moving discussion here rather than on the commit.

What about reparameterizing in terms of a free variable? If it's a positivity constraint one could use log (are we going to have to use log anyway to avoid underflow?). Or as suggested in the thread below one could just square the variable. This doesn't actually impose a major burden because the variable transformation just happens once (whereas your recursion happens many times).
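A toy version of the log reparameterization (hypothetical objective, not the gctree likelihood): enforce a positivity constraint on lambda by doing plain gradient descent on x = log(lambda).

```python
# Sketch: unconstrained optimization of x = log(lambda), so lambda = exp(x)
# is positive by construction. Toy objective minimized at lambda = 2.
import math

def grad_x(x):
    # chain rule: d/dx (exp(x) - 2)**2 = 2*(exp(x) - 2) * exp(x)
    lam = math.exp(x)
    return 2.0 * (lam - 2.0) * lam

x = 0.0                       # lambda starts at exp(0) = 1
for _ in range(2000):
    x -= 0.05 * grad_x(x)     # unconstrained gradient step

lam = math.exp(x)             # converges to 2.0
```

As noted, the transform is applied once per optimizer step, so its cost is negligible next to the recursive likelihood evaluation.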

I had a look at Stan, and the math library looks fine and all, but Stan itself doesn't implement constrained optimization AFAICT. Here is an interesting thread with Dr. Gelman himself.

But we shouldn't push this too hard -- the original idea is to make things easy.

I'm a little resistant to the ad module because it's pure python, and thus probably not too speedy. What about just using numerical optimization without gradients?
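For the gradient-free route, a bracketing method works for a single parameter; for several, something like Nelder-Mead (e.g. via `scipy.optimize.minimize`) plays the same role. A self-contained 1-D sketch on a toy objective:

```python
# Sketch: golden-section search, a derivative-free minimizer for a
# unimodal 1-D objective on a bracketing interval [a, b].
import math

def golden_min(f, a, b, tol=1e-8):
    """Minimize unimodal f on [a, b] using only function evaluations."""
    invphi = (math.sqrt(5) - 1) / 2          # 1/golden ratio, ~0.618
    c = b - invphi * (b - a)                 # lower interior point
    d = a + invphi * (b - a)                 # upper interior point
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c                      # minimum lies in [a, d]
            c = b - invphi * (b - a)
        else:
            a, c = c, d                      # minimum lies in [c, b]
            d = a + invphi * (b - a)
    return (a + b) / 2

# toy objective on the unit interval, minimized at p = 0.3
p_hat = golden_min(lambda p: (p - 0.3) ** 2, 0.0, 1.0)
```

No gradients, and the unit-interval constraint is handled for free by the bracket.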

wsdewitt commented 7 years ago

In TensorFlow, unconstrained optimization of the branching probability takes a few reasonable steps, but then jumps outside the unit interval and barfs. I did try a logit transform, but that never takes even one step (I don't yet get why).
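For what it's worth, a plain-Python sketch of the logit idea on a toy Bernoulli likelihood (not the gctree model or the TF code). One guess at the "never takes a step" symptom: with p = sigmoid(x), the gradient picks up a factor dp/dx = p(1-p), which underflows to zero if x is initialized deep in a saturated region.

```python
# Sketch: logit reparameterization p = sigmoid(x) keeps p in (0, 1)
# while x is optimized freely. Toy Bernoulli likelihood, MLE at k/n.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

k, n = 3, 10                   # 3 successes in 10 trials

def grad_x(x):
    # d/dx [k*log(p) + (n-k)*log(1-p)] with p = sigmoid(x) simplifies
    # to k - n*p, because dp/dx = p*(1-p) cancels the denominators
    return k - n * sigmoid(x)

x = 0.0                        # p starts at 0.5 (unsaturated)
for _ in range(5000):
    x += 0.1 * grad_x(x)       # ascent on the log-likelihood

p = sigmoid(x)                 # converges to k/n = 0.3, never leaves (0, 1)
```

With a sane initialization the transform behaves; starting p near 0 or 1 is where the gradient effectively vanishes.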

matsen commented 7 years ago

It'd be a lot easier if you would push the code throwing the errors. It seems to me that you're trying to use the same memoization-in-dict thing, but you need to use the TF data structures to store things (though it's hard to tell from the code pushed so far).

wsdewitt commented 7 years ago

We have analytic derivatives, which seem to work fine for now. TF worked too eventually, but was mega slow.