harrispopgen / mushi

[mu]tation [s]pectrum [h]istory [i]nference
https://harrispopgen.github.io/mushi/
MIT License

Derive gradient instead of relying on numerical differentiation #3

Closed · wsdewitt closed this issue 5 years ago

wsdewitt commented 5 years ago

Also consider a fixed-point iteration, as in DeWitt and Chu (2010), instead of L-BFGS-B.

wsdewitt commented 5 years ago

Note: this depends on #2, since the matrix calculus will look different

wsdewitt commented 5 years ago

Alternatively, it would be great if we could use autodiff as in the fastNeutrino paper (Bhaskar et al.), since analytical derivatives will become unwieldy if we use a fancy weighting scheme on the penalty where weights depend on the expected coalescent tree height (and thus on eta(t)).
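For concreteness, a minimal sketch of what the autodiff route could look like with jax (my pick here; `expected_sfs`, `A`, and the toy data below are placeholders, not anything from mushi or fastNeutrino):

```python
import jax.numpy as jnp
from jax import grad, random

# Toy forward model standing in for the real expected-SFS map c(eta);
# a fixed positive linear map, chosen only so the example runs.
key = random.PRNGKey(0)
A = random.uniform(key, (10, 4))

def expected_sfs(params):
    return A @ jnp.exp(params)              # exp keeps c strictly positive

def poisson_nll(params, x):
    """Poisson negative log-likelihood of the observed SFS x (up to a constant)."""
    c = expected_sfs(params)
    return jnp.sum(c - x * jnp.log(c))

poisson_nll_grad = grad(poisson_nll)        # gradient w.r.t. params via autodiff

x = expected_sfs(jnp.zeros(4))              # fake "observed" counts for the demo
print(poisson_nll_grad(jnp.ones(4), x))
```

The appeal is that penalty weights depending on eta(t) would get differentiated along with everything else.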

kamdh commented 5 years ago

[image IMG_2927: handwritten gradient computations] Sum over i when computing the loss (the Poisson log-likelihood), and you get the gradient vector.
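To spell that out (my transcription; the photo is the source of truth): with loss L(θ) = Σ_i [c_i(θ) − x_i log c_i(θ)], the chain rule gives ∇_θ L = Jᵀ(1 − x/c), where J is the Jacobian of c. A minimal NumPy sketch, assuming x, c, and J are already computed:

```python
import numpy as np

def poisson_nll_and_grad(x, c, J):
    """Poisson negative log-likelihood (up to a constant) and its gradient.

    x : observed SFS counts, shape (n,)
    c : expected SFS c(theta), shape (n,)
    J : Jacobian dc/dtheta, shape (n, p)
    """
    loss = np.sum(c - x * np.log(c))
    grad = J.T @ (1.0 - x / c)   # sum over i of (1 - x_i/c_i) * dc_i/dtheta_j
    return loss, grad
```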

wsdewitt commented 5 years ago

Nice. Here's my fixed-point iterator for the Bregman-regularized likelihood. J is the Jacobian of c. [image: handwritten fixed-point update]
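The actual update is in the photo; purely for shape, a fixed-point scheme is just repeated application of an update map until the iterate stops moving. A generic sketch (the map F below is a stand-in, not the Bregman-regularized update):

```python
import numpy as np

def fixed_point(F, x0, tol=1e-8, max_iter=1000):
    """Iterate x <- F(x) until the relative change drops below tol."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = F(x)
        if np.linalg.norm(x_new - x) <= tol * (1.0 + np.linalg.norm(x)):
            return x_new
        x = x_new
    return x

# Toy usage: a contraction whose fixed point is 2.0.
print(fixed_point(lambda x: 0.5 * x + 1.0, np.array([10.0])))
```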

wsdewitt commented 5 years ago

And zooming out to the whole page, your derivatives of c look like mine from last week. Yay, validation! [image: page of handwritten derivatives of c]

wsdewitt commented 5 years ago

I started working on implementing this last week too, but it got kinda abandoned. See here: https://github.com/WSDeWitt/dement/blob/62ebee166cdd3e8abf4f06017523054634abd2d2/dement.py#L76-L82

kamdh commented 5 years ago

Ya, I saw, but it's pretty hard (for me) to read. I'd write it first with loops to make sure it's doing what you want, then maybe optimize for speed. Newer Python is actually okay with loops anyway.
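Something like this (a toy of my own, not code from dement.py): compute Jᵀv once with explicit loops and once vectorized, and check they agree before trusting the fast version.

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.standard_normal((6, 3))   # toy Jacobian dc/dtheta
v = rng.standard_normal(6)        # toy residual, e.g. 1 - x/c

# Loopy version: easy to check line-by-line against the math on paper.
g_loop = np.zeros(3)
for j in range(3):
    for i in range(6):
        g_loop[j] += J[i, j] * v[i]

# Vectorized version: the one you'd actually keep.
g_vec = J.T @ v

assert np.allclose(g_loop, g_vec)
```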

wsdewitt commented 5 years ago

Yeah, it was failing scipy.optimize.check_grad, and I felt like playing around, so I went with numerical diff. Validating with a loopy version is a good idea.
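For reference, the checker is scipy.optimize.check_grad: hand it the loss, the analytical gradient, and a test point, and it returns the norm of the discrepancy against a finite-difference estimate. Minimal usage on a toy quadratic (not the mushi loss):

```python
import numpy as np
from scipy.optimize import check_grad

def f(x):
    return np.sum(x ** 2)

def grad_f(x):
    return 2 * x

x0 = np.array([1.0, -2.0, 3.0])
print(check_grad(f, grad_f, x0))   # near zero when grad_f matches the finite-difference estimate
```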

wsdewitt commented 5 years ago

Another approach to consider is the adjoint-state method. Apparently this is a common technique in nonlinear inverse problems that allows computation of the loss function's gradient without ever forming Fréchet derivatives / Jacobians (it scales better in n, I guess).
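For reference, a sketch of the adjoint identity in implicit-constraint form (my notation; it assumes the expected SFS c is pinned down by some constraint g(c, eta) = 0):

```latex
% Adjoint-state sketch: c(\eta) defined implicitly by g(c, \eta) = 0, loss L(c(\eta)).
\begin{aligned}
  \Bigl(\tfrac{\partial g}{\partial c}\Bigr)^{\!\top} \lambda &= \frac{\partial L}{\partial c}
    && \text{(one linear solve for the adjoint $\lambda$)} \\
  \frac{dL}{d\eta} &= -\lambda^{\top} \frac{\partial g}{\partial \eta}
    && \text{(gradient without ever forming $\partial c / \partial \eta$)}
\end{aligned}
```

So the cost per gradient evaluation is one extra linear solve rather than an n-by-p Jacobian.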

kamdh commented 5 years ago

We can get the exact Jacobian implemented; shouldn't be a problem.