Closed wsdewitt closed 5 years ago
Note: this depends on #2, since the matrix calculus will look different
Alternatively, it would be great if we could use autodiff as in the fastNeutrino paper (Bhaskar et al.), since analytical derivatives will become unwieldy if we use a fancy weighting scheme on the penalty where weights depend on the expected coalescent tree height (and thus on eta(t)).
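For the record, the core autodiff idea is simple enough to show in a few lines. This is a toy forward-mode sketch using dual numbers, not what fastNeutrino actually does internally (a real library like autograd or JAX would handle reverse mode and arrays for us):

```python
# Minimal forward-mode autodiff via dual numbers: a + b*eps with eps**2 == 0,
# where the b component carries the derivative through each operation.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val = val  # function value
        self.dot = dot  # derivative value

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)
    __rmul__ = __mul__


def deriv(f, x):
    """Derivative of f at x from a single forward-mode pass."""
    return f(Dual(x, 1.0)).dot


# d/dx (3*x*x + 2*x) at x = 2 is 6*x + 2 = 14
print(deriv(lambda x: 3 * x * x + 2 * x, 2.0))  # -> 14.0
```

The point being: if `c(eta)` is built from differentiable primitives, an autodiff library gives us exact derivatives without deriving anything by hand, even with a fancy tree-height-dependent weighting on the penalty.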
**Gradient computations.** Sum over i when computing the loss (the Poisson log-likelihood) and you get the gradient vector.
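To make that concrete, here's a sketch with a made-up exponential model standing in for the expected counts `c(theta)` (the real `c` comes from the coalescent). The gradient is `J.T @ (1 - y/c)`, where `J` is the Jacobian of `c`, validated against central differences:

```python
import numpy as np

A = np.array([[1.0, 0.5], [0.2, 2.0]])  # toy design matrix (assumption)

def c(theta):
    # toy expected-counts model; stand-in for the coalescent prediction
    return np.exp(A @ theta)

def loss(theta, y):
    # Poisson negative log-likelihood, dropping the log(y!) constant
    return np.sum(c(theta) - y * np.log(c(theta)))

def grad(theta, y):
    J = A * c(theta)[:, None]        # Jacobian of c for this toy model
    return J.T @ (1.0 - y / c(theta))  # chain rule, summed over i

theta = np.array([0.3, -0.1])
y = np.array([2.0, 1.0])

# central finite-difference check
eps = 1e-6
num = np.array([(loss(theta + eps * e, y) - loss(theta - eps * e, y)) / (2 * eps)
                for e in np.eye(2)])
print(np.allclose(grad(theta, y), num))  # -> True
```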
Nice. Here's my fixed point iterator for the Bregman regularized likelihood. `J` is the Jacobian of `c`.
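For anyone following along, the skeleton of such an iterator looks like this. The Bregman-specific update map isn't shown here (it would replace `g`); this is just a generic damped fixed-point loop with a convergence check, illustrated on the toy problem x = cos(x):

```python
import math

def fixed_point(g, x0, damp=1.0, tol=1e-10, maxiter=1000):
    """Damped fixed-point iteration x <- (1 - damp)*x + damp*g(x)."""
    x = x0
    for _ in range(maxiter):
        x_new = (1 - damp) * x + damp * g(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("fixed point iteration did not converge")

# toy example: the unique fixed point of cos
x_star = fixed_point(math.cos, 1.0)
print(abs(x_star - math.cos(x_star)) < 1e-8)  # -> True
```

Damping (`damp < 1`) can help when the raw update map isn't contractive.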
And zooming out to the whole page, it looks like your derivatives of c look like mine from last week. Yay validation!
I started working on implementing this last week too, but it got kinda abandoned. See here: https://github.com/WSDeWitt/dement/blob/62ebee166cdd3e8abf4f06017523054634abd2d2/dement.py#L76-L82
Ya I saw, but it's pretty hard (for me) to read. I'd write it first with loops to make sure it's doing what you want, then maybe optimize for speed. Newer Python is actually okay with loops anyway.
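That workflow is cheap to set up, e.g. (toy matrix-vector product as a stand-in for the real computation):

```python
import numpy as np

def loopy(A, x):
    # obvious double-loop version: easy to check by eye
    out = np.zeros(A.shape[0])
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            out[i] += A[i, j] * x[j]
    return out

def vectorized(A, x):
    # fast version; validate it against the loopy one
    return A @ x

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)
print(np.allclose(loopy(A, x), vectorized(A, x)))  # -> True
```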
Yeah, it was failing `scipy.optimize.check_grad` (it lives in SciPy, not NumPy), and I had felt like playing so went with numerical diff. Validating with a loopy version is a good idea.
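For reference, `check_grad` compares an analytic gradient against a finite-difference estimate and returns the norm of the difference, so the usage is just:

```python
import numpy as np
from scipy.optimize import check_grad

def f(x):
    # toy objective standing in for the real loss
    return np.sum(x ** 2)

def df(x):
    # its analytic gradient
    return 2 * x

x0 = np.array([1.0, -2.0, 3.0])
err = check_grad(f, df, x0)  # norm of (analytic - numerical) gradient
print(err < 1e-5)  # -> True
```

A small but nonzero `err` (roughly the finite-difference truncation error) means the analytic gradient is probably right; a large one means a bug.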
Another approach to consider is the adjoint-state method. Apparently this is a common technique in nonlinear inverse problems that lets you compute the loss function's gradient without ever forming Fréchet derivatives / Jacobians (so it scales better in n, I guess).
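A toy linear version shows the trick. For a forward model A(m) u = b with loss 0.5 * ||u - d||^2, one extra "adjoint" solve replaces the whole Jacobian of u with respect to m (everything below is a made-up illustration, not our actual model):

```python
import numpy as np

A0 = np.array([[3.0, 1.0], [1.0, 4.0]])  # toy operator pieces (assumption)
b = np.array([1.0, 2.0])                 # toy source term
d = np.array([0.2, 0.3])                 # toy data

def loss(m):
    # forward solve A(m) u = b, with A(m) = A0 + m*I, then misfit
    u = np.linalg.solve(A0 + m * np.eye(2), b)
    return 0.5 * np.sum((u - d) ** 2)

def grad(m):
    A = A0 + m * np.eye(2)
    u = np.linalg.solve(A, b)            # forward solve
    lam = np.linalg.solve(A.T, u - d)    # adjoint solve; no Jacobian of u needed
    return -lam @ u                      # dL/dm = -lam^T (dA/dm) u, dA/dm = I here

m = 0.5
eps = 1e-6
num = (loss(m + eps) - loss(m - eps)) / (2 * eps)
print(np.isclose(grad(m), num))  # -> True
```

The win is that the cost is one forward solve plus one adjoint solve per gradient, regardless of how many parameters m has.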
We can get the exact Jacobian implemented; shouldn't be a problem.
Also consider a fixed point iteration, as in DeWitt and Chu (2010), instead of L-BFGS-B.
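For the L-BFGS-B baseline being compared against, the SciPy call on a toy bound-constrained problem looks like:

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: minimize sum((x - 2)^2) with each coordinate bounded in [0, 1.5],
# so the constrained optimum sits on the upper bound.
res = minimize(lambda x: np.sum((x - 2.0) ** 2),
               x0=np.zeros(3),
               jac=lambda x: 2.0 * (x - 2.0),  # analytic gradient speeds this up
               method="L-BFGS-B",
               bounds=[(0.0, 1.5)] * 3)
print(np.allclose(res.x, 1.5))  # -> True
```

Passing `jac` explicitly (rather than letting SciPy finite-difference it) is where the exact-Jacobian work above pays off.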