pmelchior / scarlet

hyperspectral galaxy modeling and deblending
MIT License

Unexpected downturn in log_likelihood? #273

Closed: entaylor closed this issue 1 year ago

entaylor commented 1 year ago

I had the idea of running the deblending optimisation for a long time as a convergence test, i.e. `blend.fit(500, e_rel=1e-5)`.
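Roughly what I'm doing to produce the plot below (a minimal sketch; it assumes `blend` has already been built from my sources and observations, and that `blend.log_likelihood` is the per-iteration log-likelihood trace):

```python
import numpy as np
import matplotlib.pyplot as plt

# long fit as a convergence test
blend.fit(500, e_rel=1e-5)

# per-iteration log-likelihood trace (array-like)
logL = np.asarray(blend.log_likelihood)

plt.plot(np.arange(len(logL)), logL)
plt.xlabel("iteration")
plt.ylabel("log likelihood")
plt.show()
```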

What has me baffled is that the log-likelihood peaks at a certain point and then goes down and down and down after that; see plot below:

[plot: log_likelihood vs. iteration, peaking early and then steadily decreasing]

The good news is that the peak is essentially where the iteration cuts off using `e_rel=1e-4`, so I think (hope?) all my results are good.

It's a little bit surprising to me, but nonetheless reassuring, that the models at the end of the long iteration look fine to me by eye, despite the apparently terrible logL.

There are some oddities with this particular blend (which is why I have been testing on it), including missing data in several images, so maybe that is a contributing factor.

I'm almost 100% sure that this must be an issue with how I am using/understanding scarlet, but I can't fathom what that might be. Even so, I would have thought that the optimiser should not allow logL to go down like this. Is it possibly an error in the logL reporting? But if so, why wouldn't that trigger the `e_rel` condition? Or is there something like a prior, over and above the logL, that is considered in the optimiser?

Apologies if this is something really simple, but would be very grateful for an explanation if so!

Just in case it gives some insight, here is one of the images I'm fitting, and the residuals for that image:

[image: one band of the data being fit] [image: residuals for that band]

pmelchior commented 1 year ago

I'm not entirely sure why the likelihood decreases, but I can offer a few clues:

  1. These multi-component proximal algorithms have no formal convergence guarantees. It's actually easy to choose step sizes large enough for them to blow up. Because the objective function is non-convex, there will also be many more saddle points than minima, where the trajectory can move very slowly until it gets going again. In your example, it looks like there are two strongly overlapping light sources in the center, and the fit ultimately ended up in a worse position than the one it had found earlier.
  2. The fitter doesn't terminate under the `e_rel` criterion because that criterion tests relative convergence of the parameters, not of the value of the objective function (see the sketch below). As long as the parameters keep changing, the fitter keeps going.
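To make point 2 concrete, here is an illustrative version of that kind of stopping rule (not scarlet's actual code, just a sketch of a relative parameter-convergence test): the objective is free to drift as long as the parameters themselves have stopped moving relative to `e_rel`.

```python
import numpy as np

def converged(x_new, x_old, e_rel=1e-4):
    """Relative convergence test on a parameter vector.

    Compares successive parameter values, not successive values of the
    log-likelihood, so the objective can go up or down while the
    parameters are still changing, and the fit keeps running.
    """
    return np.linalg.norm(x_new - x_old) <= e_rel * np.linalg.norm(x_new)
```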
entaylor commented 1 year ago

Okay, cool; that makes total sense, thank you! So it is worth keeping track of `niter`, `blend.log_likelihood.max()`, and `blend.log_likelihood[-1]` as diagnostics for cases like this. Good to know!
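For anyone else hitting this, a minimal version of those diagnostics might look like the following (assuming, as above, that `blend.log_likelihood` holds the per-iteration trace; the iteration count is taken from its length):

```python
import numpy as np

logL = np.asarray(blend.log_likelihood)

n_iter = len(logL)                # iterations actually run
best_it = int(np.argmax(logL))    # iteration where logL peaked
print(f"iterations run : {n_iter}")
print(f"max logL       : {logL.max():.2f} (iteration {best_it})")
print(f"final logL     : {logL[-1]:.2f}")
print(f"drop from peak : {logL.max() - logL[-1]:.2f}")
```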