Improving the joint position fit (pointing + SN)

Background

The final step of CubeFit is, having finished determining the galaxy model (based on final references), to fit the pointing of each epoch with SN light as well as the position of the SN relative to the galaxy model. The galaxy model remains fixed in this process. This is a 2 * nepochs + 2 parameter fit where nepochs is the number of epochs with SN light. The sky and SN spectra are actually varied in the fit as well, but they are not parameters: for each epoch, given the pointing of the data and the SN position, determining the optimal sky and SN is a linear problem and they are determined on each iteration.

Current situation

Currently, we are using fmin_l_bfgs_b with a gradient function that uses a finite-differences method. It is probably possible to provide an exact calculation of the gradient but the calculation seems quite complex and error prone so we've been attempting to avoid it. Calculating the gradient ourselves using FD may seem a bit odd - why not just let an optimizer do it, as it would also use FD? The reason is simply speed - we know that many parameters only affect a small part of the χ² sum, so we can do the FD more efficiently than a naive black box. See the function fit_position_sky_sn_multi for the implementation.

However, this approach seems to be yielding sub-optimal results so far. Here's the model and residuals for some of the epochs of one SN:

screenshot from 2015-07-05 09 50 53

Note the strong dipole in the galaxy residuals. The optimizer seems to be doing an OK job positioning the data with respect to the SN (modulo ringing due to imprecise PSF model), but a poor job positioning the SN with respect to the galaxy. However, I note that it is a bit difficult to tell how much better it could do: the SN-galaxy offset is obviously the same between all epochs, but not all epochs show the same dipole on the galaxy. I've only shown the worst ones here.

Alternative optimizers

I tried a couple gradient-free methods, just to see how they fared: fmin from scipy.optimize (which implements a downhill simplex algorithm) and minuit, using iminuit. These both ran for 30 minutes before I gave up. That code is in my branch position-iminuit.
Thinking that the optimizer would benefit from knowing that the Hessian is pretty sparse, I tried the scipy.optimize.fmin_ncg() function, which is the only optimizer that uses a user-provided Hessian function. As with the gradient, I calculated the Hessian using FD. This did better, but the FD seems unstable with respect to step size and I never got the optimization to fully converge. It seemed close to convergence, but the parameter that controls tolerance in the optimizer avextol is inscrutable. Plus it still was taking too long (~30 min). The code is in my branch position-ncg-hessian.
Next, I'm going to try a sampling method mainly to attempt to get a better feel for the problem and parameter covariance. It may be that running a 4-parameter sample for each epoch in turn is a feasible approach.

snfactory / cubefit