Closed kbarbary closed 9 years ago
I'm now leaning towards thinking that the current optimizer fmin_l_bfgs_b
is doing a pretty good job. I ran separate 4-parameter samplers on each of the epochs in turn. The parameters in each epoch are x
, y
(position of data w.r.t. galaxy) and snx
, sny
(position of SN w.r.t. galaxy). Note that snx
, sny
parameters are (in reality) shared between epochs, but this sampling doesn't know that: each epoch is free to have its preferred SN position. Here's an example of the results for one epoch:
The other epochs look similar. The snx
and sny
often don't agree between epochs. But qualitatively, the results with fmin_l_bfgs_b
seem to be a pretty good compromise between the epochs' preferred snx
and sny
. Further, given that SN position, the data position fit x
, y
seem to agree with the peak of the conditional posterior from the samples.
Background
The final step of CubeFit is, having finished determining the galaxy model (based on final references), to fit the pointing of each epoch with SN light as well as the position of the SN relative to the galaxy model. The galaxy model remains fixed in this process. This is a
2 * nepochs + 2
parameter fit where nepochs is the number of epochs with SN light. The sky and SN spectra are actually varied in the fit as well, but they are not parameters: for each epoch, given the pointing of the data and the SN position, determining the optimal sky and SN is a linear problem and they are determined on each iteration.Current situation
Currently, we are using
fmin_l_bfgs_b
with a gradient function that uses a finite-differences method. It is probably possible to provide an exact calculation of the gradient but the calculation seems quite complex and error prone so we've been attempting to avoid it. Calculating the gradient ourselves using FD may seem a bit odd - why not just let an optimizer do it, as it would also use FD? The reason is simply speed - we know that many parameters only affect a small part of the χ² sum, so we can do the FD more efficiently than a naive black box. See the functionfit_position_sky_sn_multi
for the implementation.However, this approach seems to be yielding sub-optimal results so far. Here's the model and residuals for some of the epochs of one SN:
Note the strong dipole in the galaxy residuals. The optimizer seems to be doing an OK job positioning the data with respect to the SN (modulo ringing due to imprecise PSF model), but a poor job positioning the SN with respect to the galaxy. However, I note that it is a bit difficult to tell how much better it could do: the SN-galaxy offset is obviously the same between all epochs, but not all epochs show the same dipole on the galaxy. I've only shown the worst ones here.
Alternative optimizers
fmin
from scipy.optimize (which implements a downhill simplex algorithm) and minuit, using iminuit. These both ran for 30 minutes before I gave up. That code is in my branch position-iminuit.scipy.optimize.fmin_ncg()
function, which is the only optimizer that uses a user-provided Hessian function. As with the gradient, I calculated the Hessian using FD. This did better, but the FD seems unstable with respect to step size and I never got the optimization to fully converge. It seemed close to convergence, but the parameter that controls tolerance in the optimizeravextol
is inscrutable. Plus it still was taking too long (~30 min). The code is in my branch position-ncg-hessian.