gully closed this issue 3 years ago
Ok, this works as of `1b2298e`.
I implemented sky-line sparsity as an initialization option, `dense_sky=False`, which toggles between a sky model of ~1400 sub-pixel sky lines and a sparse sky of a few to tens of labeled sky lines. I added a sky continuum function to compensate for the sky emission that the dense sky model previously filled in generously; for now, the sky continuum is a fourth-order polynomial in wavelength, implemented with Chebyshev polynomials.
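For concreteness, here is a minimal sketch of such a Chebyshev sky-continuum term; the function and argument names are hypothetical, not the actual implementation:

```python
import torch

def sky_continuum(wavelength, coeffs):
    """Smooth sky continuum: a 4th-order Chebyshev series in wavelength.

    wavelength : 1D tensor of wavelengths for one spectral order
    coeffs     : tensor of 5 trainable Chebyshev coefficients (hypothetical)
    """
    # Rescale wavelength to [-1, 1], the natural Chebyshev domain
    lo, hi = wavelength.min(), wavelength.max()
    x = 2.0 * (wavelength - lo) / (hi - lo) - 1.0

    # Build T_0..T_4 with the recurrence T_n = 2 x T_{n-1} - T_{n-2}
    T = [torch.ones_like(x), x]
    for _ in range(2, 5):
        T.append(2.0 * x * T[-1] - T[-2])
    basis = torch.stack(T, dim=-1)   # shape (n_pixels, 5)
    return basis @ coeffs            # shape (n_pixels,)
```

Because the basis is low-order and smooth, it can only absorb broad sky emission, leaving the labeled sparse lines to carry the sharp features.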
When running a full training run of 14000 epochs, an undesirable phenomenon occurs. The sky traces, which have ~1400 parameters, have the most flexibility to absorb the sharp decrease in flux on the red side of the detector, so they bloat in width at the cost of underfitting the sky trace itself. Here are a few strategies that would fix this overzealous fitting, and I think we can do all of them:
A few other issues:
[x] Making the sky continuum a fourth-order polynomial in wavelength complicates the backpropagation: we now have to retain the graph with `loss.backward(retain_graph=True)`. Since the function is so smooth, we could just as easily make the sky continuum a function of, say, pixel coordinate `x`, which I think would simplify and maybe speed up the backprop.
[x] We should also sample sky-line amplitudes in log space.
[ ] We've made the `__init__` of `Echellogram` long and complicated with the handling of the different sky options. We should consider breaking that handling out into initialization functions.
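One possible shape for that refactor, sketched with hypothetical method names and parameter counts (not the actual `Echellogram` API):

```python
import torch

class Echellogram(torch.nn.Module):
    """Sketch: __init__ only dispatches; each sky option gets its own helper."""

    def __init__(self, dense_sky=False):
        super().__init__()
        if dense_sky:
            self._init_dense_sky()
        else:
            self._init_sparse_sky()

    def _init_dense_sky(self):
        # ~1400 sub-pixel sky lines: one amplitude parameter per line
        self.sky_amps = torch.nn.Parameter(torch.zeros(1400))

    def _init_sparse_sky(self):
        # A handful of labeled sky lines plus a smooth continuum term
        self.sky_amps = torch.nn.Parameter(torch.zeros(20))
        self.continuum_coeffs = torch.nn.Parameter(torch.zeros(5))
```

Keeping each branch in its own method makes the constructor readable and lets each sky mode evolve independently.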
Here's a Tensorboard screenshot illustrating the bloating over-fitting phenomenon described above.
Making the sky continuum a function of `x` had little impact on GPU training performance, but did seem to use about 20% less memory. I'd say let's keep it in wavelength, since that is closer to physical reality and more interpretable. Instead, I simply place a `torch.no_grad()` context manager before we make the Chebyshev polynomial array. Essentially this says that the minute changes to the wavelength calibration will have negligible effect on the smooth polynomial function, so don't even bother using memory to track those changes in the backprop.
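A sketch of that idea, assuming the Chebyshev design matrix is built from a wavelength tensor that otherwise carries gradients (names hypothetical):

```python
import torch

def chebyshev_basis(wavelength, degree=4):
    """Chebyshev design matrix, detached from the wavelength-solution graph."""
    with torch.no_grad():
        # Tiny wavelength-calibration updates barely move this smooth basis,
        # so skip autograd tracking here and save memory/graph retention.
        lo, hi = wavelength.min(), wavelength.max()
        x = 2.0 * (wavelength - lo) / (hi - lo) - 1.0
        T = [torch.ones_like(x), x]
        for _ in range(2, degree + 1):
            T.append(2.0 * x * T[-1] - T[-2])
        return torch.stack(T, dim=-1)  # treated as a constant downstream
```

Downstream, the basis acts as a constant matrix, so gradients flow only into the continuum coefficients and `retain_graph=True` is no longer needed for this term.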
Sampling in log space dramatically improves training speed, since the optimizer can traverse a much higher dynamic range in a fixed amount of time.
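A minimal sketch of the reparameterization, with hypothetical parameter names and initialization:

```python
import torch

# The trainable parameter lives in log space; zeros mean amplitude 1.0
log_amps = torch.nn.Parameter(torch.zeros(1400))

def sky_line_amplitudes(log_amps):
    # Exponentiate so a fixed optimizer step is a *multiplicative* change,
    # letting faint and bright lines train at comparable rates while the
    # physical amplitude stays strictly positive
    return torch.exp(log_amps)
```

A step of 0.1 in `log_amps` rescales an amplitude by ~10% whether the line is faint or bright, which is what gives the improved dynamic-range coverage.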
The spectral traces now bloat to over-fill the entire slit length in order to fit the sky lines, even after masking the dark right-edge region; this also makes sense from a loss-function perspective. I think the solution to this and other problems is a better sky model (new issue #7) combined with a GP for imperfections in that sky model (issue #3). The basic functionality is in place, so let's close this task.
Each known night-sky line can inform the wavelength calibration process. We need to implement the reading-in and filtering of these sky lines. Most orders will only have a few, which is fine. The weakest sky lines can be safely ignored, on the assumption that a GP in wavelength will identify them.
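The reading-in and filtering step could look something like the sketch below; the file format (two columns of wavelength and intensity) and all names are hypothetical placeholders, not a committed design:

```python
import numpy as np

def load_sky_lines(path, wl_min, wl_max, min_intensity=1.0):
    """Read a night-sky line list and keep the lines useful for one order.

    Assumes a plain-text file of (wavelength, intensity) columns; the
    actual line-list source and format are still to be decided.
    """
    wavelength, intensity = np.loadtxt(path, unpack=True)
    # Keep only lines that land on this order and are bright enough;
    # the weakest lines are left for the GP in wavelength to absorb
    keep = (
        (wavelength >= wl_min)
        & (wavelength <= wl_max)
        & (intensity >= min_intensity)
    )
    return wavelength[keep], intensity[keep]
```

Per-order filtering keeps the sparse sky model small (a few to tens of lines), matching the `dense_sky=False` design above.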