gully / blase

Interpretable Machine Learning for astronomical spectroscopy in PyTorch and JAX
https://blase.readthedocs.io
MIT License

Migrate telluric module towards the sparse implementation #13

Closed: gully closed this issue 2 years ago

gully commented 2 years ago

We have some awesome existing source code for the telluric module. The code takes molecular data from the HITRAN Application Programming Interface (HAPI) and produces telluric transmission spectra.

That code goes to the trouble of cascading the transmission through a multi-layer atmosphere. While that's nice and accurate, it's computationally wasteful: with 100 layers, we have to do 100x the computation, and in the end the spectral lines still mostly look like Voigt profiles. So instead we should emulate a precomputed synthetic telluric transmission spectrum from a high-resolution, high-accuracy model such as TelFit. We will proceed the same way we currently do when emulating precomputed synthetic stellar spectra: set a threshold and use scipy's find_peaks to locate all the prominent telluric absorption lines.
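A minimal sketch of the peak-finding step, assuming the TelFit spectrum is already loaded as arrays on a uniform wavelength grid. A synthetic three-line spectrum stands in for real TelFit output here, and the prominence threshold of 0.1 is an illustrative assumption, not a tuned value.

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic stand-in for a precomputed TelFit transmission spectrum (assumed data):
# a flat continuum with three Gaussian absorption lines of different depths.
wl = np.linspace(16000.0, 16010.0, 20001)  # Angstrom, 0.0005 Angstrom spacing
true_centers = np.array([16002.0, 16005.0, 16008.0])
true_depths = np.array([0.6, 0.05, 0.3])
transmission = np.ones_like(wl)
for center, depth in zip(true_centers, true_depths):
    transmission *= 1.0 - depth * np.exp(-0.5 * ((wl - center) / 0.02) ** 2)

# Absorption lines are dips, so search for peaks in (1 - transmission).
# The prominence threshold discards shallow lines; 0.1 is an assumed value.
peak_indices, properties = find_peaks(1.0 - transmission, prominence=0.1)

lam_centers = wl[peak_indices]          # initial line centers
amplitudes = properties["prominences"]  # initial line depths
```

With this threshold the shallow 0.05-deep line is dropped, which is the intended effect of thresholding: only prominent lines get their own Voigt parameters.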

We will introduce one augmentation at that stage: rather than instantiating Voigt profiles with unknown sigmas and gammas, we can initialize the amplitude, sigma, and gamma with the values that typical temperature, humidity, and pressure levels would dictate. That might be more trouble than it's worth, and possibly less accurate, though. So there are some tradeoffs in the initialization to explore.
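For the physically motivated initialization, one option is to derive sigma from thermal Doppler broadening and gamma from HITRAN-style pressure broadening, gamma(p, T) = gamma_air * p * (T_ref / T)^n_air with T_ref = 296 K. The sketch below covers only the widths (not the amplitude), assumes a single representative layer, ignores self-broadening, and uses made-up line parameters; the function names are hypothetical and not part of blase.

```python
import numpy as np

K_B = 1.380649e-23        # Boltzmann constant [J/K]
C_LIGHT = 2.99792458e8    # speed of light [m/s]
AMU = 1.66053906660e-27   # atomic mass unit [kg]


def doppler_sigma(lam_center, temperature, mass_amu):
    """Thermal Doppler Gaussian sigma, in the same units as lam_center."""
    velocity_dispersion = np.sqrt(K_B * temperature / (mass_amu * AMU))  # [m/s]
    return lam_center * velocity_dispersion / C_LIGHT


def lorentz_gamma(lam_center_angstrom, gamma_air, n_air, pressure_atm, temperature, t_ref=296.0):
    """Pressure-broadened Lorentzian HWHM in Angstrom (self-broadening ignored).

    gamma_air: air-broadened HWHM [cm^-1/atm] at t_ref; n_air: its temperature exponent.
    """
    gamma_wavenumber = gamma_air * pressure_atm * (t_ref / temperature) ** n_air  # [cm^-1]
    # d(lambda) = lambda^2 d(nu); with lambda in Angstrom and nu in cm^-1,
    # the factor 1e-8 converts the result to Angstrom.
    return lam_center_angstrom**2 * gamma_wavenumber * 1e-8


# Made-up water-vapor line near 1.6 micron in a 280 K, 0.8 atm layer (all values assumed).
sigma_init = doppler_sigma(16000.0, temperature=280.0, mass_amu=18.0)  # about 0.019 Angstrom
gamma_init = lorentz_gamma(
    16000.0, gamma_air=0.08, n_air=0.7, pressure_atm=0.8, temperature=280.0
)  # about 0.17 Angstrom
```

Because gamma scales linearly with pressure, lines formed high in the atmosphere start out much narrower than this near-surface example, which is one reason a single representative layer may be a rough initialization.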

In either case, training will proceed the same way we're used to. One good thing is that the wingcut can be much smaller than in the stellar case: we do not have to worry about RV and vsini, so the influence of a given telluric line stays confined to a few tens of pixels around its center. On the downside, the telluric model is sampled much more finely than the stellar model. So we will have a very tall-and-thin sparse 2D matrix for telluric models that, when coalesced, yields a high-resolution, high-bandwidth 1D spectrum.
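One plausible layout for the sparse telluric evaluation, sketched with SciPy rather than PyTorch so it runs standalone: rows are pixels, columns are lines, and only pixels within plus or minus wingcut of each line center are stored. The helper name, the use of scipy.special.voigt_profile, and the choice to treat amplitude times profile as an optical depth (so that transmission is exp(-tau)) are assumptions for illustration.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.special import voigt_profile


def sparse_telluric_tau(wl, lam_centers, amplitudes, sigmas, gammas, wingcut):
    """Hypothetical helper: evaluate each line only within +/- wingcut and coalesce to 1D.

    Returns the summed optical depth per pixel and the number of stored entries.
    """
    rows, cols, vals = [], [], []
    for j, (lam0, amp, sig, gam) in enumerate(zip(lam_centers, amplitudes, sigmas, gammas)):
        # Binary search on the sorted grid for pixels inside the wingcut window.
        lo = np.searchsorted(wl, lam0 - wingcut, side="left")
        hi = np.searchsorted(wl, lam0 + wingcut, side="right")
        pix = np.arange(lo, hi)
        rows.append(pix)
        cols.append(np.full(pix.size, j))
        # voigt_profile takes the Gaussian sigma and the Lorentzian HWHM gamma.
        vals.append(amp * voigt_profile(wl[pix] - lam0, sig, gam))
    # Tall-and-thin: one row per pixel, one column per line.
    matrix = coo_matrix(
        (np.concatenate(vals), (np.concatenate(rows), np.concatenate(cols))),
        shape=(wl.size, len(lam_centers)),
    )
    # Summing over lines coalesces the sparse matrix into a dense 1D optical-depth spectrum.
    tau = np.asarray(matrix.sum(axis=1)).ravel()
    return tau, matrix.nnz


wl = np.linspace(16000.0, 16010.0, 10001)  # 0.001 Angstrom spacing (assumed)
tau, nnz = sparse_telluric_tau(
    wl,
    lam_centers=np.array([16002.0, 16005.0, 16008.0]),
    amplitudes=np.array([0.05, 0.02, 0.08]),
    sigmas=np.full(3, 0.01),
    gammas=np.full(3, 0.02),
    wingcut=0.5,
)
transmission = np.exp(-tau)  # assumes amplitude * profile acts as an optical depth
```

With a 0.5 Angstrom wingcut each line stores roughly 1000 entries instead of all 10001 pixels, so memory and compute scale with the number of lines times the window width rather than with the full bandwidth.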

Once the telluric model parameters are pretrained on the TelFit model, they can be combined with the stellar model. This raises another question: how do we want to handle the sampling? Should we evaluate the stellar model on the same wavelength grid as the telluric model? That would make the stellar model match the finer sampling of the telluric model, ballooning the computation time. Ideally we would get away with the coarser stellar sampling, but some telluric lines may be too narrow to sample accurately at "merely" 0.01 Angstrom resolution. We'll just have to explore! The cool thing is that I think we can keep a small wingcut for the telluric model even through the data-application phase.
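One way to probe the sampling question line by line is to estimate each line's Voigt FWHM and count how many pixels it spans at a given spacing. The sketch uses the Olivero & Longbothum (1977) empirical approximation for the Voigt FWHM; the threshold of 3 pixels per FWHM and the example widths are assumptions, not values from the project.

```python
import numpy as np


def voigt_fwhm(sigma, gamma):
    """Approximate Voigt FWHM via the Olivero & Longbothum (1977) empirical formula."""
    f_gauss = 2.0 * np.sqrt(2.0 * np.log(2.0)) * np.asarray(sigma)  # Gaussian FWHM
    f_lorentz = 2.0 * np.asarray(gamma)  # Lorentzian FWHM (gamma is the HWHM)
    return 0.5346 * f_lorentz + np.sqrt(0.2166 * f_lorentz**2 + f_gauss**2)


def pixels_per_fwhm(sigma, gamma, pixel_spacing):
    """How many pixels of the given spacing span each line's FWHM."""
    return voigt_fwhm(sigma, gamma) / pixel_spacing


# Illustrative widths in Angstrom: a narrow Doppler-dominated line and a broader one.
sigmas = np.array([0.005, 0.017])
gammas = np.array([0.0, 0.001])
# Threshold of 3 pixels per FWHM is an assumption; the first line is flagged, the second is not.
undersampled = pixels_per_fwhm(sigmas, gammas, pixel_spacing=0.01) < 3.0
```

Lines flagged this way at the stellar spacing are the ones that would argue for evaluating on the finer telluric grid; if few lines are flagged, the coarser stellar sampling may be good enough.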

gully commented 2 years ago

This issue is much easier to overcome now: we can simply populate a state_dict-like dictionary with the Voigt properties and use it for initialization.

So we may want a method to_init_dict() (or an equivalently named method) in the old-school telluric code that outputs this dictionary.
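A hedged sketch of what to_init_dict() might return. The key names, the log-parameterization of amplitudes and widths, and the TelluricEmulatorSketch class are all hypothetical; in PyTorch the same dictionary, converted to tensors, could be passed to torch.nn.Module.load_state_dict if its keys match the module's parameter names.

```python
import numpy as np


def to_init_dict(lam_centers, amplitudes, sigmas, gammas):
    """Hypothetical helper: pack Voigt line properties into a state_dict-like dictionary.

    Amplitudes and widths are stored as natural logs so an optimizer acting on these
    values can never drive them negative (an assumed parameterization).
    """
    return {
        "lam_centers": np.asarray(lam_centers, dtype=np.float64),
        "amplitudes": np.log(np.asarray(amplitudes, dtype=np.float64)),
        "sigma_widths": np.log(np.asarray(sigmas, dtype=np.float64)),
        "gamma_widths": np.log(np.asarray(gammas, dtype=np.float64)),
    }


class TelluricEmulatorSketch:
    """Stand-in for a trainable emulator; a PyTorch version would wrap each array in nn.Parameter."""

    def __init__(self, init_dict):
        expected = {"lam_centers", "amplitudes", "sigma_widths", "gamma_widths"}
        missing = expected - set(init_dict)
        if missing:
            raise KeyError(f"init_dict is missing keys: {sorted(missing)}")
        # Copy so later training updates do not mutate the caller's dictionary.
        self.params = {key: np.array(value, copy=True) for key, value in init_dict.items()}


# Usage with made-up line properties from the old-school telluric code.
init = to_init_dict([16002.0, 16008.0], [0.6, 0.3], [0.019, 0.019], [0.17, 0.05])
emulator = TelluricEmulatorSketch(init)
```

Keeping the dictionary as the interface means the old-school HAPI code and the sparse emulator only have to agree on key names and units, not on any shared class.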

gully commented 2 years ago

Completed! Yay!