gully / blase

Interpretable Machine Learning for astronomical spectroscopy in PyTorch and JAX
https://blase.readthedocs.io
MIT License

Calibrating Teff, logg, and metallicity #44

Closed: gully closed this issue 2 years ago

gully commented 2 years ago

We should mention the prospect of calibrating Teff, logg, and metallicity. There are many ways to achieve that goal.

One is to apply machine learning to the outputs of blase. Basically, run blase cloning on the entire grid, do some cluster finding or correlation finding on the lines, and then calibrate the models en masse against piles of observed spectra. This strategy may amount to fancy equivalent widths, but 1) it's systematic! and 2) it leverages the whole dataset!
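A minimal sketch of the "correlation finding" step, in plain Python, might look like the following. It assumes (hypothetically; this is not blase's actual output format) that cloning yields one fitted number per spectral line for each spectrum, such as a change in line depth, and that each spectrum carries a grid Teff label. It then ranks lines by the absolute Pearson correlation between that number and Teff. The line names and values are made-up toy numbers, not real blase results.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def rank_lines_by_label(line_params, labels):
    """Return (line_id, r) pairs sorted by |r|, strongest first.

    line_params: dict mapping a line identifier to a list with one fitted
    value per spectrum (hypothetical per-line output of cloning).
    labels: list with one stellar label (e.g. Teff) per spectrum.
    """
    scores = [(line_id, pearson_r(values, labels))
              for line_id, values in line_params.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Toy ensemble: five spectra with grid Teff labels in K (made-up values).
teff = [3000.0, 3200.0, 3400.0, 3600.0, 3800.0]

# Toy per-line values: one line tracks Teff, the other barely does.
line_params = {
    "line_A": [0.90, 0.80, 0.70, 0.60, 0.50],  # weakens steadily with Teff
    "line_B": [0.40, 0.42, 0.39, 0.41, 0.40],  # roughly flat
}

for line_id, r in rank_lines_by_label(line_params, teff):
    print(f"{line_id}: r = {r:+.2f}")
```

Lines near the top of such a ranking would be natural candidates for a label-sensitive subset. With a real grid you would use far more spectra and probably NumPy or JAX arrays rather than hand-rolled loops.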

There are fancier strategies, including hierarchical models for an ensemble of similar spectra, but those are much harder to fit into a blurb in a discussion. So conveying the concept is key.

gully commented 2 years ago

There was a lot of interest in this theme at CoolStars21. Marina Kounkel asked a question at my talk, and Gent, Bello, and others gave talks on it. Passegger et al. 2020 call the train/test performance degradation from synthetic to real data the "synthetic gap". Here are some thoughts abridged from an email:

Currently the blasé tool does not infer a Teff and logg. It simply warps the closest input template into a “semi-empirical template”. This step on its own cannot give you Teff and logg. But I see a way to use these semi-empirical spectra to bridge the “synthetic gap”.

In principle, an ensemble of spectra (say, 10 or more, but ideally hundreds) warped with blasé could begin to inform permanent semi-empirical changes to the grid.
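One way to sketch "informing permanent changes to the grid" is to collect each line's correction from every warped spectrum and adopt only the consensus. The sketch below assumes a hypothetical per-spectrum dict of line corrections and uses a median with a median-absolute-deviation (MAD) cut, a robust-statistics choice made here for illustration rather than anything blase does. The thresholds `min_spectra` and `max_mad` are illustrative assumptions, and the ensemble is made-up toy data.

```python
import statistics


def consensus_corrections(ensemble, min_spectra=10, max_mad=0.05):
    """Combine per-line corrections from many warped spectra.

    ensemble: list of dicts, one per spectrum, mapping a line identifier
    to that spectrum's fitted correction (hypothetical format, e.g. a
    fractional change in line depth relative to the grid template).
    Returns a dict of line_id -> median correction, keeping only lines
    seen in at least `min_spectra` spectra whose corrections agree to
    within `max_mad` (median absolute deviation).
    """
    by_line = {}
    for spectrum in ensemble:
        for line_id, correction in spectrum.items():
            by_line.setdefault(line_id, []).append(correction)

    adopted = {}
    for line_id, values in by_line.items():
        if len(values) < min_spectra:
            continue  # too few spectra to trust this line
        center = statistics.median(values)
        mad = statistics.median(abs(v - center) for v in values)
        if mad <= max_mad:
            adopted[line_id] = center  # spectra agree: keep the consensus
    return adopted


# Toy ensemble of 12 spectra (made-up numbers for illustration).
ensemble = []
for i in range(12):
    spectrum = {
        "line_A": -0.10 + 0.01 * ((i % 3) - 1),  # consistent small shallowing
        "line_B": 0.3 * ((-1) ** i),             # spectra disagree in sign
    }
    if i < 3:
        spectrum["line_C"] = 0.2                  # seen in too few spectra
    ensemble.append(spectrum)

print(consensus_corrections(ensemble))
```

The median and MAD blunt outliers and some label noise, but nothing here enforces smoothness across neighboring grid points.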

In practice, this is tricky. There is no guarantee of smoothness, there’s label noise, and there are other problems that make this much harder in reality. I think one of the best ways forward is simply to make better models. Aishwarya Iyer had a poster (and a new paper) showing that PHOENIX underperforms her new models, mostly just because of their modern molecular line lists. So some problems will be fixed just by better line lists. That will narrow the "synthetic gap" for “free” (or cheaply).

Coincidentally, Marina Kounkel’s question after my talk was getting at the same theme: how to use the blasé outputs to learn about Teff and logg. In short, I would recommend Starfish (Czekala et al. 2015) for this task right now. The better the models get, the better Starfish will perform, too.

gully commented 2 years ago

Some text is in the paper now, yay! We should cite the "synthetic gap" vocabulary, per the comment above about Bello's CS21 ML talk.