henrysky / astroNN

Deep Learning for Astronomers with Keras
http://astronn.readthedocs.io/
MIT License

DR16 astroNN catalog of distances produces incorrect parsec values for Md and Mg stars #16

Closed JosephKarpinski closed 3 years ago

JosephKarpinski commented 3 years ago

System information

Describe the problem

astroNN Gaia DR2 parallax zero-point offset with deep learning

Gaia DR2 calculates the zero-point offset as −0.029 mas. Sloan Digital Sky Survey Apogee calculates it as −0.0523 mas. Modified parallax = parallax − zero-point offset.

Data model: apogee_astroNN provides spectro-photometric deep learning distances in parsecs. The distances in parsecs to the Orion Nebula for star classes BA, Fd, GKd and GKg pretty much agree. But astroNN appears to produce 4-5 times larger distances for Md and Mg stars.

Parsecs calculated with parallax zero-point offset options:
Parsec - no offset
Dist - Apogee Deep Learning
DistApogee - use Apogee offset
DistGaia - use Gaia offset
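For readers who want to reproduce the three parallax-based columns, here is a minimal sketch of the arithmetic described above. The function name and the 2.5 mas example parallax are illustrative assumptions, not values from the catalog; only the two offsets come from this issue. Simple inversion of the parallax is only a reasonable distance estimate when the fractional parallax uncertainty is small.

```python
# Zero-point offsets quoted in this issue (milliarcseconds).
GAIA_DR2_OFFSET_MAS = -0.029
APOGEE_OFFSET_MAS = -0.0523


def parallax_to_parsec(parallax_mas, zero_point_mas=0.0):
    """Distance in parsecs from a parallax in mas (hypothetical helper).

    Applies: modified parallax = parallax - zero-point offset,
    then distance [pc] = 1000 / modified parallax [mas].
    """
    corrected = parallax_mas - zero_point_mas
    if corrected <= 0:
        raise ValueError("corrected parallax must be positive to invert")
    return 1000.0 / corrected


# Illustrative parallax only (assumed), roughly what a star near 400 pc shows.
example_parallax_mas = 2.5
print("Parsec    ", parallax_to_parsec(example_parallax_mas))
print("DistApogee", parallax_to_parsec(example_parallax_mas, APOGEE_OFFSET_MAS))
print("DistGaia  ", parallax_to_parsec(example_parallax_mas, GAIA_DR2_OFFSET_MAS))
```

Because both offsets are negative, subtracting them increases the parallax and so slightly shortens the distance, by about 1-2% at this example parallax. That is far too small to account for a factor of 4-5.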



JosephKarpinski commented 3 years ago

Here the Md star parsec distances calculated by Deep Learning appear to be consistently off across multiple Sloan Digital Sky Survey Apogee DR16 star fields

henrysky commented 3 years ago

Hi @JosephKarpinski , thanks for reporting the issue with all the detail.

The NN distance is very wrong for Md stars because there are not many of them in our training set, due to cuts, as we focus mostly on giants. You can check dist_error to see how certain we are about the NN distance. Moreover, we recommend cutting out all stars whose NN logg has more than 0.2 dex uncertainty (i.e. logg_err; Md stars generally have almost 0.4 dex uncertainty on logg from the NN model).

When plotting the Md stars in Orion, the stars pile up at ~400 pc for Gaia parallax because Gaia parallaxes are good at such short distances, while the NN distances are everywhere and the error bars are huge.
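The recommended cut can be sketched as a boolean mask. This is a rough illustration, not the catalog's official recipe: the function name, the extra fractional dist_error threshold of 0.2, and the sample numbers are assumptions. Only the 0.2 dex logg_err cut and the dist_error column come from the reply above.

```python
import numpy as np


def reliable_nn_distance_mask(logg_err, dist, dist_error,
                              max_logg_err=0.2, max_frac_dist_err=0.2):
    """Boolean mask of stars whose NN distance looks trustworthy.

    max_logg_err=0.2 dex follows the recommendation in this thread;
    max_frac_dist_err is an assumed, adjustable threshold.
    """
    logg_err = np.asarray(logg_err, dtype=float)
    dist = np.asarray(dist, dtype=float)
    dist_error = np.asarray(dist_error, dtype=float)

    # Fractional distance uncertainty; bad rows (zero or NaN) become non-finite.
    with np.errstate(divide="ignore", invalid="ignore"):
        frac_err = dist_error / dist

    return (
        np.isfinite(frac_err)
        & (logg_err <= max_logg_err)
        & (frac_err <= max_frac_dist_err)
    )


# Made-up rows: a well-measured giant, an Md-like star with a large logg_err,
# and a star with a small logg_err but a very uncertain distance.
logg_err = [0.05, 0.40, 0.10]
dist = [2000.0, 1800.0, 400.0]
dist_error = [140.0, 900.0, 200.0]
mask = reliable_nn_distance_mask(logg_err, dist, dist_error)
print(mask)
```

Stars that fail the mask are the ones for which falling back to the Gaia parallax distance makes the most sense.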

JosephKarpinski commented 3 years ago

Well, here’s the breakdown of stars within Apogee DR16. I’m expecting we will see possibly double those counts in Apogee DR17, December 2021


JosephKarpinski commented 3 years ago

Here’s the thing. Are the AstroNN distance values questionable for all Apogee dwarf stars, given its focus on giants? What about the large sample of GKd stars? I’m not sure how this would impact any AstroNN dwarf-generated metrics. Looking more into GKd impact …

henrysky commented 3 years ago

Yes, astroNN distances of dwarfs are generally questionable, especially for stars with logg uncertainty > 0.2 dex. We focus on giants because our goal is to map the Milky Way at large distances. Since this neural network works by predicting the luminosity of stars, our typical uncertainty of approx. 7% in luminosity translates into approx. 7% distance uncertainty, so a neural network that predicts luminosity (and thus distance, given the apparent magnitude) probably can never outperform Gaia, which uses geometric parallax, at such a close distance.

Consider the target selection of APOGEE: dwarfs won't be selected at far distances because they will be too dim (thus only giants are selected at great distances, which is what matters if your goal is to map the Milky Way over a large volume anyway). I would therefore always recommend using Gaia parallax to get the distance to dwarfs in APOGEE, even if astroNN produces a reasonable distance to them, since Gaia geometric parallax will always be much better for them.
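To make the trade-off concrete, here is a small sketch comparing the two error budgets. It takes the approx. 7% NN distance uncertainty from the reply above as given and assumes a parallax uncertainty of 0.035 mas purely for illustration (real Gaia uncertainties vary with magnitude and data release). To first order, the fractional distance error from a parallax equals the fractional parallax error, which grows linearly with distance.

```python
NN_FRACTIONAL_ERROR = 0.07         # approx. 7%, as quoted in this thread
ASSUMED_PARALLAX_ERROR_MAS = 0.035  # hypothetical value, for illustration only


def parallax_fractional_distance_error(distance_kpc, parallax_error_mas):
    """First-order fractional distance error from parallax.

    parallax [mas] = 1 / distance [kpc], so
    sigma_parallax / parallax = sigma_parallax * distance.
    """
    return parallax_error_mas * distance_kpc


def crossover_distance_kpc(nn_fractional_error, parallax_error_mas):
    """Distance beyond which a constant-fraction NN error beats parallax."""
    return nn_fractional_error / parallax_error_mas


for d_kpc in (0.4, 1.0, 2.0, 5.0):
    plx_err = parallax_fractional_distance_error(d_kpc, ASSUMED_PARALLAX_ERROR_MAS)
    better = "parallax" if plx_err < NN_FRACTIONAL_ERROR else "NN"
    print(f"{d_kpc:4.1f} kpc: parallax {plx_err:6.1%} vs NN {NN_FRACTIONAL_ERROR:.0%} -> {better}")

print("crossover [kpc]:",
      crossover_distance_kpc(NN_FRACTIONAL_ERROR, ASSUMED_PARALLAX_ERROR_MAS))
```

Under these assumed numbers, parallax is several times more precise at the ~400 pc distance of Orion, and the NN distance only becomes competitive at around a couple of kpc.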

JosephKarpinski commented 3 years ago

Thank you.

I’ll follow your suggestion when looking at closer targets. It will be interesting to see if Apogee DR17 uses Gaia EDR3 data and how that improves distance values beyond 2k parsecs.

Let’s close the issue.

Best Regards,

Joseph Karpinski


JosephKarpinski commented 3 years ago

Hi,

I focused on Sloan Digital Sky Survey Apogee DR16 star fields with large numbers of GKg stars. There, the AstroNN “dist” values look closer to the Gaia DR2 “Parsec” values at 2K parsecs.


henrysky commented 3 years ago

Yes, APOGEE DR17 will be using Gaia eDR3, and eDR3 parallaxes do improve quite a lot, but the neural network distances are still much better beyond a few kpc.

If you want Gaia eDR3 parallaxes with APOGEE DR16, you can use my script here to generate a Gaia eDR3 data file row-matched to the APOGEE allstar file: https://github.com/henrysky/astroNN_APOGEE_VAC/blob/master/2_gaia_xmatch.py