Main sample determination

wylie22s commented 3 years ago

Hi,

I load the apogee catalog using the following function:

allStar= apogee.tools.read.allStar(rmcommissioning=True, main=True, ak=True, akvers='targ', use_astroNN_distances=True, rmdups=True, use_astroNN_ages=True)

If I understand correctly, by setting main=True, allStar will only contain main sample stars. However, if I then run the following code:

np.unique(allStar['EXTRATARG'], return_counts=True)

I get:

(array([ 0,  1,  4,  5, 16, 17, 21], dtype=int16), array([265851,  75511,      5,  15668,     82,     28,     22]))

https://www.sdss.org/dr14/algorithms/bitmasks/#EXTRATARG states that main survey targets have EXTRATARG==0. Is there a difference between what you define as the main sample and what SDSS does?
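To see which EXTRATARG bits those integers represent, each value can be decoded bit by bit. For example, 5 = 2^0 + 2^2, so bits 0 and 2 are set. This is a minimal sketch. It assumes the bit assignments listed on the SDSS DR14/DR16 bitmask page (bit 0 NOT_MAIN, 1 COMMISSIONING, 2 TELLURIC, 3 APO1M, 4 DUPLICATE). `decode_extratarg` is a hypothetical helper, not part of the apogee package.

```python
# Assumed EXTRATARG bit names, following the SDSS DR14/DR16 bitmask page;
# check them against the documentation for the data release you use.
EXTRATARG_BITS = {
    0: "NOT_MAIN",
    1: "COMMISSIONING",
    2: "TELLURIC",
    3: "APO1M",
    4: "DUPLICATE",
}


def decode_extratarg(value):
    """Return the names of the (assumed) bits set in an integer EXTRATARG value."""
    return [name for bit, name in EXTRATARG_BITS.items() if value & (1 << bit)]


# Values and counts reported by np.unique in the comment above
values = [0, 1, 4, 5, 16, 17, 21]
counts = [265851, 75511, 5, 15668, 82, 28, 22]
for value, count in zip(values, counts):
    # An EXTRATARG of 0 (no bits set) is the SDSS main-survey definition
    names = decode_extratarg(value) or ["(no bits set)"]
    print(f"{value:3d} {count:7d} {names}")
```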

Thanks! Shola

jobovy commented 3 years ago

Hi,

Yes, the definition of main is not the same; I'm surprised that there are so many stars with the commissioning extratarg bit set, but perhaps those are from APOGEE-2 (rmcommissioning=True only removes original APOGEE-1 commissioning stars).

If you just want to load the file with the astroNN distances and ages, but then apply the standard APOGEE definition of main, I think you can do

allStar= apogee.tools.read.allStar(use_astroNN_distances=True,use_astroNN_ages=True,raw=True)
allStar= allStar[allStar['EXTRATARG']==0]

because raw=True just returns the file without doing any cuts.
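The `EXTRATARG == 0` cut keeps only stars with no bit set at all, so it removes non-primary duplicates along with non-main targets. If you want to treat individual bits differently, a bitwise test on the column does that. The sketch below uses a toy NumPy structured array in place of the `raw=True` catalog and assumes, from the SDSS bitmask documentation, that DUPLICATE is bit 4.

```python
import numpy as np

# Assumed EXTRATARG bit position (SDSS DR14/DR16 bitmask documentation).
DUPLICATE = 1 << 4

# Toy stand-in for the catalog returned with raw=True; only EXTRATARG is used.
allStar = np.array(
    [(0,), (1,), (16,), (17,), (4,)],
    dtype=[("EXTRATARG", np.int16)],
)

# Standard SDSS main-survey definition: no EXTRATARG bits set at all.
main_strict = allStar[allStar["EXTRATARG"] == 0]

# Same, but also keep non-primary duplicates: allow only the DUPLICATE bit to be set.
main_with_dups = allStar[(allStar["EXTRATARG"] & ~DUPLICATE) == 0]

print(main_strict["EXTRATARG"].tolist())     # [0]
print(main_with_dups["EXTRATARG"].tolist())  # [0, 16]
```

Keeping duplicates this way only makes sense if you then resolve them yourself, for example by keeping the highest-S/N visit of each star.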

wylie22s commented 3 years ago

Hi,

What I would really like to do is calculate the selection function of a set of APOGEE stars and weight them by its inverse to statistically recover the photometric sample. I am confused about which stars I can apply the inverse of this selection function to. Previously, I loaded the catalog with main=False and applied the inverse selection function to the stars that satisfied the mask from determine_statistical. However, do these stars need to be main sample stars according to the function read.mainIndx?

Thanks!

jobovy commented 3 years ago

Yes, the stars in the statistical sample have to be in the main sample as determined by read.mainIndx. But that's done as part of determine_statistical (see the last line in that function), so if you just use determine_statistical, you should be okay.
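Because the statistical sample is meant to be a subset of the main sample, a cheap guard before applying any weights is to confirm that no statistical star falls outside the main sample. Given the reply above, this check should pass when the mask comes from determine_statistical. This is a minimal sketch with plain boolean arrays. In practice they would come from `read.mainIndx` and `determine_statistical`, which are assumed here to return boolean masks over the same catalog rows. `check_statistical_subset` is hypothetical.

```python
import numpy as np


def check_statistical_subset(main_mask, stat_mask):
    """Return True if every star flagged as statistical is also flagged as main."""
    main_mask = np.asarray(main_mask, dtype=bool)
    stat_mask = np.asarray(stat_mask, dtype=bool)
    if main_mask.shape != stat_mask.shape:
        raise ValueError("masks must refer to the same catalog rows")
    # Look up the main-sample flag of each statistical star; all must be True
    return bool(np.all(main_mask[stat_mask]))


main_mask = np.array([True, True, False, True])
stat_mask = np.array([True, False, False, True])
print(check_statistical_subset(main_mask, stat_mask))  # True
```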

wylie22s commented 3 years ago

Okay, glad to know that it was correct to apply determine_statistical.

I just wanted to add that, now that I understand that mainIndx does not return the main sample as defined by SDSS, I think I also understand a problem I was previously having with the function _determine_selection. To check that I was running and applying the selection-function code correctly, I compared the H-band luminosity functions of the APOGEE stars to those of the photometric samples they were selected from. For most cohorts, the luminosity functions roughly matched after applying the inverse of the weights returned by _determine_selection. However, for a few (such as the cohorts in fields 5391 and 5403) they did not.

I just checked, and in many of these cohorts all or many (>=30%) of the stars have 'EXTRATARG' != 0. I think what might be happening is that many of the stars with 'EXTRATARG' != 0 were not sampled randomly, so weighting every star in such a cohort by a single number does not recover the correct frequencies. This affects only a few fields and stars, but I thought it might be helpful to point out.
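The check described above can be illustrated with synthetic numbers. If a cohort's spectroscopic stars are an unbiased subsample of its photometric sample, giving each one the inverse of the selection fraction as a weight recovers the photometric H-band counts. If they are not (here, only the brightest stars), no single per-cohort weight can. This sketch uses made-up magnitudes and a hypothetical `weighted_lf` helper, not the apogee selection-function code. A regularly spaced subsample stands in for random sampling so the result is deterministic.

```python
import numpy as np


def weighted_lf(h_mag, weight, bins):
    """H-band luminosity function of stars that all carry the same weight."""
    counts, _ = np.histogram(h_mag, bins=bins, weights=np.full(len(h_mag), weight))
    return counts


# Synthetic photometric sample for one cohort: 1000 stars uniform in H.
h_phot = np.linspace(7.0, 12.0, 1000)
bins = np.linspace(7.0, 12.0, 6)  # five 1-mag bins
phot_lf, _ = np.histogram(h_phot, bins=bins)

# Unbiased subsample: every 4th star, so the selection fraction is 250/1000.
h_fair = h_phot[::4]
fair_lf = weighted_lf(h_fair, len(h_phot) / len(h_fair), bins)

# Biased subsample of the same size: only the 250 brightest (lowest-H) stars.
h_biased = np.sort(h_phot)[:250]
biased_lf = weighted_lf(h_biased, len(h_phot) / len(h_biased), bins)

print(phot_lf.tolist())    # [200, 200, 200, 200, 200]
print(fair_lf.tolist())    # [200.0, 200.0, 200.0, 200.0, 200.0]
print(biased_lf.tolist())  # [800.0, 200.0, 0.0, 0.0, 0.0]
```

Both subsamples get the same weight (4), but only the unbiased one reproduces the photometric luminosity function. This is the kind of mismatch expected when part of a cohort was targeted non-randomly.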

jobovy commented 3 years ago

Hi @wylie22s,

@jmackereth just found a bug in the mainIndx code that meant that APOGEE-2 stars were included regardless of whether they were actually in the main sample. This has now been fixed in #65, and I think that also fixes the issues you were finding. Applying the mainIndx cut to DR16 now, there are 280,769 stars in the main sample, 19,360 of which have EXTRATARG > 0. All but five of these are duplicates, which simply means that mainIndx resolves duplicates differently from the way EXTRATARG does. The remaining five are tellurics; I'm not entirely sure how they end up in the main sample, but it's likely that these are stars in the halo sample, where the main color cut is relaxed and some blue stars that are also used as tellurics are picked up. Anyway, it's only five stars! So I think that the main sample as determined by mainIndx and the one defined by EXTRATARG are now basically equivalent.
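A breakdown like this can be reproduced by cross-tabulating a main-sample mask against the EXTRATARG bits. The sketch counts main-sample stars that still have EXTRATARG > 0 and splits them into duplicates and everything else. `compare_main_definitions` is a hypothetical helper. The main-sample mask would in practice come from `read.mainIndx` (assumed here to be a boolean array over the catalog), and the DUPLICATE bit position (bit 4) is assumed from the SDSS bitmask documentation. The toy input is illustrative only.

```python
import numpy as np

# Assumed EXTRATARG bit position (SDSS DR14/DR16 bitmask documentation).
DUPLICATE = 1 << 4


def compare_main_definitions(in_main, extratarg):
    """Count main-sample stars with EXTRATARG > 0, split by the DUPLICATE bit."""
    in_main = np.asarray(in_main, dtype=bool)
    extratarg = np.asarray(extratarg)
    flagged = in_main & (extratarg > 0)              # main per mask, but some bit set
    dup = flagged & ((extratarg & DUPLICATE) != 0)   # ...and that includes DUPLICATE
    return {
        "n_main": int(in_main.sum()),
        "n_main_flagged": int(flagged.sum()),
        "n_flagged_duplicate": int(dup.sum()),
        "n_flagged_other": int((flagged & ~dup).sum()),
    }


# Toy input: five stars, four of them in the main-sample mask.
summary = compare_main_definitions(
    in_main=[True, True, True, True, False],
    extratarg=[0, 16, 17, 4, 1],
)
print(summary)
# {'n_main': 4, 'n_main_flagged': 3, 'n_flagged_duplicate': 2, 'n_flagged_other': 1}
```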

wylie22s commented 3 years ago

Hi,

Yes, just checked and this fixed my problem!

Thanks, Shola

(Sent by email in reply to Jo Bovy's comment of 20 May 2021, 20:54.)

jobovy / apogee

Main sample determination #64