jobovy / apogee

Tools for dealing with APOGEE data
BSD 3-Clause "New" or "Revised" License

APOGEE selection function not working for 268 fields #60

Open wylie22s opened 4 years ago

wylie22s commented 4 years ago

Hi,

I am trying to determine the selection function in each APOGEE field and the stars to which I can apply its inverse to using the following code:

appath.change_dr(dr='16')
allStar = apread.allStar(rmcommissioning=True, main=False, ak=True, akvers='targ', use_astroNN_distances=True)
allStar_cut = allStar[some_cuts]
apo = apsel.apogeeCombinedSelect(year=7)
statIndx = apo.determine_statistical(allStar_cut)

When I run this, I get the following error:

UnboundLocalError                         Traceback (most recent call last)
in <module>()
      1 #Now which part of the sample is statistical?
----> 2 statIndx= apo.determine_statistical(allStar_cut)

~/.local/lib/python3.6/site-packages/apogee-1.-py3.6.egg/apogee/select/apogeeSelect.py in determine_statistical(self, specdata)
   2851             avisitsplate= int(allVisit['PLATE'][indx][0])
   2852             #Find the design corresponding to this plate
-> 2853             tplatesIndx= (platelist == avisitsplate)
   2854             if numpy.sum(tplatesIndx) == 0.:
   2855                 plateIncomplete+= 1

UnboundLocalError: local variable 'platelist' referenced before assignment

This happens if allStar_cut includes any stars in these fields:

[2034 2048 2049 2050 2051 2052 2070 2074 2079 2082 2091 2111 2112 2113 2115 2117 2138 2141 2160 2162 2163 2164 2168 2172 2179 2185 2198 2204 2205 2207 2218 2221 2225 2227 2232 2242 2256 2257 2284 2285 2286 2287 2288 2290 2291 2292 2313 2317 2319 2320 2321 2322 2326 2327 2331 2333 2334 2335 2336 2340 2343 2344 2345 2347 2348 2349 2350 2351 2352 2353 2354 2355 2359 2361 2363 2364 2365 2366 2368 2370 2375 2376 2377 2379 2381 2383 2384 2388 2392 2397 2400 2418 2419 2422 2423 2424 2427 2428 2435 2438 2439 2441 2442 2443 2453 2458 2462 2479 2480 2483 2484 2485 2486 2490
 4104 4127 4213 4223 4224 4236 4299 4305 4336 4341 4342 4343 4344 4348 4349 4350 4375 4379 4433 4473
 5067 5092 5094 5109 5110 5115 5119 5121 5122 5123 5124 5125 5126 5165 5167 5178 5179 5180 5181 5182 5183 5184 5185 5186 5199 5200 5210 5211 5214 5225 5228 5232 5240 5243 5246 5247 5253 5271 5274 5276 5292 5303 5306 5317 5324 5330 5331 5332 5336 5341 5351 5355 5357 5369 5372 5373 5446 5469 5479 5481 5483 5484 5485 5489 5490 5496 5497 5501 5503 5511 5512 5515 5521 5523 5524 5525 5526 5532 5533 5536 5545 5550 5552 5553 5554 5555 5556 5557 5558 5603 5608 5626 5640 5642 5643 5650 5653 5657 5662 5663 5664 5665 5670 5672 5673 5674 5675 5682 5683 5685 5686 5687 5702 5703 5704 5705 5708 5710 5712 5714 5715 5722 5723 5724 5725 5731 5732 5736 5748 5749 5751 5757 5763 5802]

Looking at apogeeSelect.py, I think this error occurs because none of these fields are in the arrays _apo1_locations, _apo2N_locations or _apo2S_locations. Is there anything I can do to include the stars in these fields, or can the selection function simply not be calculated for stars in these fields?

I also find that if I run apogeeCombinedSelect and specify any of the fields listed above in the locations option, I get the following error:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
in <module>()
      4     apo= pickle.load(savefile)
      5 else:
----> 6     apo= apsel.apogeeCombinedSelect(locations=locs[1:])
      7     # apo= apsel.apogeeCombinedSelect()
      8     save_pickles(savename,apo)

~/.local/lib/python3.6/site-packages/apogee-1.-py3.6.egg/apogee/select/apogeeSelect.py in __init__(self, sample, store_individual, locations, year, mjd, sftype, minnspec, frac4complete, _justprocessobslog)
   2226             ap2_locations= None
   2227         #load an APOGEE 1 and 2 selection function
-> 2228         apo1sel = apogee1Select(year=self.apo1year, mjd=mjd, sample=sample, locations=ap1_locations, _justprocessobslog=_justprocessobslog)
   2229         #add dummy color bin info to apo1sel...
   2230         apo1sel._number_of_bins = numpy.ones(len(apo1sel._locations))

~/.local/lib/python3.6/site-packages/apogee-1.-py3.6.egg/apogee/select/apogeeSelect.py in __init__(self, sample, locations, year, mjd, sftype, minnspec, frac4complete, _dontcutcolorplates, _justprocessobslog, hemisphere)
    739             self._process_obslog(locations=locations,year=year,
    740                                  frac4complete=frac4complete,
--> 741                                  dontcutcolorplates=_dontcutcolorplates, hemisphere=hemisphere)
    742             sys.stdout.write('\r'+_ERASESTR+'\r')
    743             sys.stdout.flush()

~/.local/lib/python3.6/site-packages/apogee-1.-py3.6.egg/apogee/select/apogeeSelect.py in _process_obslog(self, locations, year, frac4complete, dontcutcolorplates, hemisphere)
   1328             pindx= apogeePlate['LOCATION_ID'] == self._locations[ii]
   1329             if numpy.sum(pindx) == 0:
-> 1330                 raise IOError("No entry found in apogeePlate for location %i" % (self._locations[ii]))
   1331             #Remove designs with negative short cohort numbers and unobserved
   1332             #plates

OSError: No entry found in apogeePlate for location 4341

I guess whatever is causing the first error also causes this one.

Lastly, for the fields that work, I get very high inverse selection function values for a few stars (~1000 to ~10,000). I just want to check: is this expected in some fields, or could it be a zero-point error? For example, I find 3 stars with inverse selection function values of 10,000.

Thanks so much!
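The diagnosis above (none of these fields appears in _apo1_locations, _apo2N_locations or _apo2S_locations) fits a common Python failure mode: a name bound only inside if/elif branches is never bound when no branch matches, so the first read of it raises UnboundLocalError. A minimal, self-contained sketch of that pattern follows; the function and the location sets are hypothetical stand-ins, not the actual apogeeSelect code.

```python
# Hypothetical stand-ins for the per-survey location arrays
APO1_LOCATIONS = {101, 102}
APO2N_LOCATIONS = {201}
APO2S_LOCATIONS = {301}


def plate_list_for(location):
    """Mimic an if/elif chain that only binds `platelist` for known surveys."""
    if location in APO1_LOCATIONS:
        platelist = ["apo1 plates"]
    elif location in APO2N_LOCATIONS:
        platelist = ["apo2N plates"]
    elif location in APO2S_LOCATIONS:
        platelist = ["apo2S plates"]
    # No else branch: for an unknown location `platelist` is never assigned,
    # so the read below raises UnboundLocalError
    return platelist


print(plate_list_for(101))
try:
    plate_list_for(2034)  # a field in none of the three sets
except UnboundLocalError as err:
    print(type(err).__name__)
```

The exact wording of the error message differs between Python versions, which is why the sketch prints only the exception type.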
jobovy commented 4 years ago

Hi,

The best person to answer most of these questions would be @jmackereth, but I believe he is off until next week, so I can try to help a little in the meantime.

First of all, it's not that unexpected that you would get very large inverse selection function values: some stars are selected in fields with high stellar density, where the sampling fraction is low. This is why we normally only apply the selection function to models (then observed = SF × model, which is simply small for those fields): inverse weighting becomes unstable in such parts of the survey.
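A toy calculation shows why inverse weighting is unstable where SF is small: each observed star gets weight 1/SF, so a star with SF = 1e-4 counts as 10,000 stars and can dominate any weighted sum, while the forward direction (observed = SF × model) simply yields a small expected count. The SF values below are made up for illustration, not taken from APOGEE, and the helper names are hypothetical.

```python
import numpy as np


def inverse_sf_weights(sf):
    """Inverse-selection-function weights, 1/SF, for observed stars."""
    sf = np.asarray(sf, dtype=float)
    if np.any(sf <= 0):
        raise ValueError("every inverse-weighted star needs SF > 0")
    return 1.0 / sf


def expected_observed(sf, model_counts):
    """Forward model: expected observed counts = SF x model counts."""
    return np.asarray(sf, dtype=float) * np.asarray(model_counts, dtype=float)


# Hypothetical SF values for five stars; the last two sit in low-sampling fields
sf = np.array([0.5, 0.4, 0.25, 1e-3, 1e-4])
w = inverse_sf_weights(sf)
print(w)
# Share of the total weight carried by the single most up-weighted star
print(w.max() / w.sum())

# Forward direction: a small SF just gives a small expected count
print(expected_observed(sf, np.full(5, 100.0)))
```

Here one star carries more than 90% of the total weight, so a weighted estimate built from this sample is essentially determined by that one star; the forward model has no such sensitivity.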

I only checked a few of the fields that you listed as being problematic, but all the ones I checked are legitimately excluded from the selection function / statistical sample. Some are halo fields where all cohorts are selected using additional Washington photometry to distinguish dwarfs from giants (we can't easily get the selection function for those), and some are external programs that are not part of the main survey. So it seems likely to me that all of the fields you listed have such reasons to be excluded.

But I do think it's a bug that determine_statistical fails when you have those fields in the sample. I wonder how @jmackereth got around that. This should be an easy fix: just catch this situation and exclude the star from the statistical sample.

Yes, I think the 2nd issue (specifying locations that include one of these fields) is similar; I believe the code assumes that a location is in the statistical sample when you pass it to the selection function initialization.

So unless @jmackereth says otherwise, I would just proceed by removing stars in those fields from your sample if you want to use the selection function for them.
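A minimal sketch of that removal, assuming allStar behaves like a NumPy structured array with a LOCATION_ID column (as the allStar["LOCATION_ID"]==2034 selection used later in this thread suggests). The helper drop_fields and the toy rows are hypothetical; bad_fields would hold the location IDs from the list in the first post.

```python
import numpy as np


def drop_fields(stars, bad_fields, key="LOCATION_ID"):
    """Return only the rows whose location ID is not in `bad_fields`."""
    keep = ~np.isin(stars[key], np.asarray(bad_fields))
    return stars[keep]


# Toy structured array standing in for allStar (the real file has many more columns)
stars = np.array(
    [(2034, 11.2), (101, 12.0), (5802, 10.5), (102, 11.8)],
    dtype=[("LOCATION_ID", "i4"), ("H", "f8")],
)
bad_fields = [2034, 5802]  # e.g. a subset of the problematic location IDs
clean = drop_fields(stars, bad_fields)
print(clean["LOCATION_ID"])
```

The filtered array keeps the same dtype as the input, so it can be passed on wherever the unfiltered one was used.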

wylie22s commented 4 years ago

Okay, that makes sense. Thanks a lot!

(I'll leave the question open for a few more days in case @jmackereth has something to add. Otherwise I'll close it because I think you have answered my questions.)

jmackereth commented 4 years ago

Hi! Apologies for my slow response on this! Thanks to @jobovy for stepping in. I agree with his suggestions, but this seems like a bug somewhere that we should fix.

I am currently trying to reproduce your error, but I can't seem to do so just yet (which indicates a more complex issue, I think!). To test the problem, I have evaluated the selection function using apo = apsel.apogeeCombinedSelect(year=7), as in your post.

I then save this using

import pickle

savename = 'apodr16_csf_23012020.dat'
# remove the stored spectroscopic and photometric data before pickling
del apo._specdata, apo._photdata
with open('../sav/apogeecombinedselectionfunction.dat', 'wb') as f:
    pickle.dump(apo, f)
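The pickled object can be read back with pickle.load in a later session (the apogee package must be importable for the real object to unpickle). A small sketch of the save/load round trip using only the standard library; save_sf and load_sf are hypothetical helpers, and a SimpleNamespace stands in for the real selection-function object.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


def save_sf(sf_obj, path, drop=("_specdata", "_photdata")):
    """Pickle a selection-function object after deleting the named attributes.

    Like the `del` line above, this modifies `sf_obj` in place.
    """
    for name in drop:
        if hasattr(sf_obj, name):
            delattr(sf_obj, name)
    with open(path, "wb") as f:
        pickle.dump(sf_obj, f)


def load_sf(path):
    """Load a selection-function object saved with save_sf."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Toy stand-in for an apogeeCombinedSelect instance (hypothetical attributes)
toy = SimpleNamespace(_specdata=[1, 2, 3], _photdata=[4, 5, 6], locations=[101, 102])
path = os.path.join(tempfile.mkdtemp(), "apo_sf.dat")
save_sf(toy, path)
loaded = load_sf(path)
print(loaded.locations, hasattr(loaded, "_specdata"))
```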

I am then loading a completely raw allStar file using

allstar = apread.allStar()

then using determine_statistical like:

apo.determine_statistical(allstar)

As the raw allStar file should contain a star from every observed field, this should fail with your error if it really is due to a bug when fields not in the selection function are passed to determine_statistical. I am testing a few more things related to this, but in the meantime, one suggestion would be to try:

Let me know if you turn up anything new; I'll come back with more ASAP! Sorry about this!

jobovy commented 4 years ago

@jmackereth -- Note that allstar = apread.allStar() does not read a completely raw allStar file, but does some cuts, which may remove some of the problematic fields (e.g., it removes commissioning fields). To get the raw allStar file, do

allstar = apread.allStar(raw=True)
jmackereth commented 4 years ago

Ah, of course! Whoops! I'll redo the test and see what happens...

wylie22s commented 4 years ago

Hi!

Turns out I had the RESULTS_VERS set wrong. I've changed it to 'l33' and am rerunning everything now. I'll let you know if this fixes the problem or not.

Thanks!

jmackereth commented 4 years ago

Ah! That makes some sense; this is a common issue (I must have done this at least twice!).

Just for info: I did a check with raw=True turned on for the allStar file, and the code does indeed fail (but in a slightly different way from yours above). My error message is:

Warning: no visit in combined spectrum found for data point apogee.apo1m.s.stars.calibration.VESTA
r12-56398-VESTA.fits
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-8-c39ada068ede> in <module>
----> 1 apo.determine_statistical(allstar)

~/opt/anaconda3/lib/python3.7/site-packages/apogee-1.-py3.7.egg/apogee/select/apogeeSelect.py in determine_statistical(self, specdata)
   2849                 print(avisit)
   2850                 indx= visits == avisit
-> 2851             avisitsplate= int(allVisit['PLATE'][indx][0])
   2852             #Find the design corresponding to this plate
   2853             tplatesIndx= (platelist == avisitsplate)

IndexError: index 0 is out of bounds for axis 0 with size 0

which happens because these plates definitely aren't in the selection function (and potentially aren't in the relevant files?).
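The IndexError itself comes from indexing [0] into an empty selection: when no allVisit entry matches, allVisit['PLATE'][indx] has length 0. A hedged sketch of a lookup that checks for an empty match first; the function name and the toy arrays are hypothetical, not the actual apogeeSelect code.

```python
import numpy as np


def first_plate_or_none(plates, visits, avisit):
    """Return the plate of the first matching visit, or None if nothing matches.

    `plates[mask][0]` raises IndexError when `mask` selects nothing; checking
    the size of the match first avoids that.
    """
    match = plates[visits == avisit]
    if match.size == 0:
        return None
    return int(match[0])


# Toy visit and plate arrays (hypothetical values)
visits = np.array(["v1", "v2", "v3"])
plates = np.array([8001, 8002, 8003])
print(first_plate_or_none(plates, visits, "v2"))
print(first_plate_or_none(plates, visits, "VESTA"))
```

A caller can then treat None as "not in the statistical sample" rather than crashing.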

In light of this, I'm going to implement something to catch this more gracefully and just exclude these stars from the statistical sample. Thanks again for flagging this up, very useful!

jobovy commented 4 years ago

@wylie22s -- do let us know if RESULTS_VERS was the issue; change_dr should take care of that, so it shouldn't matter what you set RESULTS_VERS to.

wylie22s commented 4 years ago

Hi, so unfortunately I still get the error if I use any of the fields I listed above in allStar_cut. I also tried uninstalling and reinstalling the package, but that did not help.

I find that if I run the following:

apo = apsel.apogeeCombinedSelect(year=7)
x = apo(f, 12, 0.6)

where I replace f with any of the fields I listed above, I get the error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-28-cf70d5e05aaf> in <module>()
----> 1 x = apo(f[0], 12, 0.6)

~/anaconda3/lib/python3.6/site-packages/apogee-1.-py3.6.egg/apogee/select/apogeeSelect.py in __call__(self, location, H, JK0)
   2391         out= numpy.zeros(len(H))
   2392         #see which bins the stars are in, first work out the bins and the limits
-> 2393         nbins = self._number_of_bins[locIndx][0]
   2394         lowjk = self._color_bins_jkmin[locIndx][0]
   2395         bins = lowjk[:int(nbins+1)]

IndexError: index 0 is out of bounds for axis 0 with size 

In this case f[0] was field 2034.

Does this occur for you as well?

jmackereth commented 4 years ago

Ah, thanks for this. Seems like this is a bigger issue somewhere then.

OK, yes, I get that error, so this is probably not an installation problem (apologies for making you go down that rabbit hole!). This is certainly because these fields are not included in the selection function (I have checked this in detail now).

Removing them should be OK for now. However, they should be caught without an error, perhaps with just a warning. Similarly, if you try to evaluate a selection function using only bad fields, this should give a more reasonable error message. I'll try to implement this today and get back to you ASAP!
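The warn-instead-of-fail behaviour described here could look roughly like the sketch below: an unknown location triggers a warning and a selection-function value of zero. ToySelection is a hypothetical class with a constant SF per field; it ignores the H and J-Ks0 bins that the real __call__(location, H, JK0) uses, so only the control flow is meant to be illustrative.

```python
import warnings

import numpy as np


class ToySelection:
    """Hypothetical selection function keyed by location ID (not the apogee class)."""

    def __init__(self, sf_by_location):
        self._sf = dict(sf_by_location)

    def __call__(self, location, H, JK0):
        H = np.atleast_1d(np.asarray(H, dtype=float))
        if location not in self._sf:
            # Unknown field: warn and report zero selection probability
            warnings.warn("No matching location in this selection function")
            return np.zeros(len(H))
        # Toy model: one constant SF value per field; JK0 is ignored here
        return np.full(len(H), self._sf[location])


sel = ToySelection({101: 0.3})
print(sel(101, 12.0, 0.6))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = sel(2034, 12.0, 0.6)
print(out, len(caught))
```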

wylie22s commented 4 years ago

No worries, thanks!

jmackereth commented 4 years ago

Hi again,

OK, so I added a few quick fixes for this issue to the ap2sf branch of my fork of this module. It would be brilliant if you could test these by cloning my fork from here, then running git checkout ap2sf to switch to the right branch before installing.

The changes are the following:

I also ran all of the locations in your initial post through the new version (using the locations= keyword), and the ones that are in apogeePlate all have 0 completion in all cohorts, which is why they are removed when you use apogeeCombinedSelect with no locations in the input.
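The exclusion rule described here (a location drops out when its completion is zero in every cohort) amounts to a row-wise test on a locations × cohorts array. A sketch with a hypothetical array layout and toy location IDs; the real selection-function internals may store completion differently.

```python
import numpy as np


def locations_with_any_completion(locations, completion):
    """Keep locations whose completion is > 0 in at least one cohort.

    `completion` is assumed to have shape (n_locations, n_cohorts).
    """
    completion = np.asarray(completion, dtype=float)
    keep = np.any(completion > 0, axis=1)
    return np.asarray(locations)[keep]


# Toy location IDs and completion values (hypothetical)
locations = np.array([101, 102, 103])
completion = np.array([
    [0.0, 0.0, 0.0],  # zero in every cohort: excluded
    [1.0, 0.5, 0.0],  # at least one cohort complete: kept
    [0.0, 0.0, 0.0],  # excluded
])
print(locations_with_any_completion(locations, completion))
```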

Hope this improves the situation, please do let us know if you uncover any further issues while testing!

T

wylie22s commented 4 years ago

Hi!

I did what you said and then ran the following code:

apo = apsel.apogeeCombinedSelect(year=7)
print(apo(2034, 12, 0.6))

When I run this, I get the following:

/Users/swylie/anaconda3/lib/python3.6/site-packages/apogee-1.-py3.6.egg/apogee/select/apogeeSelect.py:2380: UserWarning: No matching location in this selection function
  warnings.warn("No matching location in this selection function")

Out[22]:
0.0

So, that works. However, if I then run:

allStar= apread.allStar(rmcommissioning=True, main=False, ak=True, akvers='targ', use_astroNN_distances=True)
loc = allStar[allStar["LOCATION_ID"]==2034]
statIndx= apo.determine_statistical(loc)

I get the error:

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-28-8db9b8d62e3b> in <module>()
----> 1 statIndx= apo.determine_statistical(loc)
      2 

~/anaconda3/lib/python3.6/site-packages/apogee-1.-py3.6.egg/apogee/select/apogeeSelect.py in determine_statistical(self, specdata)
   2880             avisitsplate= int(allVisit['PLATE'][indx][0])
   2881             #Find the design corresponding to this plate
-> 2882             tplatesIndx= (platelist == avisitsplate)
   2883             if numpy.sum(tplatesIndx) == 0.:
   2884                 plateIncomplete+= 1

UnboundLocalError: local variable 'platelist' referenced before assignment

So, I think something is still not quite right with the determine_statistical function.

jmackereth commented 4 years ago

Hello!

Apologies for the delay again! I think I just put in a fix that should correct that bug. The issue was in an if statement that determined which survey the star was in (e.g. APOGEE-1, APOGEE-2N or APOGEE-2S). I included an extra clause that should catch this. If you re-pull the ap2sf branch from my fork, you should be able to test it.
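The kind of fix described here can be sketched as an if/elif chain with a final else clause, so that a star in a field belonging to none of the surveys is flagged as non-statistical instead of leaving a variable unassigned. This is a hypothetical illustration of the pattern only: the names and location sets are stand-ins, and the real determine_statistical applies further criteria (plate and design checks) that are not shown.

```python
# Hypothetical stand-ins for the per-survey location sets
APO1_LOCATIONS = {101, 102}
APO2N_LOCATIONS = {201}
APO2S_LOCATIONS = {301}


def survey_for(location):
    """Return the survey a location belongs to, or None if it is in none of them."""
    if location in APO1_LOCATIONS:
        return "APOGEE-1"
    elif location in APO2N_LOCATIONS:
        return "APOGEE-2N"
    elif location in APO2S_LOCATIONS:
        return "APOGEE-2S"
    else:
        # Extra clause: unknown fields are flagged instead of falling through
        return None


def statistical_flags(star_locations):
    """True for stars whose field belongs to one of the surveys, False otherwise."""
    return [survey_for(loc) is not None for loc in star_locations]


print(statistical_flags([101, 2034, 301]))
```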

Fingers crossed!

Ted

jmackereth commented 4 years ago

Hi - Just following up on this, happy to help if there are still issues. Just let me know!

wylie22s commented 4 years ago

Hi! So sorry, I got distracted by some other work. I will rerun everything today and let you know how it works.