rjanish / massivesurvey

MASSIVE survey software

Improve bad_data handling #64

Closed. melanieveale closed this issue 8 years ago.

melanieveale commented 9 years ago

Right now the bad_data metaspectra are only used when it comes time to mask data in ppxf, but we should use them for more. In particular, the bad data flagged in the original files (set to -666e-17 or something) has required some hacky adjustments to plotting and such. Copying Ryan's comments below.

melanieveale commented 9 years ago

The bad_data array does get used in principle quite a bit, but it hasn't been in our case because it is initialized to zero. Most of the methods of SpectrumSet use bad_data to compute flux, S/N, etc. only on the good pixels - this is why it affects the binning.
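For concreteness, here is a minimal sketch (not the actual SpectrumSet code) of how a bad_data mask restricts flux and S/N calculations to the good pixels; the function and argument names are illustrative only.

```python
import numpy as np

def flux_and_sn(spectra, noise, bad_data):
    """Hypothetical helper: spectra, noise, bad_data are (n_spectra, n_pixels)
    arrays, with bad_data nonzero where a pixel should be ignored."""
    good = ~bad_data.astype(bool)
    # flux and a rough S/N estimate computed over good pixels only
    flux = np.array([s[g].sum() for s, g in zip(spectra, good)])
    sn = np.array([np.median(s[g] / n[g])
                   for s, n, g in zip(spectra, noise, good)])
    return flux, sn
```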

It also gets used somewhat stupidly in ppxf_fitspectra, which I have been meaning to update but forgot to make an issue for. In my original plan, I distinguished between 'bad data' and 'masked data'. The bad data is junk - it is identified as such in SpectrumSet and should never be used for anything. pPXFdriver would sensibly read the bad_data of any spectrum given to it and always mask the bad pixels. But, in addition, I wanted pPXFdriver to accept either masking regions or particular pixels that would not be included in a fit even though that data is valid. There are reasons we might do that - if our templates can't match a particular line well for an understood reason, we might mask that line out of the fit to get better kinematics even though the data there is valid. I thought it made sense to separate those two types of masking, to ensure bad data is always masked automatically and to allow some clarity in specifying valid-but-masked regions.
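A rough sketch of that separation, assuming both bad_data and an extra valid-but-masked array are boolean pixel masks; `goodpixels_for_ppxf` and `extra_mask` are made-up names for illustration, not the real pPXFdriver interface.

```python
import numpy as np

def goodpixels_for_ppxf(bad_data, extra_mask=None):
    """Return indices of pixels to include in the fit: bad pixels are always
    excluded, while extra valid-but-masked pixels are excluded only here."""
    exclude = bad_data.astype(bool).copy()
    if extra_mask is not None:
        exclude = exclude | extra_mask.astype(bool)
    return np.flatnonzero(~exclude)
```

The point of the design is that the first exclusion happens automatically for every spectrum, while the second is an explicit, documented choice per fit.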

But, to save time, I did not do that. All I wrote was the bad_data masking part, and right now ppxf_fitspectra reads the masking regions in the param file and sets the spectra to have bad_data = 1 in those pixels before feeding to pPXFdriver. All we've masked so far is indeed sky lines where the data is bad, so it hasn't been a problem.
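For reference, a sketch of that current behavior, assuming the param-file regions arrive as (wave_min, wave_max) pairs and `waves` is the wavelength sampling (both names are assumptions):

```python
import numpy as np

def flag_masked_regions(bad_data, waves, regions):
    """Mark every pixel inside the given wavelength regions as bad_data = 1,
    so pPXFdriver will then mask them like any other bad pixel."""
    for lo, hi in regions:
        in_region = (waves >= lo) & (waves <= hi)
        bad_data[:, in_region] = 1
    return bad_data
```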

Things we definitely want to flag are:

- Poorly-subtracted sky lines (particularly the giant one at ~5550)
- Pixels Jenny has flagged. These would only be marked by the data themselves having some absurd value; I don't remember if it is a particular value. Vaccine (the initial reduction program) flags bad pixels by setting their data to -666, and then Jenny processes that in some fashion.
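A sketch of flagging those two cases, assuming the sentinel appears either as -666 or rescaled to -666e-17 and using a guessed half-width around the sky line; the exact comparison values and the window size are assumptions, and bad_data is taken to be boolean here.

```python
import numpy as np

def flag_definite_bad(spectra, waves, bad_data, sky_line=5550.0, half_width=10.0):
    """Flag Vaccine's sentinel values and the region around the big sky line."""
    # exact comparison, assuming the sentinel is an exact fill value
    sentinel = (spectra == -666.0) | (spectra == -666e-17)
    bad_data |= sentinel
    # window around the poorly-subtracted sky line
    near_sky = np.abs(waves - sky_line) < half_width
    bad_data[:, near_sky] = True
    return bad_data
```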

Things to maybe flag:

- The edges of the spectrum where interpolation gives junk data. This is the same thing I have been cropping out in process_miles. Perhaps it makes sense to flag and then crop, so that future people would not see the un-cropped spectra and take them as fully valid? I'm not quite sure of the logistics of that, though.
- Negative spectrum values. Here I think we need to think a bit. I don't think that data is necessarily invalid, as long as the noise estimate is large enough that the data is consistent with zero: even though a physical spectrum is non-negative in principle, a noisy measurement of one can be negative. The question is whether our analysis code can handle such negative values sensibly. The binning code should - it would average those negative pixels with extra-positive pixels from other fibers and give a probably positive result. If the result after binning is still negative, that is still sensible (the spectrum is just consistent with zero even after all fibers are combined), but I think it will break the code that comes after binning, like pPXF. So at that stage, we either flag those pixels as bad or maybe even set them to zero with some error bar (see the sketch below).
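A sketch of that last option, applied after binning: set negative pixels that are consistent with zero to zero while keeping their error bars, and flag the rest as bad. The 1-sigma threshold and the array names are assumptions.

```python
import numpy as np

def handle_negative_pixels(binned, noise, bad_data, n_sigma=1.0):
    """binned, noise, bad_data are same-shape arrays; bad_data is boolean."""
    negative = binned < 0
    consistent = negative & (np.abs(binned) <= n_sigma * noise)
    binned[consistent] = 0.0                  # keep noise as the error bar
    bad_data |= negative & ~consistent        # flag truly bad negative pixels
    return binned, bad_data
```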