spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope
https://jwst-pipeline.readthedocs.io/en/latest/
Other
569 stars 167 forks source link

NaNs in the flat variance array contaminate the x1d combined error estimate #7647

Closed stscijgbot-jp closed 5 months ago

stscijgbot-jp commented 1 year ago

Issue JP-3250 was created on JIRA by Alicia Canipe:

From Help Desk ticket INC0189669 (link at the bottom for reference): 

The user noticed the FLUX_ERROR column in an x1d product was an array of NaNs. Howard pointed out that the cal files have all zero values in the FLAT_VARIANCE 2d array, which then propagates through to the x1d files created by the extract_1d step. It appears that the flat variance array is all zero because all of the ERR arrays in the current flat-field reference files for the NIRSpec fixed-slit mode (using the prism) are also zero-filled. 

So two things would need to be updated to address the user's issue:

Find and fix the fact that the NaN values in the flat variance array are contaminating the computation of the combined error estimate, which is composed of Poisson, read noise, and flat variances, so we could then at least get error estimates that represent the Poisson and read noise only. 

The NIRSpec instrument team would need to compute estimated variance data for the flats and store them in the flat-field reference files used by the calibration pipeline. This should probably be captured separately in a CRDS ticket. 

 

Help desk link: https://stsci.service-now.com/nav_to.do?uri=%2Fincident.do%3Fsys_id%3D5d12d758477ea190289c4f32736d4364%26sysparm_stack%3Dincident_list.do%253fsysparm_query%253dactive%253dtrue%26sysparm_view%3DJWST%26sysparm_view_forced%3Dtrue

stscijgbot-jp commented 1 year ago

Comment by Alicia Canipe on JIRA:

From NIRSpec: also answered in Help Desk ticket INC0189910. 

stscijgbot-jp commented 1 year ago

Comment by Alicia Canipe on JIRA:

Howard Bushouse possibly very dumb question: for issue # 1 (flat variance contaminating computation of error estimate), that sounds like it could be a general issue (not just NIRSpec). Is that right? If so, I'll add the other instruments to this ticket. 

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

Alicia Canipe  Yes, that probably is a general issue that applies to all instruments. And according to Tyler Pauly the problem may not actually be in the extract_1d step, but rather in the resample/resample_spec step, which is apparently creating resampled ERR arrays that are all zero/NaN for these cases.

stscijgbot-jp commented 1 year ago

Comment by Tyler Pauly on JIRA:

When the resample_spec step drizzles the input variance arrays, it sums them using inverse variance weighting (even if only one exists, the inverse summation still occurs). This inverse summation requires that zero-valued pixels are masked to avoid undefined/NaN values.

In this case, the flux_var_flat array has values of either 0 or NaN, such that the 0 values are masked and resample_spec is left with the array it initializes, which is full of NaNs. I don't see a straightforward way of making modifications to resample_spec behavior, as this is the desired behavior. We can add a check to the quadrature sum of variance arrays in extract_1d that would prevent one NaN array from "corrupting" the flux_error array with NaNs.

For the data referenced in the Help Desk ticket, the other two variance arrays (poisson and read noise) are present and could be summed to find the error calculated from their contributions. I have some minor qualms about letting the flux_error column have a varying definition, depending on if one variance array is NaN or not. It would allow a user to make the mistake of assuming that the error contains contributions from flat field variance, when in fact that column would be NaN for these data.

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

Fixing/updating extract_1d to disregard NaN's when computing the quadrature sum for the FLUX_ERR data sounds like a reasonable approach, but Tyler Pauly raises a fair question as to whether that's an appropriate thing to do, especially if only some pixels in the variance arrays are NaN. That would result in varying contributions from one wavelength bin to the next in the FLUX_ERR array and the user would have no way of knowing which contributions were included for a given bin. Perhaps we should solicit opinions from folks in DMS WG as to what they think about this approach Alicia Canipe 

stscijgbot-jp commented 1 year ago

Comment by James Muzerolle on JIRA:

We had set the flat_variance arrays in the reference files to zero while debating how to handle the errors in the flats, not realizing this would be an issue. We will redeliver flats with non-zero variance arrays. Because we have three components to the flats that are interrelated, our plan was to keep the D- and S-flat variances at zero and put real uncertainties only in the F-flat files; I think this should work given how the combined var_flat is calculated, but can someone confirm?

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

James Muzerolle Yes, I believe it should work OK to only have non-zero variances in the FFLAT. That's the way it used to be and we did not seem to have the issue downstream when that was the case. And looking at the code in the flat-field step where it combines the ERR arrays from DFLAT, FFLAT, and SFLAT to produce a combined flat-field ERR estimate (which then gets stored in the VAR_FLAT array of the science data), it uses simple summation of the err values from d_flat, f_flat, and s_flat and hence as long as at least one of them has non-zero contributions, we should get a non-zero VAR_FLAT array.  NaN's in the ref file ERR arrays, on the other hand, could be problematic. It looks like the current code is not robust enough to handle NaN's in the input ERRs.

stscijgbot-jp commented 1 year ago

Comment by James Muzerolle on JIRA:

Thanks, Howard Bushouse. Unfortunately, I think there might be a problem with the FFLAT file format. We need to be able to put a 1D error array as part of the FAST_VARIATION table extension. However, it looks like that currently only has four parts -  slit name, nelem, wavelength vector, and "data" vector. I'm guessing that is baked into the data model? In which case, that would need to be changed before we could deliver new files.

stscijgbot-jp commented 1 year ago

Comment by James Muzerolle on JIRA:

Relevant to the above, I just opened a separate ticket to add an error array to the F-flat data model (https://jira.stsci.edu/browse/JP-3304).

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

Tony Keyes James Muzerolle Peter Zeidler Kayli Glidic The PR #7711 has been created, which includes a workaround for this issue, by modifying the way the extract_1d step computes the total ERR values from the constituent variance arrays. Right now it just blindly sums them all together, which means that if any of the components have NaN's in them, the ERR ends up with NaN's. The PR modifies that behavior such that if any of the variance arrays has even just one NaN value in it somewhere, it excludes that entire variance array from the computation of the combined ERR values. That puts a bandaid on the situation, such that we no longer have ERR arrays full of NaN, but the question is whether that's really the right thing to do or not. The issue is that there's no obvious way for a user to know that their ERR values were computed from only some of the variances and not all of them, so they don't know that the ERR's are not really representative of their data.

So perhaps it's better to just leave things the way they are right now, because at least when a user sees ERR values that are all NaN, they know that something's not right (they just may not understand exactly what isn't right). The ultimate solution of course is to finish the work being done to add ERR values to the NRS FFLAT ref file, so that we don't get into this situation in the first place.

Comments?

stscijgbot-jp commented 1 year ago

Comment by Melanie Clarke on JIRA:

In my opinion, the best approach would be a simple nansum across the variance components, rather than ignoring any component that has any NaNs at all.  There are valid scientific use cases for NaN values in error arrays, and they should be handled properly without invalidating the whole error array. 

Assigning meaningful error values should be handled in the reference files; communicating to users what they mean should be handled in documentation.  The software should just sensibly propagate the inputs it's given.

stscijgbot-jp commented 1 year ago

Comment by Peter Zeidler on JIRA:

I agree with Melanie Clarke 

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

Regarding the proposal to only drop individual variance elements where a NaN occurs, rather than dropping an entire variance array when it contains any NaNs, I'm torn between which approach would be the lesser evil. If one of the three variances happens to dominate the relative contribution to the total ERR, and that variance has one or more NaN elements, then the total ERR for those elements is going to be very different than its neighbors and severely underestimated. How will a user know that particular ERR elements are not to be trusted?

Of course the real solution is to simply make sure that all variance arrays contain some reasonable non-NaN value, even if they have to be interpolated across flaky pixels (for example). But that only needs to happen once, when ref files are created.

stscijgbot-jp commented 1 year ago

Comment by Melanie Clarke on JIRA:

Yes, I agree that making sure all pixels have sensible values is a reference file task.  I don't expect that there will be scattered NaN values in the reference files in science areas, but I imagine it might be useful for regions of the array to be set to NaN, if they are outside the usable area for a particular mode, for example.  If the software allows the error array to contain at least some NaNs, the science teams can make that decision for their own reference files.

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

Melanie Clarke that makes sense.

stscijgbot-jp commented 1 year ago

Comment by Michael Regan on JIRA:

Can't we just set some small error value for all pixels until we get the actual errors?

stscijgbot-jp commented 1 year ago

Comment by Melanie Clarke on JIRA:

Michael Regan - Yes, that's possible, but there are three reasons why that's not the preferred solution for NIRSpec:

Reference file re-creation and delivery for all existing flats is a significant effort. We prefer to do it once, when the actual errors are known.

0 or NaN for the error value is a more accurate scientific representation of the current situation than adding arbitrary small errors. The errors are currently unknown; assigning them a specific value is inaccurate.

Allowing NaN values in one error array to invalidate all the input error components is a software bug, and it should be fixed in the software. 0 and NaN values have valid scientific meaning in error arrays and should be propagated sensibly.

stscijgbot-jp commented 1 year ago

Comment by Michael Regan on JIRA:

Well, we can have a discussion on how both 0  and NaN error values has meaning. I withdraw my comment. 

stscijgbot-jp commented 1 year ago

Comment by Tyler Pauly on JIRA:

Unless we change the definition of the error array as currently generated, it is not a software bug - the error array as defined has three contributions. "Sensibly propagating" NaN values by removing them from the calculation fundamentally alters the definition of the generated array; if this is the desired outcome for all instrument teams, we can change the definition of the array. This puts the onus on the user noticing that entries in the spectral table with a NaN entry in a variance column means their error value will not be the quadrature sum of three contributions but of two (or less).

stscijgbot-jp commented 1 year ago

Comment by Melanie Clarke on JIRA:

Okay, software requirements change, then!  If this needs a cross-instrument decision, is there a way that can be done speedily to address this issue?

stscijgbot-jp commented 1 year ago

Comment by Anton Koekemoer on JIRA:

following up here after an email exchange – with Greg Sloan now responsible for overseeing all spectroscopic aspects of the pipeline (while he continues transitioning to full CalWG lead) he will directly coordinate the rest of the discussion for this issue, including also the relevant spectroscopy mode experts from the other instruments, to ensure a path forward that works for NIRSpec (which may include evolving requirements, if that turns out to be necessary) and also maintaining consistency with pipeline processing for spectroscopy data on the other instruments.

stscijgbot-jp commented 1 year ago

Comment by Anton Koekemoer on JIRA:

also adding the CalWG reps for NIRISS, NIRCam and MIRI as watchers, since this NIRSpec ticket originally listed all 4 instruments as being possibly affected, which could be the case if it is eventually decided to make a change to the pipeline for the extract_1d and resample_spec steps in order to deal with the NIRSpec flatfield variance array issue.

stscijgbot-jp commented 1 year ago

Comment by Greg Sloan on JIRA:

I am now fielding a helpdesk ticket that may be reporting a similar problem with x1d LRS data. That's ticket INC0193278. For the record, we may need to "fundamentally alter" the propagation of uncertainties to trap NaNs. It wouldn't be that different from ignoring NaNs when calculating a median or mean of a set of data, except here the values are adding in quadrature. The point of the uncertainties is to estimate the quality of the data, and they can't do that if they're NaNed. I have always treated uncertainties as an approximation, and if we have 1 of 3 or 4 inputs NaNed, then we have to approximate the uncertainty with what we have left. I would imagine that this point may require some discussion.

stscijgbot-jp commented 1 year ago

Comment by David Law on JIRA:

Adding a note that one of the specific avenues that this is causing a problem is through the pixel replacement step.  For the MIRI MRS at least, if pixel replacement is not used then everything works ok: NaN-valued ERR pixels are also NaN-valued fluxes and DO_NOT_USE DQ values.  If using the 'mingrad' pixel replacement everything also works ok, as new values are interpolated for both the SCI and ERR arrays and DQ values are flipped from DO_NOT_USE to being used.  If the 'fit_profile' method is used though, new values are interpolated for the SCI array and the DQ array is set to use them, but the ERR values are never changed from NaN.  This then propagates downstream.

I think this needs to be fixed in the pixel replacement step, as cube building is predicated on the notion that pixels with good-values fluxes and DQ should not have NaN errors.

stscijgbot-jp commented 11 months ago

Comment by Howard Bushouse on JIRA:

Updated NRS FFLAT ref files delivered in context 1166 as part of B10.0 installation in Ops, so that VAR_FLAT arrays are now populated in science products (see JP-3304). Does that effectively close this ticket as well?

stscijgbot-jp commented 11 months ago

Comment by Melanie Clarke on JIRA:

Howard Bushouse - I think NaN values in error arrays still need to be handled better and more consistently throughout the pipeline.  We should no longer have the issue where an entire variance component is set to NaN, but there are sometimes scattered NaN values for other reasons, and it still might be useful to set errors for invalid regions to NaN without worrying that they will contaminate later propagation. 

stscijgbot-jp commented 9 months ago

Comment by Tyler Pauly on JIRA:

Anton Koekemoer Greg Sloan David Law I think this ticket would be a good candidate for JPCT or CCT discussion, when such meetings begin or resume.

stscijgbot-jp commented 9 months ago

Comment by Anton Koekemoer on JIRA:

Thanks Tyler Pauly. Following Howard Bushouse's comment, I'd say the next step would be to first ensure that we have good consistency across the reference files of the various instruments, in terms of when NANs are / aren't used.

Once the reference files are confirmed to be consistent across the instruments in terms of their treatment of NANs, then we can return to the algorithm side in terms of how these pixels are handled in the pipeline code.

So I'm going to propose handing this to David Law to organize discussion of this in the JPCT meetings and ensure that there's convergence on that side, then this can come back to the JWST CCT side once we've confirmed that the reference files are consistent in this.

stscijgbot-jp commented 9 months ago

Comment by David Law on JIRA:

Tyler Pauly Sounds good.  We're still figuring out who all the instrument reps to the JP Coord Team will be, but I've added it to the agenda for the first meeting (tentatively next week, but TBD).

stscijgbot-jp commented 9 months ago

Comment by Melanie Clarke on JIRA:

Adding an example from a helpdesk ticket for discussion: INC0196845

The data is from PID 3788; for example, file jw03788007001_03101_00002_nrs2_rate.fits, reduced with pipeline version 1.12.5.

The error arrays for the extracted spectra from this program have large regions that are set to NaN.  The root cause is that there are a number of pixels with slightly negative values in the rate image, since the rate is close to zero for a significant portion of the array.  These have Poisson variance set to zero, which is appropriate: the total error estimate should come from the read noise only for these pixels.  When the zero variance values pass through the resample step, they are set to NaN instead in the resampled Poisson variance component.  Then, when the final error is combined for the extracted data, the NaN values are not ignored, and they contaminate the final error estimate.

The workaround is to set NaN values in the s2d file to zero and re-extract the spectrum.  A better fix would be for the resample step to ensure that zero values in the variance components are handled as valid numerical values.

stscijgbot-jp commented 7 months ago

Comment by David Law on JIRA:

As an update, this ticket will be significantly affected by https://jira.stsci.edu/browse/JP-3570 and https://jira.stsci.edu/browse/JP-3569

These should ensure that there are never NaN values in the resampled ERR or VAR arrays as zero-valued VAR_POISSON or VAR_FLAT should propagate as expected.  As such no workaround should be necessary when extracting spectra.

Setting to low priority for now, to be reassessed when the above tickets are done to see if it's actually fixed the issue reported here.

stscijgbot-jp commented 6 months ago

Comment by David Law on JIRA:

Adding a note that the example data here is indeed significantly improved by the draft PR described on https://jira.stsci.edu/browse/JP-3569

stscijgbot-jp commented 5 months ago

Comment by David Law on JIRA:

Melanie Clarke I think this ticket should now be taken care of by virtue of the other tickets that have been dealt with lately (and the one outstanding on interpolating errors in pixel replacement), would you concur?  The original ticket about NRS slit data now looks ok, presumably your MOS test case should as well?

stscijgbot-jp commented 5 months ago

Comment by Melanie Clarke on JIRA:

David Law - I think most of the known sources of NaNs in error arrays are now taken care of, yes. 

However, I think it would be more future-proof to also ignore NaN values in the sum over the errors for the extracted spectrum.  As this ticket's history shows, it's pretty easy to accidentally introduce NaNs to the error arrays, and it's less surprising to the user to ignore them than to invalidate the entire error sum.

stscijgbot-jp commented 5 months ago

Comment by David Law on JIRA:

Melanie Clarke Agreed, and now (or at least, once JP-3570 is finished) there should be no reason to be using such pixels.  Any objection to opening that as a separate ticket targeted more specifically at ignoring NaNs in extraction regions?

stscijgbot-jp commented 5 months ago

Comment by Melanie Clarke on JIRA:

Works for me!

stscijgbot-jp commented 5 months ago

Comment by David Law on JIRA:

Closing this ticket as effectively completed via linked tickets.  Some related long-term work remains though, now tracked through the more specific tickets JP-3570 and JP-3655.