spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope
https://jwst-pipeline.readthedocs.io/en/latest/
Other
558 stars 164 forks source link

Add SCI/ERR/DQ check to resample/cube_build #8345

Closed stscijgbot-jp closed 1 week ago

stscijgbot-jp commented 6 months ago

Issue JP-3570 was created on JIRA by David Law:

As seen in tickets such as https://jira.stsci.edu/browse/JP-3250], unexpected NaNs in the ERR or VAR arrays can cause problems downstream since those NaNs can be propagated to the resampled products.

This ticket is to therefore catch such issues earlier, by adding a check to the start of resample and cube_build to ensure that all input data with SCI=NaN also have ERR=NaN and DQ DO_NOT_USE bit set.  Likewise, ERR=NaN should have SCI=NaN and DQ DO_NOT_USE bit set.  I believe cube_build may already have something like this implemented, as it is fairly robust against NaN-valued inputs.  An equivalent may also be needed elsewhere in TSO3 processing since that does not use the resample step.

Once implemented, this will need some fairly thorough instrument-mode testing prior to being merged as it may reveal issues with reference files that need to be addressed first.

stscijgbot-jp commented 3 months ago

Comment by Melanie Clarke on JIRA:

David Law - I have a draft PR with these changes here:

8557

but as expected, this touches a lot of steps for all the instruments and modes.  Do you want to try to get it in in Build 11.0, or hold off for 11.1 to make sure there's plenty of time to test before merging?

stscijgbot-jp commented 2 months ago

Comment by David Law on JIRA:

Melanie Clarke Given the time left before 11.0 closes I'd say let's wait for 11.1.

stscijgbot-jp commented 2 months ago

Comment by Melanie Clarke on JIRA:

My current approach to fixing this is:

Make a function that updates a datamodel with NaN values in data, error, or variance arrays whenever the DO_NOT_USE flag is set, and updates the dq array to DO_NOT_USE whenever a NaN value is found in data, error, or variance arrays.

Call the function on output data at the end of several key steps where DO_NOT_USE might be set:

Also call it on input data at the beginning of a couple key steps where data are combined:

However, in reviewing the PR, Ned Molter asked if we should be calling the function more generally, instead of just on these specific steps.  I looked into it, and it would be straightforward to add a call to the function when results are finalized, at the end of every pipeline step, if we want to.

This would be a pretty significant scope change though, since it would then impact every output product for every step in every pipeline, including detector1, and these products do not all have the same extensions or data organization.

David Law Tyler Pauly Nadia Dencheva - Is it desirable to make a broader change, to ensure that DQ flags and NaNs always match at the end of pipeline steps, or should we keep the check inside specific steps?  Or something else?

stscijgbot-jp commented 2 months ago

Comment by David Law on JIRA:

Hm, I'm a little wary of unintended consequences and limited return on effort if we try to apply it in every step, particularly detector1.  I think I'd therefore favor restricting scope to the above, which are the most key locations.

stscijgbot-jp commented 2 months ago

Comment by Melanie Clarke on JIRA:

Okay, sounds good, we can start with that.

stscijgbot-jp commented 1 month ago

Comment by Melanie Clarke on JIRA:

David Law - one more question... I think this is close to being ready to merge now.  Do you want us to merge the changes to the dev pipeline or do you want the INS teams to test on the branch before merging?

stscijgbot-jp commented 1 month ago

Comment by David Law on JIRA:

Let's discuss at the JP meeting on Wednesday.

stscijgbot-jp commented 1 month ago

Comment by Melanie Clarke on JIRA:

Setting this on hold, waiting for INS testing effort.

stscijgbot-jp commented 2 weeks ago

Comment by Jo Taylor on JIRA:

As requested, the NIRISS team checked all active reference files for NaNs- all those with NaNs had them in the same indices for the SCI and ERR arrays. The corresponding indices in the DQ arrays were marked as DO_NOT_USE.

stscijgbot-jp commented 2 weeks ago

Comment by David Law on JIRA:

Great, thanks Jo Taylor .  Per discussion at the Aug 21 JP meeting MIRI has seen nothing unusual, and neither has NIRSpec.  I think we can move ahead with merging this, with any additional testing to take place as part of the regular build regression testing.

stscijgbot-jp commented 1 week ago

Comment by Melanie Clarke on JIRA:

Fixed by #8557