Reference pixels set to DO_NOT_USE in DQ is fragile

jdavies-st commented 2 years ago

Currently, the way that reference pixels get set to DO_NOT_USE in calwebb_detector1 is by relying on them being set to zero in the GAIN reference file. If they are set to zero there (and there's not a strong reason they should be), then ramp_fit flags any gain values <0 or NaN as NO_GAIN_VALUE and DO_NOT_USE here

https://github.com/spacetelescope/stcal/blob/8e00460560eea757f6959ef17b5877583bc60de8/src/stcal/ramp_fitting/utils.py#L981-L1017

If they are not set to zero in the GAIN reference file, as was the case briefly with NIRCam gain reference files this week, then the reference pixels never get set to DO_NOT_USE, ramp_fitting fits them, and they end up in the output _i2d mosaic produced by calwebb_image3, producing unwanted artifacts. See right side below.

Screen Shot 2022-09-01 at 12 00 15

Setting GAIN reffile ref pix to zero so that they get flagged as DO_NOT_USE in ramp_fitting is a fragile and roundabout way to get that job done.

@hbushouse is there a good reason to not set the reference pixels DQ values to DO_NOT_USE at the end of the refpix step? They've been corrected and now can be ignored from here on out.

hbushouse commented 2 years ago

This is another one of those little details who's reasons are lost to the mists of time (at least it's pretty misty in my mind). I recall at least one Cal WG meeting in which the question was asked whether we should bother doing ramp fitting to reference pixels and the answer was along the lines of "Sure, why not; you never know what kind of interesting info that may turn up. The slopes don't get used for anything later anyway, so what can it hurt?" So they were not flagged and left as "normal" pixels.

And then of course some years later it was discovered that they were not getting removed when doing mosaics and I'm sure we implemented some kind of fix back then (when it was discovered the first time). Will have to dig around to see what the method was back then. Perhaps that's when we added the "REFERENCE PIXEL" DQ flag label and assumed that steps like resample would reject based on that flag alone (i.e. not also requiring DO_NOT_USE). Will need to go back and try to find the history of this. I'm 99% sure we supposedly fixed this already.

nden commented 2 years ago

That's my recollection too. It would be good to start a kind of algorithms subsection to each step in RTD to document things like this so the knowledge is not lost.

stscirij commented 2 years ago

I vaguely remember the addition of the REFERENCE_PIXELS DQ bit was to make the reference pixel processing more straightforward, but it could certainly be used to exclude regions from subsequent processing. Perhaps the NON_SCIENCE DQ bit would be more appropriate to set after the refpix calculations are done?

hbushouse commented 2 years ago

But, but, but ... here in the resample step it sure looks to me like pixels flagged with REFERENCE_PIXEL are set to be rejected: https://github.com/spacetelescope/jwst/blob/master/jwst/resample/resample.py#L284

Is this somehow not working as intended?

hbushouse commented 2 years ago

But at the same time the top level resample_step module has https://github.com/spacetelescope/jwst/blob/master/jwst/resample/resample_step.py#L20 which does not pay attention to REFERENCE_PIXEL and gets stored in the kwargs passed around to lower-level routines: https://github.com/spacetelescope/jwst/blob/master/jwst/resample/resample_step.py#L212

So perhaps what we have here is failure to communicate between the various levels of the resample step?

jdavies-st commented 2 years ago

I think the thing to remember is that DO_NOT_USE needs to be set if the pixels are to not be used in subsequent (or the current) step, previous hacks aside. All the other higher bits are informational only. So we can set many bits, but in the end, resample should be using all the bits except DO_NOT_USE. This is the design architecture at least. The question then becomes which is the best place to set the reference pixels to DO_NOT_USE. At the end of refpix? Or later? ramp_fitting? Later still?

Of course resample has special problems because the code was pulled from drizzlepac which has a very different bit usage architecture, and it's never fully transitioned to the correct jwst one. I.e. the whole goodbits architecture. It should be jettisoned to be consistent with the rest of the pipeline. There used to be an open issue on this which I cannot find that I know @stscieisenhamer worked on.

The behavior up until this past week for the pipeline for NIRCam was that the GAIN reffile had no gain value in the reference pixels, and therefore there was no ramp fit in those areas - they were set to zero and flagged as DO_NOT_USE and NO_GAIN_VALUE. The newest GAIN reffiles that were delivered Friday, the current bestrefs, also have this feature. But it is fragile. Hence the current issue. And other instruments, such as MIRI do not set reference pixels in the GAIN reffile to zero. So for those, reference pixels will show up in the output mosaics.

tapastro commented 2 months ago

@stscijgbot-jp

nden commented 2 months ago

@stscijgbot-jp

stscijgbot-jp commented 2 months ago

This issue is tracked on JIRA as JP-3703.

stscijgbot-jp commented 2 months ago

Comment by Melanie Clarke on JIRA:

Thanks, Nadia Dencheva and Tyler Pauly!

David Law - I'm reviving this issue originally filed on GitHub a couple years ago because I think it might still be relevant.

It looks to me like most of the time, reference pixels are set to DO_NOT_USE by the end of detector1, but it might be a good idea to explicitly make sure they are NaN/DO_NOT_USE after the reference pixel step so they don't accidentally get used for science.

There is currently an option for the reference_pixel step that allows preserving reference pixel values specifically for NIRSpec IRS2 data (preserve_irs2_refpix), for calibration purposes. If desired, we could modify that argument to be preserve_refpix, and set reference pixels for all instruments to DO_NOT_USE only if it is set to False.

What do you think?

jdavies-st commented 2 months ago

I don't think setting the reference pixels to NaN during or at the end of refpix is needed. I don't see the purpose or use case, and that is not done currently. Only setting them to DO_NOT_USE is important, so that for ramp_fit, the result will be NaN if all groups are DO_NOT_USE. The result would be the same as currently done, but also allowing IRS2 data to preserve the values in the reference pixels.

melanieclarke commented 2 months ago

There is a separate effort (JP-3570, #8345) to ensure that whenever DO_NOT_USE is set, the data is also set to NaN. This is intended to help data integrity and consistency, and reduce similar fragility issues across the code base.

This is why I suggested adapting the optional argument to preserve the reference pixels if desired -- that way you can have the values if you need them for calibration or diagnostic purposes, but they won't be there to interfere with science by default.

stscijgbot-jp commented 2 months ago

Comment by Howard Bushouse on JIRA:

In a JWST Calibration meeting about 10 years ago, it was suggested that reference pixels retain their values all the way through processing and even be processed in the ramp fit step, based on the assertion that "you never know what kind of interesting or useful info they might provide". I don't know if they're processed (i.e. have slopes computed) in the ramp fit step, but I agree that they should not have their values wiped out after being used. DQ flags were implemented for a purpose, which is to be used in all situations and at all times. We shouldn't have to set pixel values to NaN in order to reduce the risk of bad data getting used. Software that pays attention to DQ flags should do that for us. Getting users to pay attention to DQ flags, on the other hand, is out of our control.

stscijgbot-jp commented 2 months ago

Comment by Howard Bushouse on JIRA:

Another option would be to take the approach used in other STScI cal pipelines for IR data (calnica, calwf3), which is to strip off the reference pixels from calibrated images once they've been used. In this case that would imply having them stripped away in rate/rateints files, which is already done for NIRSpec IRS2 data in terms of removing the extra interleaved ref pixels.

stscijgbot-jp commented 3 days ago

Comment by David Law on JIRA:

Melanie Clarke Picking up this thread.

I'm not sure what instrument James was using above (NIRCam?) but for MIRI at least the rate files contain finite values with DQ=REFERENCE_PIXEL. The flatfield indicates that these have no valid flat, setting NO_FLAT_FIELD, NON_SCIENCE, and DO_NOT_USE, ensuring that they don't get further in the pipeline.

The specific issue to address really seems to be the ability of reference pixels to propagate to science data products. The breakpoint to my mind in where it makes sense to propagate these any further is the flatfield step, after which they no longer have much if any meaning. One potential solution may therefore be for the flatfield step to set DO_NOT_USE for anything with DQ=REFERENCE_PIXEL. Most of the time this shouldn't make any difference since REFERENCE_PIXEL is typically also flagged as DO_NOT_USE in the flat reference files.

We can bounce off the teams at a JP call.

spacetelescope / jwst

Reference pixels set to DO_NOT_USE in DQ is fragile #7015