JP-3654: NSClean subarray speedup

t-brandt commented 2 months ago

Closes #8551

This PR changes the behavior of NSClean in subarray mode. Previously, the code would fit all unilluminated pixels simultaneously. This required very large amounts of time and memory for larger subarrays. This PR changes that behavior to compute the correction in overlapping regions, resulting in large speedups with little change in the results. The reference pixels are no longer used in the correction, which helps with some ringing artifacts near the detector edges.

Checklist for PR authors (skip items if you don't have permissions or they are not applicable)

[x] added entry in CHANGES.rst within the relevant release section
[ ] ~updated or added relevant tests~
[x] updated relevant documentation
[x] added relevant milestone
[x] added relevant label(s)
[x] ran regression tests, post a link to the Jenkins job below. How to run regression tests on a PR
[x] All comments are resolved
[x] Make sure the JIRA ticket is resolved properly

codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 3.84615% with 50 lines in your changes missing coverage. Please review.

Project coverage is 60.79%. Comparing base (8f6814f) to head (12f5d3e). Report is 19 commits behind head on master.

Files with missing lines	Patch %	Lines
jwst/nsclean/nsclean.py	2.12%	46 Missing :warning:
jwst/nsclean/lib.py	20.00%	4 Missing :warning:

Additional details and impacted files

```diff @@ Coverage Diff @@ ## master #8745 +/- ## ========================================== - Coverage 60.85% 60.79% -0.06% ========================================== Files 373 373 Lines 38657 38690 +33 ========================================== Hits 23523 23523 - Misses 15134 15167 +33 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

melanieclarke commented 2 months ago

@t-brandt - I'd like to get this in if we can, since we're closing in on the end of the development period for the current build. Would you mind if I go ahead and make the last clean up changes?

t-brandt commented 2 months ago

Please do go ahead and do the last cleanup. Thanks!

melanieclarke commented 2 months ago

Testing locally, results look similar to previous, and processing time is greatly improved.

It's now possible to process subarray=ALLSLITS data, so I removed the restriction on that data type.

Test data was: SUBS200A1: jw02757002001_03104_00001_nrs1_rate.fits ALLSLITS: jw01128004001_0310g_00003_nrs1_rate.fits

melanieclarke commented 2 months ago

~Regression tests started here: https://plwishmaster.stsci.edu:8081/job/RT/job/JWST-Developers-Pull-Requests/1690/~

Tests aborted - looking at some results locally first.

melanieclarke commented 2 months ago

@t-brandt - Looking at local regression test results, I see a couple things we should maybe discuss some more.

While the ALLSLITS data runs through nsclean, results are quite poor with mask_spectral_regions=True (the default). I think this is just the known issue with with overfitting highly masked regions, and the workaround is to run with mask_spectral_regions = False and modify nsigma as needed. I don't think we need to address that in this PR, but we should consider different default values.

Also, the change to remove forcing the edges to be included in the fit (mask=True) seems to be introducing some edge effects in some cases. For the SUBS200A1 regression test data, it shows up only in the reference columns (they are not NaN in this older input rate data). For the ALLSLITS regression test data, it shows up in the next couple columns too.

Are the edge issues something we can fix in the subarray correction now? Or should we revert the change to the reference pixel masking?

Here's what the ALLSLITS regression test data looks like. Top is nscleaned, bottom is the rate file.

melanieclarke commented 2 months ago

For comparison, here's what the same data looks like with better masking (top) and with better masking + the reference edges included in the fit (bottom).

t-brandt commented 2 months ago

Maybe what I/we should do then is to zero out the 1/f correction in the reference pixels? Right now they are masked, and the correction is extrapolated into those pixels (which is a terrible approximation). Zeroing out the correction in those pixels is a one or two line change. I don't think leaving the reference pixels in the mask should be an option, since they should have already been used in the refpix step and will have no more information to add.

melanieclarke commented 2 months ago

That sounds worth trying - it's similiar to the way the full frame correction sets the correction to zero if there are no unmasked pixels in the column. For the ALLSLITS case, the edge effects show up in the first 6 columns, so refpix + 2 real science columns. Would that address those columns too?

t-brandt commented 2 months ago

Ok, I have pushed several small changes to fix the edge effects. These happened when a large, contiguous part of the area used for the correction was masked for fitting 1/f. In those cases, the matrices became very poorly conditioned. The approach now skips rows at the end without any available pixels for fitting 1/f.

melanieclarke commented 2 months ago

Local regression tests and spot checks with my other test data sets look good. Thanks for the fix!

I'll run the full regression test set now and hopefully we can get this merged!

Regression tests running here: https://plwishmaster.stsci.edu:8081/job/RT/job/JWST-Developers-Pull-Requests/1697/

t-brandt commented 2 months ago

The overfitting present with the default parameters is because the range of frequencies that NSClean is fitting is too large when a large fraction of the array is missing due to the mask. This can be fixed by changing these frequencies (see the image below). If we want to actually implement this is will be a change of a few lines; I can do it tomorrow. I think the most natural way to implement the change would be to check for allslits mode and mask_spectral_regions set to True, and if both of these are set, change the frequency cutoffs from current the default values.
comparison

melanieclarke commented 2 months ago

@t-brandt - that sounds reasonable to me to include here, since this is the first time ALLSLITS data will be correctable with NSClean, and since it really is unusable with current default parameters.

t-brandt commented 2 months ago

Ok, the default behavior for ALLSLITS should now be reasonable.

melanieclarke commented 2 months ago

Regression tests restarted here: https://plwishmaster.stsci.edu:8081/job/RT/job/JWST-Developers-Pull-Requests/1702/

melanieclarke commented 2 months ago

@hayescr - tagging you in to this conversation for NIRSpec's information.

This PR is the fix for subarray performance, which enables ALLSLITS subarrays to be corrected. The default correction frequencies worked very poorly when spectral regions are masked, so we are modifying the cutoff frequencies for that subarray specifically. We should discuss at a later date if we want to do something similar for the other modes that frequently get overcorrected, like MOS.

t-brandt commented 2 months ago

Great, thanks @melanieclarke! @hayescr and @melanieclarke, I think that the issue with overcorrection in various modes is best solved by exposing a single parameter (provisionally called aggressiveness) to the user. This would tune the frequencies to be corrected and the coupling of the four outputs. We could set a default with reasonable behavior, with more aggressive removal appropriate to cases where at least ~half of the array is available as a reference for the correction, and less aggressive removals appropriate to cases where most of the array is unavailable for a correction. I envision this applying to both subarray and full frame corrections.

melanieclarke commented 2 months ago

That sounds good to me @t-brandt - it would be nice to be able to tune that without asking the user to modify frequency cutoffs themselves!

I think we don't yet have a ticket for the aggressiveness parameter - can you please file one, and we'll continue the conversation there? Time is short on this build, but we should have plenty of time to work on it in the next one.

t-brandt commented 2 months ago

Sounds perfect. The ticket would modify the full frame NSClean behavior and create an aggressiveness parameter. I will make one.

hayescr commented 2 months ago

Thanks @melanieclarke and @t-brandt, agreed that we might want slightly different cutoff frequencies for different cases, perhaps other subarrays when the spectral regions are masked as well. I had a look at some of Tim's tests of setting this aggressiveness and there were some cases where different cases looked better with different values, so this would definitely be worth investigating. Feel free to tag me on that once you open it Tim, I'm happy to continue the conversation more there.

spacetelescope / jwst

JP-3654: NSClean subarray speedup #8745

Codecov Report