JP-3592: Improve runtime for MIRI emicorr step

penaguerrero commented 5 months ago

Closes #8412

This PR improves the running time of emicorr by introducing a new parameter, use_n_cycles, that can be modified by the user and has a default of 3. Now the user can either provide the number of integrations to phase or the number of cycles to calculate the number of integrations necessary to phase.

Checklist for PR authors (skip items if you don't have permissions or they are not applicable)

[x] added entry in CHANGES.rst within the relevant release section
[x] updated or added relevant tests
[x] updated relevant documentation
[x] added relevant milestone
[x] added relevant label(s)
[x] ran regression tests, post a link to the Jenkins job below. How to run regression tests on a PR
[x] All comments are resolved
[x] Make sure the JIRA ticket is resolved properly

drlaw1558 commented 5 months ago

I'm not sure why, but when I run with this branch it's causing a crash in the gain_scale step with 'datamodels' not defined. Doesn't make sense to me, as this PR didn't change that step, but it doesn't happen on the latest master branch.

Weirdly, the list of files changed on this PR doesn't include gain_scale_step.py, but looking at the code it changes when checking out the branch, getting rid of a 'from stdatamodels.jwst import datamodels'. Why doesn't this change show up in the list of files changed?

EDIT: I see why; the relevant branch is jp3592, not the speedupemicorr branch, which is a little confusing.

codecov[bot] commented 5 months ago

Codecov Report

Attention: Patch coverage is 91.30435% with 2 lines in your changes missing coverage. Please review.

Project coverage is 59.56%. Comparing base (23fb5cd) to head (b0fced2).

Files with missing lines	Patch %	Lines
jwst/emicorr/emicorr.py	91.30%	2 Missing :warning:

Additional details and impacted files

```diff @@ Coverage Diff @@ ## master #8589 +/- ## ========================================== + Coverage 59.54% 59.56% +0.02% ========================================== Files 391 391 Lines 39292 39285 -7 ========================================== + Hits 23396 23402 +6 + Misses 15896 15883 -13 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

penaguerrero commented 5 months ago

I'm not sure why, but when I run with this branch it's causing a crash in the gain_scale step with 'datamodels' not defined. Doesn't make sense to me, as this PR didn't change that step, but it doesn't happen on the latest master branch.

Weirdly, the list of files changed on this PR doesn't include gain_scale_step.py, but looking at the code it changes when checking out the branch, getting rid of a 'from stdatamodels.jwst import datamodels'. Why doesn't this change show up in the list of files changed?

EDIT: I see why; the relevant branch is jp3592, not the speedupemicorr branch, which is a little confusing.

yes, sorry about that naming confusion. This is because while working on this I made up a workaround in order to test that large file you pointed me to (which was crashing detector1 on my computer). So the workaround to save up memory usage in the whole Detector1 pipeline.

drlaw1558 commented 4 months ago

It looks like this PR currently reduces the runtime from 12 minutes to 7 minutes for one of the long example cases that I've been testing (jw01283001001_03101_00001_mirimage), and most of the remaining time is spent in the np.where call within the loop over nbins.

I've just filed https://github.com/penaguerrero/jwst/pull/2 to help address this by restricting the array to only the integration planes for which information has been populated in phaseall. It also replaces use of the custom sigma-clipping routine with the astropy sigma_clipped_stats routine, avoiding the need for a search for finite values.

This further reduces the overall runtime to about 1 minute total, which means we should now be getting about a factor of ten improvement from where we were before.

Edit: This PR now also fixes a bug noticed by Eddie in which the correction isn't being applied to anything other than the first integration. Fixing this increases runtime again to about 2 minutes, but still acceptable for now given the bug fix.

penaguerrero commented 4 months ago

thank you @drlaw1558, I was working on a way around that where too but this is faster

drlaw1558 commented 4 months ago

LG2M now; over to you @hbushouse

hbushouse commented 4 months ago

Has a regtest run been done on this recently?

penaguerrero commented 4 months ago

running reg tests: https://plwishmaster.stsci.edu:8081/job/RT/job/JWST-Developers-Pull-Requests/1564/

hbushouse commented 4 months ago

Started another regtest run after updating truth files for unrelated changes: https://plwishmaster.stsci.edu:8081/job/RT/job/JWST-Developers-Pull-Requests/1565

hbushouse commented 4 months ago

Started another regtest run after updating truth files for unrelated changes: https://plwishmaster.stsci.edu:8081/job/RT/job/JWST-Developers-Pull-Requests/1565

Latest results show failures/differences only in MIRI tests that include that include the Detector1 pipeline, and hence the emicorr step, as expected. A lot of differences are very small, but some are significant. Need to determine if the larger differences are expected and acceptable.

drlaw1558 commented 4 months ago

@hbushouse I just spot checked the lrs_slitless and imaging failure cases by running them locally with/without these changes. The differences look ok to me; it's the expected changes from emicorr now actually removing the EMI signal from integration 2+, which in turn has some impact on whether or not CRs are detected in the ramp, sometimes affecting the jump flagging and the final signal. Some larger differences at intermediate stages in the vicinity of detector artifacts, where the emicorr adjustment is fractionally much larger. Note to myself from this that I think the MIRI team may need to update the bad pixel map to mask out these artifacts more completely.

hbushouse commented 4 months ago

@hbushouse I just spot checked the lrs_slitless and imaging failure cases by running them locally with/without these changes. The differences look ok to me; it's the expected changes from emicorr now actually removing the EMI signal from integration 2+, which in turn has some impact on whether or not CRs are detected in the ramp, sometimes affecting the jump flagging and the final signal. Some larger differences at intermediate stages in the vicinity of detector artifacts, where the emicorr adjustment is fractionally much larger. Note to myself from this that I think the MIRI team may need to update the bad pixel map to mask out these artifacts more completely.

OK, if you're happy, I'm happy.

By the way, I don't see any explicit tests of the emicorr step itself in any of the regtests. The only way we see differences due to the step is via changes to downstream results (i.e. in later Detector1 steps). Really need to update some of the MIRI regtest modules to include checking of the emicorr step output, e.g. test_miri_image.py and test_miri_lrs_slitless.py). But that can be done later in a separate PR.

spacetelescope / jwst

JP-3592: Improve runtime for MIRI emicorr step #8589

Codecov Report