spacetelescope / stcal

https://stcal.readthedocs.io/en/latest/
Other
10 stars 32 forks source link

JP-3708: Adding the CRMAG Array to the Optional Results Product #304

Closed kmacdonald-stsci closed 10 hours ago

kmacdonald-stsci commented 3 weeks ago

Resolves JP-3708

This PR addresses a crash that occurs when running multiprocessing for ramp fitting with the C extension while choosing to save the optional results product. This is because the CRMAG array was not implemented in the C extension, so the C extension returns a NoneType, so when the data slices attempt to reassemble an exception is thrown because there is no CRMAG to reassemble.

The CRMAG array is a 4-D array, with dimensions (nints, max_num_crs, nrows, ncols). For each integration ramp there is some number of cosmic ray jumps, including 0. The dimension max_num_crs is the maximum number of jumps found over all integrations and all pixels. For ramps that have fewer than the max number of jumps the last elements in that pixel ramp are 0. For example, if the maximum number of jumps found was 5, but a ramp has only two jumps, the first two elements of the CRMAG array for that integration, row, and column are non-zero, while the last three are 0. The non-zero elements of the ramp array are the magnitude of the jump, which is the difference between the two groups where a jump was detected.

The CRMAG is now implemented in the C extension and functions properly with multiprocessing.

Tasks

news fragment change types... - ``changes/.apichange.rst``: change to public API - ``changes/.bugfix.rst``: fixes an issue - ``changes/.general.rst``: infrastructure or miscellaneous change
kmacdonald-stsci commented 3 weeks ago

A regression test for this is here:

https://plwishmaster.stsci.edu:8081/job/RT/job/JWST-Developers-Pull-Requests/1780

codecov[bot] commented 3 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 86.34%. Comparing base (60bd3b8) to head (38bad7d). Report is 6 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #304 +/- ## ========================================== + Coverage 86.21% 86.34% +0.13% ========================================== Files 47 49 +2 Lines 8812 8896 +84 ========================================== + Hits 7597 7681 +84 Misses 1215 1215 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

kmacdonald-stsci commented 1 week ago

I have gone through the code a few times. To the best of my ability it looks correct to me and frees the necessary memory when it is done. The comments you gave me were essential for me to understand the flow. If you can somehow add some of those details into the code so in the future when we need to look at the code it is easier to understand what is going on that might be helpful.

I added comments to the linked lists used to more thoroughly describe the implementations. Please take a look at them to let me know if they help clarify how they are implemented.

melanieclarke commented 2 days ago

I'm testing these changes on some sample data I have on disk, and this command, running without multiprocessing completes in seconds: strun ramp_fit jw01247667001_02101_00001_mirifulong_jump.fits --save_opt=True --opt_name=opt_serial.fits

This one, with multiprocessing on, I killed after 5 minutes because it didn't look like it was ever going to complete: strun ramp_fit jw01247667001_02101_00001_mirifulong_jump.fits --save_opt=True --opt_name=opt_parallel.fits --maximum_cores=half After killing it, it reports 20 leaked semaphores.

This one, with multiprocessing on but optional output off completes in seconds: strun ramp_fit jw01247667001_02101_00001_mirifulong_jump.fits --maximum_cores=half After completion, it reports 10 leaked semaphores.

Can you please double check the interaction between multiprocessing and optional output results? I'm wondering if there's a deadlock somewhere.

kmacdonald-stsci commented 10 hours ago

Closing to fix multiprocessing bug in the optional results product.