Open hbushouse opened 6 years ago
There's a similar problem occurring for a NIRISS SOSS TSO exposure in the DMS test data cache. In the DMS test environment on the C-string, the exposure "jw10003001001_03101_00001-seg002_nis" is failing in calwebb_tso1 processing during the ramp_fit step, with the simple error "Killed" showing up in the processing log. The suspected problem is running out of memory, although I would have thought the test environment would have enough RAM to handle this.
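A bare "Killed" in a log is typically what a shell prints when a child process ends on SIGKILL, which is the signal the Linux OOM killer uses (SIGKILL can also come from other sources, so the kernel log is the usual confirmation). A minimal sketch, assuming the pipeline run can be launched as a subprocess, that records whether the child died from SIGKILL and its peak resident memory. The demo command is a stand-in child that kills itself, not an actual pipeline invocation.

```python
import resource
import signal
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd as a child process; report SIGKILL termination and peak RSS.

    On POSIX, a negative returncode -N means the child died from signal N.
    ru_maxrss for RUSAGE_CHILDREN is the largest waited-for child's peak RSS,
    in kilobytes on Linux (bytes on macOS).
    """
    proc = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "returncode": proc.returncode,
        "killed_by_sigkill": proc.returncode == -signal.SIGKILL,
        "peak_rss_kb": usage.ru_maxrss,
    }


if __name__ == "__main__":
    # Stand-in child that sends itself SIGKILL, as the OOM killer would.
    demo = [sys.executable, "-c",
            "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    print(run_and_report(demo))
```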
The latest DMS run of this dataset can be found on the C-string in "/ifs/int/jwstc/store/doggett/tests/run273/", with the error log at "/ifs/int/jwstc/owl/logs/doggett_jw10003001001_03101_00001-seg002_nis_1528446667.361178/ALOG_1528447835_level_2a_jw10003001001_03101_00001-seg002_nis.err".
Interestingly, "seg001" of this exposure, which is the same size as "seg002", succeeds in completing the ramp_fit step and calwebb_tso1 processing.
JIRA ticket https://jira.stsci.edu/browse/JP-323 reports problems processing a (rather large) NIRSpec BrightObj (TSO) exposure in calwebb_detector1, with the processing going on for hours and hours and eventually crashing. The dataset in question is NRS_BRIGHTOBJ mode, using the SUB2048 (2048 x 32) subarray, NGROUPS=3 and NINTS=3000. The size of the level-1b (uncal) file is ~1 GB.
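To put the ~1 GB uncal size and the available RAM in context, the size of each full-exposure array follows directly from NINTS x NGROUPS x 32 x 2048 pixels. This sketch assumes uint16 raw counts in the uncal file and float32 SCI/ERR plus uint8 GROUPDQ arrays once the data are in ramp form; it ignores any temporary copies the steps themselves make, so real peak usage can be higher.

```python
import numpy as np

# Dimensions quoted for the JP-323 dataset (SUB2048 subarray).
NINTS, NGROUPS, NROWS, NCOLS = 3000, 3, 32, 2048
SHAPE = (NINTS, NGROUPS, NROWS, NCOLS)


def array_bytes(shape, dtype):
    """Bytes needed to hold one array of the given shape and dtype."""
    return int(np.prod(shape, dtype=np.int64)) * np.dtype(dtype).itemsize


# Assumed dtypes (see lead-in); not read from the actual files.
footprint = {
    "uncal SCI (uint16)": array_bytes(SHAPE, np.uint16),
    "ramp SCI (float32)": array_bytes(SHAPE, np.float32),
    "ramp ERR (float32)": array_bytes(SHAPE, np.float32),
    "ramp GROUPDQ (uint8)": array_bytes(SHAPE, np.uint8),
}

for name, nbytes in footprint.items():
    print(f"{name:22s} {nbytes / 1e9:6.2f} GB")

ramp_total = sum(v for k, v in footprint.items() if k.startswith("ramp"))
print(f"{'ramp arrays total':22s} {ramp_total / 1e9:6.2f} GB")
```

The uncal SCI estimate comes out at about 1.18 GB, consistent with the ~1 GB file size, and the three ramp-form arrays together come to roughly 5.3 GB before any working copies.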
I did some trial runs of calwebb_detector1 processing on my system, which has 32 GB of RAM, and found some interesting behavior. The 2 steps with the longest processing time are, unsurprisingly, jump and ramp_fit, because there are 3000 integrations to process. The output of the jump step was saved to a _ramp product. Examination of the group_dq array in the _ramp file showed that many pixels, in all integrations, have group 3 flagged as an outlier. So that means ramp_fit is left to deal with only 2 groups in many situations.

The ramp_fit step now does its processing in 3 main loops or phases. I tracked the processing time and memory usage of each phase. At the end of processing, ramp_fit reported a total execution time that translates to 440 mins, or 7.3 hours (!).
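The group_dq examination described above (group 3 flagged in many pixels) can be reproduced with a short NumPy check. This sketch assumes the JWST convention that JUMP_DET is DQ bit value 4 and that the group DQ array is laid out as (nints, ngroups, ny, nx); loading it from BOTS_ramp.fits through astropy's fits.open and a GROUPDQ extension is an assumption, shown only as a comment. The "usable groups" count is a deliberate simplification that just discards flagged groups, whereas ramp_fit itself splits ramps into segments at jumps.

```python
import numpy as np

JUMP_DET = 4  # assumed JWST DQ value for "jump detected" (bit 2)


def jump_flag_summary(groupdq, group_index=2):
    """Summarize jump flags in a (nints, ngroups, ny, nx) group DQ array.

    Returns the fraction of pixels per integration whose group
    `group_index` (0-based, so 2 is group 3) carries JUMP_DET, and a
    per-ramp count of groups that carry no JUMP_DET flag.
    """
    flagged = (groupdq & JUMP_DET) != 0
    frac_per_int = flagged[:, group_index].mean(axis=(1, 2))
    usable = groupdq.shape[1] - flagged.sum(axis=1)
    return frac_per_int, usable


# Hypothetical use on the real file (not executed here):
# from astropy.io import fits
# with fits.open("BOTS_ramp.fits") as hdul:
#     frac, usable = jump_flag_summary(hdul["GROUPDQ"].data)

if __name__ == "__main__":
    # Synthetic example: group 3 flagged in the first row of every integration.
    dq = np.zeros((2, 3, 2, 2), dtype=np.uint8)
    dq[:, 2, 0, :] = JUMP_DET
    frac, usable = jump_flag_summary(dq)
    print(frac, usable.min(), usable.max())
```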
Processing did succeed, but it obviously took very long and nearly exhausted the RAM on my system. The steady increase in RAM usage during Phase 1 of ramp_fit is at least interesting, if not actually worrisome. Should that be happening? Are there arrays that are being steadily built up during that phase? Or do we have a problem with memory not getting freed properly at the end of the processing for each integration?

The dataset in question is available at: /grp/jwst/ssb/bushouse/jwst_data/NIRSpec/BrightObj/BOTS_uncal.fits. The BOTS_ramp.fits file is also there, which can be used as input directly to the ramp_fit step (to avoid having to redo all the upstream processing).
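One way to probe whether memory is released at the end of each integration is to record the currently traced memory after every pass of a per-integration loop: a loop that retains per-integration arrays grows linearly, while one that releases them stays flat. In this sketch, process_integration and the array sizes are hypothetical stand-ins, not the actual ramp_fit code. tracemalloc sees NumPy array buffers (NumPy reports them to it), but it would miss memory held by C code that does not report to tracemalloc, so process RSS remains the final check.

```python
import tracemalloc

import numpy as np


def process_integration(ngroups=3, ny=32, nx=2048):
    """Hypothetical stand-in for one integration's work (not ramp_fit code)."""
    ramp = np.ones((ngroups, ny, nx), dtype=np.float32)
    return ramp.mean(axis=0)  # one 2-D float32 result per integration


def trace_loop(nints, keep_results):
    """Return traced memory (bytes) recorded after each integration."""
    kept = []
    history = []
    tracemalloc.start()
    for _ in range(nints):
        result = process_integration()
        if keep_results:
            kept.append(result)  # retained on purpose, like a per-integration cube
        current, _peak = tracemalloc.get_traced_memory()
        history.append(current)
    tracemalloc.stop()
    return history


if __name__ == "__main__":
    grow = trace_loop(50, keep_results=True)
    flat = trace_loop(50, keep_results=False)
    print("retained: grew by", grow[-1] - grow[0], "bytes")
    print("released: grew by", flat[-1] - flat[0], "bytes")
```

For scale, each retained 32 x 2048 float32 plane costs 262,144 bytes, so a loop over NINTS=3000 that keeps one such plane per integration accumulates about 786 MB by design; growth well beyond what the retained outputs account for would point toward memory not being freed.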