spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope
https://jwst-pipeline.readthedocs.io/en/latest/

level2b for mir_4qpm jw03254-c1009 crashed with "could not broadcast input array" #8313

Closed stscijgbot-jp closed 7 months ago

stscijgbot-jp commented 7 months ago

Issue JP-3553 was created on JIRA by Hien Tran:

ops saw all 9 exposures of the mir_4qpm dataset jw03254-c1009_20240118t185403_image2_0000* (for observation 4) crash in level2b processing with b10.0, in the bkg_subtract step:


2024-02-26 16:50:47,709 - CRDS - DEBUG - Final effective context is 'jwst_1202.pmap'
2024-02-26 16:50:47,709 - CRDS - DEBUG - Computing best references locally.
2024-02-26 16:50:47,710 - CRDS - DEBUG - Bestrefs header:
{'META.EXPOSURE.NGROUPS [NGROUPS]': '100.0',
'META.EXPOSURE.READPATT [READPATT]': 'FASTR1',
'META.EXPOSURE.TYPE [EXP_TYPE]': 'MIR_4QPM',
'META.INSTRUMENT.BAND [BAND]': 'UNDEFINED',
'META.INSTRUMENT.CHANNEL [CHANNEL]': 'UNDEFINED',
'META.INSTRUMENT.CORONAGRAPH [CORONMSK]': '4QPM_1140',
'META.INSTRUMENT.DETECTOR [DETECTOR]': 'MIRIMAGE',
'META.INSTRUMENT.FILTER [FILTER]': 'F1140C',
'META.INSTRUMENT.LAMP_STATE [LAMP]': 'OFF',
'META.INSTRUMENT.NAME [INSTRUME]': 'MIRI',
'META.OBSERVATION.DATE [DATE-OBS]': '2024-02-26',
'META.OBSERVATION.TIME [TIME-OBS]': '03:11:49.739',
'META.SUBARRAY.NAME [SUBARRAY]': 'MASK1140',
'META.VISIT.TSOVISIT [TSOVISIT]': 'F',
'META.VISIT.TYPE [VISITYPE]': 'PRIME_TARGETED_FIXED',
'REFTYPE': 'UNDEFINED'}
2024-02-26 16:50:47,712 - CRDS - DEBUG - Reference type 'distortion' defined as 'jwst_miri_distortion_0047.asdf'
2024-02-26 16:50:47,712 - CRDS - DEBUG - Reference type 'drizpars' defined as 'jwst_miri_drizpars_0001.fits'
2024-02-26 16:50:47,712 - CRDS - DEBUG - Reference type 'filteroffset' defined as 'jwst_miri_filteroffset_0010.asdf'
2024-02-26 16:50:47,712 - CRDS - DEBUG - Reference type 'flat' defined as 'jwst_miri_flat_0771.fits'
2024-02-26 16:50:47,712 - CRDS - DEBUG - Reference type 'photom' defined as 'jwst_miri_photom_0181.fits'
2024-02-26 16:50:47,714 - stpipe.Image2Pipeline - INFO - Prefetch for AREA reference file is 'N/A'.
2024-02-26 16:50:47,714 - stpipe.Image2Pipeline - INFO - Prefetch for CAMERA reference file is 'N/A'.
2024-02-26 16:50:47,714 - stpipe.Image2Pipeline - INFO - Prefetch for COLLIMATOR reference file is 'N/A'.
2024-02-26 16:50:47,714 - stpipe.Image2Pipeline - INFO - Prefetch for DFLAT reference file is 'N/A'.
2024-02-26 16:50:47,714 - stpipe.Image2Pipeline - INFO - Prefetch for DISPERSER reference file is 'N/A'.
2024-02-26 16:50:47,714 - stpipe.Image2Pipeline - INFO - Prefetch for DISTORTION reference file is '/ifs/archive/ops/jwst/ref/tmp_crds/crds/cache/references/jwst/miri/jwst_miri_distortion_0047.asdf'.
2024-02-26 16:50:47,714 - stpipe.Image2Pipeline - INFO - Prefetch for DRIZPARS reference file is '/ifs/archive/ops/jwst/ref/tmp_crds/crds/cache/references/jwst/miri/jwst_miri_drizpars_0001.fits'.
2024-02-26 16:50:47,714 - stpipe.Image2Pipeline - INFO - Prefetch for FFLAT reference file is 'N/A'.
2024-02-26 16:50:47,714 - stpipe.Image2Pipeline - INFO - Prefetch for FILTEROFFSET reference file is '/ifs/archive/ops/jwst/ref/tmp_crds/crds/cache/references/jwst/miri/jwst_miri_filteroffset_0010.asdf'.
2024-02-26 16:50:47,715 - stpipe.Image2Pipeline - INFO - Prefetch for FLAT reference file is '/ifs/archive/ops/jwst/ref/tmp_crds/crds/cache/references/jwst/miri/jwst_miri_flat_0771.fits'.
2024-02-26 16:50:47,715 - stpipe.Image2Pipeline - INFO - Prefetch for FORE reference file is 'N/A'.
2024-02-26 16:50:47,715 - stpipe.Image2Pipeline - INFO - Prefetch for FPA reference file is 'N/A'.
2024-02-26 16:50:47,715 - stpipe.Image2Pipeline - INFO - Prefetch for IFUFORE reference file is 'N/A'.
2024-02-26 16:50:47,715 - stpipe.Image2Pipeline - INFO - Prefetch for IFUPOST reference file is 'N/A'.
2024-02-26 16:50:47,715 - stpipe.Image2Pipeline - INFO - Prefetch for IFUSLICER reference file is 'N/A'.
2024-02-26 16:50:47,715 - stpipe.Image2Pipeline - INFO - Prefetch for MSA reference file is 'N/A'.
2024-02-26 16:50:47,715 - stpipe.Image2Pipeline - INFO - Prefetch for OTE reference file is 'N/A'.
2024-02-26 16:50:47,715 - stpipe.Image2Pipeline - INFO - Prefetch for PHOTOM reference file is '/ifs/archive/ops/jwst/ref/tmp_crds/crds/cache/references/jwst/miri/jwst_miri_photom_0181.fits'.
2024-02-26 16:50:47,716 - stpipe.Image2Pipeline - INFO - Prefetch for REGIONS reference file is 'N/A'.
2024-02-26 16:50:47,716 - stpipe.Image2Pipeline - INFO - Prefetch for SFLAT reference file is 'N/A'.
2024-02-26 16:50:47,716 - stpipe.Image2Pipeline - INFO - Prefetch for SPECWCS reference file is 'N/A'.
2024-02-26 16:50:47,716 - stpipe.Image2Pipeline - INFO - Prefetch for WAVELENGTHRANGE reference file is 'N/A'.
2024-02-26 16:50:47,716 - stpipe.Image2Pipeline - INFO - Prefetch for WFSSBKG reference file is 'N/A'.
2024-02-26 16:50:47,717 - stpipe.Image2Pipeline - INFO - Starting calwebb_image2 ...
2024-02-26 16:50:47,724 - stpipe.Image2Pipeline - INFO - Processing product jw03254004001_04101_00006_mirimage
2024-02-26 16:50:47,724 - stpipe.Image2Pipeline - INFO - Working on input jw03254004001_04101_00006_mirimage_rateints.fits ...
2024-02-26 16:50:47,892 - stpipe.Image2Pipeline.bkg_subtract - INFO - Step bkg_subtract running with args (<CubeModel(6, 224, 288) from jw03254004001_04101_00006_mirimage_rateints.fits>, ['jw03254002001_02101_00001_mirimage_rateints.fits', 'jw03254002001_02101_00002_mirimage_rateints.fits', 'jw03254003001_02101_00001_mirimage_rateints.fits', 'jw03254003001_02101_00002_mirimage_rateints.fits']).
2024-02-26 16:50:47,893 - stpipe.Image2Pipeline.bkg_subtract - INFO - Step bkg_subtract parameters are: {'pre_hooks': [], 'post_hooks': [], 'output_file': None, 'output_dir': None, 'output_ext': '.fits', 'output_use_model': False, 'output_use_index': True, 'save_results': False, 'skip': False, 'suffix': 'bsubints', 'search_output_file': True, 'input_dir': '/ifs/archive/ops/jwst/info/owlmgr/paths/sdp/asn_creation/cal/level2', 'save_combined_background': False, 'sigma': 3.0, 'maxiters': None, 'wfss_mmag_extract': None}
2024-02-26 16:50:47,924 - stpipe.Image2Pipeline.bkg_subtract - INFO - Accumulate bkg from jw03254002001_02101_00001_mirimage_rateints.fits
Traceback (most recent call last):
File "/dms/local/jwst/pipeline/pkgs/miniconda3/envs/jwstdp-1.12.5.20231019-py3.11/bin/strun", line 26, in <module>
step = Step.from_cmdline(sys.argv[1:])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/dms/local/jwst/pipeline/pkgs/miniconda3/envs/jwstdp-1.12.5.20231019-py3.11/lib/python3.11/site-packages/stpipe/step.py", line 186, in from_cmdline
return cmdline.step_from_cmdline(args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/dms/local/jwst/pipeline/pkgs/miniconda3/envs/jwstdp-1.12.5.20231019-py3.11/lib/python3.11/site-packages/stpipe/cmdline.py", line 386, in step_from_cmdline
step.run(*positional)
File "/dms/local/jwst/pipeline/pkgs/miniconda3/envs/jwstdp-1.12.5.20231019-py3.11/lib/python3.11/site-packages/stpipe/step.py", line 478, in run
step_result = self.process(*args)
^^^^^^^^^^^^^^^^^^^
File "/dms/local/jwst/pipeline/pkgs/miniconda3/envs/jwstdp-1.12.5.20231019-py3.11/lib/python3.11/site-packages/jwst/pipeline/calwebb_image2.py", line 67, in process
result = self.process_exposure_product(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/dms/local/jwst/pipeline/pkgs/miniconda3/envs/jwstdp-1.12.5.20231019-py3.11/lib/python3.11/site-packages/jwst/pipeline/calwebb_image2.py", line 148, in process_exposure_product
input = self.bkg_subtract(input, members_by_type['background'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/dms/local/jwst/pipeline/pkgs/miniconda3/envs/jwstdp-1.12.5.20231019-py3.11/lib/python3.11/site-packages/stpipe/step.py", line 478, in run
step_result = self.process(*args)
^^^^^^^^^^^^^^^^^^^
File "/dms/local/jwst/pipeline/pkgs/miniconda3/envs/jwstdp-1.12.5.20231019-py3.11/lib/python3.11/site-packages/jwst/background/background_step.py", line 91, in process
bkg_model, result = background_sub.background_sub(input_model,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/dms/local/jwst/pipeline/pkgs/miniconda3/envs/jwstdp-1.12.5.20231019-py3.11/lib/python3.11/site-packages/jwst/background/background_sub.py", line 153, in background_sub
bkg_model = average_background(input_model,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/dms/local/jwst/pipeline/pkgs/miniconda3/envs/jwstdp-1.12.5.20231019-py3.11/lib/python3.11/site-packages/jwst/background/background_sub.py", line 222, in average_background
bkg_data, bkg_err, bkg_dq = im_array.get_subset_array(bkg_array)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/dms/local/jwst/pipeline/pkgs/miniconda3/envs/jwstdp-1.12.5.20231019-py3.11/lib/python3.11/site-packages/jwst/background/background_sub.py", line 112, in get_subset_array
data_overlap[:data_cutout.shape[0], :data_cutout.shape[1]] = copy.deepcopy(data_cutout)
~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: could not broadcast input array from shape (26,224,288) into shape (6,224,288)
2024057165048 INFO src=ssb_calibration_wrapper._strun_analyze_and_log_failure fsn=jw03254-c1009_20240118t185403_image2_00004_asn msg="strun /dms/local/jwst/pipeline/pkgs/miniconda3/envs/jwstdp-1.12.5.20231019-py3.11/lib/python3.11/site-packages/jwst/pipeline/calwebb_image2.cfg FAILED (exit=1) on jw03254-c1009_20240118t185403_image2_00004_asn.json."
2024057165048 ERROR src=ssb_calibration_wrapper.calibrate._strun._strun_analyze_and_log_failure fsn=jw03254-c1009_20240118t185403_image2_00004_asn msg="strun exit status=1"
the error message _"could not broadcast input array from shape (26,224,288) into shape (6,224,288)"_ suggests that the numbers of integrations for the background and science images do not match.
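
The failing assignment can be reproduced outside the pipeline with plain NumPy. This is only a minimal sketch: it assumes the destination array is sized from the 6-integration science cube and the source is the 26-integration background cube, as the shapes in the traceback suggest. The variable names are illustrative and are not taken from `background_sub.py`.

```python
import numpy as np

# Shapes taken from the traceback: the science cube has 6 integrations,
# the first background exposure has 26.
sci_shape = (6, 224, 288)
bkg_cutout = np.zeros((26, 224, 288), dtype=np.float32)

# Assumption: the destination array is allocated with the science shape.
data_overlap = np.full(sci_shape, np.nan, dtype=np.float32)

try:
    # The slice stops at 6 along axis 0, so a 26-plane cube cannot fit.
    data_overlap[:bkg_cutout.shape[0], :bkg_cutout.shape[1]] = bkg_cutout
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Slicing past the end of an axis is silently clamped, so the left-hand side stays at 6 planes and NumPy refuses to broadcast 26 planes into it.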

the association file for one of the c1009 exposures is attached, along with one for c1008, which did **not** crash. note:
 * c1009 refers to association for obs 4, which is the psf ref star, with nints = 6
 * c1008 is the association for obs 1, which is the actual science target, with nints = 26
 * they both include **two** sets of observations as background members, obs 2 & 3
 * obs 2 has nints = 26, obs 3 has nints = 6
 * c1009 image2 crashed but c1008 did not

questions:
 1. are the "background" associations such as these supposed to have more than one set of background observations? it *appears* bkgr obs 2 is intended to go along with obs 1, and bkgr obs 3 with obs 4
 2. could the failure be due to the **ordering** of the background members in the asn table (json) file? as obs 2 bkgr member is always listed first after the science member, it matches nints (26) with that of obs 1, but not obs 4

stscijgbot-jp commented 7 months ago

Comment by Hien Tran on JIRA:

pool file also attached

stscijgbot-jp commented 7 months ago

Comment by Tyler Pauly on JIRA:

Looking at the program in APT, it looks like each observed source has four observations: a “science” pointing, a “reference” pointing, and one background for each of those. Those backgrounds use the same stored target_id but have different shapes to match the science and reference exposures. Looking through the pool, I don't see any metadata that would allow the association rules to differentiate the backgrounds. It could have been fixed by using a separate target_id for each of the two backgrounds, even if they are at the same location.

stscijgbot-jp commented 7 months ago

Comment by Hien Tran on JIRA:

perhaps another option is adding "nints" and/or "ngroups" as metadata in the pool file

stscijgbot-jp commented 7 months ago

Comment by Tyler Pauly on JIRA:

That would work! I don't know the details of how the pools are made and how entries are selected for inclusion, but that would simplify the fix for cal, if it's possible.

stscijgbot-jp commented 7 months ago

Comment by Jonathan Aguilar on JIRA:

I'm the MIRI coronagraphy lead. Matching exposure parameters is fine for now. I will add "Users should make a separate background target" to the instrument scientist checklist so that this doesn't happen again.

I might also request that this be enforced in APT (i.e., if you want to reuse the same background target, then you have to make a copy of it first), but I will talk with my team about that first.

stscijgbot-jp commented 7 months ago

Comment by Howard Bushouse on JIRA:

Another possible fix for this, without the need to modify the ASN pool contents or ASN rules (in order to discriminate exposures based on NINTS), would be to add a check into the background step itself that skips over any background members that have NINTS different from the science exposure. And in the worst case, where none of the backgrounds have a matching NINTS, it would end up just skipping the background step entirely. This kind of trap would likely be good to have in the step anyway, just to guard against users mistakenly creating custom ASN files that have incompatible background exposures in them.
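
A minimal sketch of such a check, assuming the NINTS of each background member has already been read into a plain dictionary. The helper name `select_matching_backgrounds` is hypothetical and is not part of the `jwst` package. The file names come from the log above and the NINTS values from the issue description.

```python
def select_matching_backgrounds(sci_nints, bkg_nints_by_name):
    """Return the background file names whose NINTS equals the science NINTS."""
    return [name for name, nints in bkg_nints_by_name.items() if nints == sci_nints]


# NINTS values reported in this issue: obs 2 has 26, obs 3 has 6.
backgrounds = {
    "jw03254002001_02101_00001_mirimage_rateints.fits": 26,
    "jw03254002001_02101_00002_mirimage_rateints.fits": 26,
    "jw03254003001_02101_00001_mirimage_rateints.fits": 6,
    "jw03254003001_02101_00002_mirimage_rateints.fits": 6,
}

# The obs 4 science exposure has NINTS = 6, so only the obs 3 members survive.
usable = select_matching_backgrounds(6, backgrounds)

# Worst case described above: nothing matches, so the step would be skipped.
skip_step = len(usable) == 0
```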

stscijgbot-jp commented 7 months ago

Comment by Howard Bushouse on JIRA:

After looking in detail at the actual code that's used in the background step to handle 3-D (multi-integration) background files, I've determined that it does NOT try to do an integration-by-integration subtraction from the corresponding multi-int science exposure. Instead, it computes the sigma-clipped mean of all the integrations in each background exposure (i.e. collapses the integrations into a single 2-D image), and then takes the sigma-clipped mean of all the 2-D background images to form a final mean 2-D background image. That mean 2-D bkg image is then subtracted from all integrations of the 3-D science exposure.
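
The two-stage averaging described above can be sketched with NumPy alone. This is an illustration under assumptions, not the pipeline's code: `clipped_mean` is a simple hand-rolled stand-in for whatever sigma-clipping routine the step actually uses, and the small array sizes are arbitrary.

```python
import numpy as np


def clipped_mean(stack, sigma=3.0, maxiters=5):
    """Sigma-clipped mean along axis 0 (simple stand-in, not the pipeline routine)."""
    data = np.asarray(stack, dtype=float)
    mask = np.zeros(data.shape, dtype=bool)
    for _ in range(maxiters):
        kept = np.where(mask, np.nan, data)
        center = np.nanmean(kept, axis=0)
        spread = np.nanstd(kept, axis=0)
        new_mask = mask | (np.abs(data - center) > sigma * spread)
        if np.array_equal(new_mask, mask):
            break
        mask = new_mask
    return np.nanmean(np.where(mask, np.nan, data), axis=0)


rng = np.random.default_rng(0)
bkg_26 = rng.normal(1.0, 0.01, size=(26, 4, 5))   # background with NINTS = 26
bkg_6 = rng.normal(1.0, 0.01, size=(6, 4, 5))     # background with NINTS = 6
science = rng.normal(5.0, 0.01, size=(6, 4, 5))   # science with NINTS = 6

# Stage 1: collapse each background exposure to a single 2-D image.
per_exposure = [clipped_mean(cube) for cube in (bkg_26, bkg_6)]

# Stage 2: combine the 2-D images into one mean 2-D background.
mean_bkg = clipped_mean(np.stack(per_exposure))

# The 2-D background broadcasts across every science integration.
result = science - mean_bkg
```

Because each background is reduced to 2-D before anything is combined, the number of integrations in a background exposure never has to match the science exposure.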

So the step does not actually require the science and background exposures to have the same NINTS. There was just an issue down in the guts of the code that was using the shape of the science image to construct a temporary accumulation array for working on the background exposures, which then led to the crash when the science and background exposures didn't have the same NINTS. A minor modification to that part of code allows the use of background exposures that have any arbitrary NINTS and successfully collapses them all into a mean 2-D background image, which is then subtracted from each science integration.
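
One way to picture that kind of modification: take the integration axis of the temporary array from the background exposure and only the spatial axes from the science exposure. The sketch below is a guess at the idea, not the change made in the pipeline. `make_accumulator` is a hypothetical helper, and a plain `nanmean` stands in for the sigma-clipped mean.

```python
import numpy as np


def make_accumulator(bkg_shape, sci_spatial_shape):
    """Hypothetical helper: NaN-filled work array for one background exposure."""
    # Integration count comes from the background, spatial size from the science.
    return np.full((bkg_shape[0], *sci_spatial_shape), np.nan, dtype=np.float32)


sci_shape = (6, 224, 288)
collapsed = []
for nints in (26, 6):
    bkg = np.ones((nints, 224, 288), dtype=np.float32)
    work = make_accumulator(bkg.shape, sci_shape[1:])
    # The assignment now fits regardless of the background's NINTS.
    work[:, :bkg.shape[1], :bkg.shape[2]] = bkg
    collapsed.append(np.nanmean(work, axis=0))

collapsed_shapes = [image.shape for image in collapsed]
```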

So this modification to the cal pipeline code allows this processing to succeed and produces valid scientific results (why not use as many background exposures as are available?).

We can also still proceed with the previously suggested modifications to the ASN pool and rules, if desired, in order to prevent the association of background exposures from different observations in the first place.

stscijgbot-jp commented 7 months ago

Comment by Howard Bushouse on JIRA:

The basic issue of the cal pipeline background step not being able to handle inputs with varying values of NINTS was fixed in #8326.

stscijgbot-jp commented 6 months ago

Comment by John Scott on JIRA:

more cases:

 * jw03254-c1009_20240417t042916_image2_00005
 * jw03254-c1009_20240417t042916_image2_00001
 * jw03254-c1009_20240417t042916_image2_00002
 * jw03254-c1009_20240417t042916_image2_00007
 * jw03254-c1009_20240417t042916_image2_00008
 * jw03254-c1009_20240417t042916_image2_00009
 * jw03254-c1009_20240417t042916_image2_00004
 * jw03254-c1009_20240417t042916_image2_00006
 * jw03254-c1013_20240417t042916_image2_00002
 * jw03254-c1013_20240417t042916_image2_00001
 * jw03254-c1009_20240417t042916_image2_00003
 * jw03254-c1013_20240417t042916_image2_00003
 * jw03254-c1013_20240417t042916_image2_00004
 * jw03254-c1013_20240417t042916_image2_00005
 * jw03254-c1013_20240417t042916_image2_00007
 * jw03254-c1013_20240417t042916_image2_00006
 * jw03254-c1013_20240417t042916_image2_00008
 * jw03254-c1013_20240417t042916_image2_00009
 * jw03254-c1011_20240417t042916_image2_00001
 * jw03254-c1011_20240417t042916_image2_00002
 * jw03254-c1011_20240417t042916_image2_00003
 * jw03254-c1011_20240417t042916_image2_00004
 * jw03254-c1011_20240417t042916_image2_00005
 * jw03254-c1011_20240417t042916_image2_00006
 * jw03254-c1011_20240417t042916_image2_00007
 * jw03254-c1011_20240417t042916_image2_00008
 * jw03254-c1011_20240417t042916_image2_00009

stscijgbot-jp commented 1 month ago

Comment by David Law on JIRA:

Hien Tran, another one for you: is this fixed now in the 11.0 run?

stscijgbot-jp commented 1 month ago

Comment by Hien Tran on JIRA:

repro of jw03254-c1009 with b10.2 on 2024-07-12 in ops showed that the error was no longer encountered. this is fixed, closing.