spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope
https://jwst-pipeline.readthedocs.io/en/latest/

mrs_imatch step output files have odd structure #7130

Open stscijgbot-jp opened 1 year ago

stscijgbot-jp commented 1 year ago

Issue JP-2602 was created on JIRA by Howard Bushouse:

When processing MIRI MRS exposures through the calwebb_spec3 pipeline and requesting that the "mrs_imatch" step results be saved to files on disk, the resulting FITS files have an odd structure. First, there are 2 instances of a "SCI" extension, when there should be only 1. Second, the EXTVER values of the "good" data extensions are set to 8, instead of 1.

For example:

Filename: jw01024-o001_t001_mirifushort_short_3_mrs_imatch.fits
No.    Name         Ver    Type        Cards   Dimensions     Format
  0  PRIMARY          1 PrimaryHDU       339   ()
  1  SCI              1 ImageHDU          35   ()
  2  SCI              8 ImageHDU          39   (1032, 1024)   float32
  3  ERR              8 ImageHDU          10   (1032, 1024)   float32
  4  DQ               8 ImageHDU          11   (1032, 1024)   int32 (rescales to uint32)
  5  VAR_POISSON      8 ImageHDU           9   (1032, 1024)   float32
  6  VAR_RNOISE       8 ImageHDU           9   (1032, 1024)   float32
  7  VAR_FLAT         8 ImageHDU           9   (1032, 1024)   float32
  8  ASDF             1 BinTableHDU       11   1R x 1C        [25722249B]
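
A quick way to flag this kind of structure is to count EXTNAME values and compare EXTVER against the expected value of 1. The following is a minimal sketch, not part of the pipeline: the function names are hypothetical, the (EXTNAME, EXTVER) pairs are copied from the listing above, and the optional file reader assumes astropy is installed.

```python
from collections import Counter

# (EXTNAME, EXTVER) pairs copied from the listing above, excluding PRIMARY.
LISTING = [
    ("SCI", 1), ("SCI", 8), ("ERR", 8), ("DQ", 8),
    ("VAR_POISSON", 8), ("VAR_RNOISE", 8), ("VAR_FLAT", 8), ("ASDF", 1),
]


def check_extensions(extensions, expected_ver=1):
    """Return (duplicated EXTNAMEs, [(EXTNAME, EXTVER)] with an unexpected EXTVER)."""
    counts = Counter(name for name, _ in extensions)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    bad_versions = [(name, ver) for name, ver in extensions if ver != expected_ver]
    return duplicates, bad_versions


def extensions_from_file(path):
    """Read (EXTNAME, EXTVER) pairs from a FITS file (hypothetical helper)."""
    # Imported lazily so the check itself has no third-party dependencies.
    from astropy.io import fits

    with fits.open(path) as hdul:
        return [(hdu.name, hdu.ver) for hdu in hdul[1:]]


duplicates, bad_versions = check_extensions(LISTING)
print("Duplicated extensions:", duplicates)
print("Unexpected EXTVER:", bad_versions)
```

For a file on disk, `check_extensions(extensions_from_file(path))` would run the same check against the real HDU list.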

We've often seen instances of extra SCI extensions in products that don't normally have them (e.g. x1d files), simply because the datamodel from which the file is created has header keywords that are assigned to the SCI HDU and hence the extension gets created just to have a place to write those keywords. But in this case the output products already have a SCI extension, so you end up with 2 instances for some reason. 

This odd structure then carries over to the "crf" products created by the subsequent "outlier_detection" step in the calwebb_spec3 pipeline.

This does not seem to affect the actual data going through pipeline processing, which are all carried and passed along from step to step in memory via datamodels and containers.

stscijgbot-jp commented 1 year ago

Comment by Anton Koekemoer on JIRA:

Tagging David Law, and also Misty Cracraft and Karl Gordon (MIRI reps for CalWG), to ask if you wouldn't mind assigning an "INS Team" priority to this for MIRI. This doesn't seem to be holding up commissioning, but it would help to at least have a priority for it, even if low or medium.

stscijgbot-jp commented 1 year ago

Comment by David Law on JIRA:

Added priority mir_low as it seems to only affect intermediate products in an offline capacity rather than main pipeline results.  It would be helpful to see an example association file that produces the problem (mrs_imatch will be tested against flight data once sufficient commissioning data is available).

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

David Law there's an example asn file in the data directory listed in the ticket info above, along with the various input/output fits files for that ASN. To see the problem, the calwebb_spec3 pipeline has to be executed with "--steps.mrs_imatch.save_results=true".
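
As a sketch of the invocation described above, the command line can be assembled as follows. The association file name is a placeholder (the real one lives in the data directory mentioned in the ticket), and the helper function is hypothetical; only `strun`, `calwebb_spec3`, and the `--steps.mrs_imatch.save_results=true` override come from this thread.

```python
import shlex


def build_spec3_command(asn_file, save_imatch=True):
    """Build the strun argument list for calwebb_spec3 (hypothetical helper)."""
    cmd = ["strun", "calwebb_spec3", asn_file]
    if save_imatch:
        # Ask the pipeline to write the mrs_imatch step results to disk.
        cmd.append("--steps.mrs_imatch.save_results=true")
    return cmd


# Placeholder association file name, for illustration only.
cmd = build_spec3_command("example_spec3_asn.json")
print(shlex.join(cmd))

# To actually run it (requires the jwst package and the input data):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

If the step is skipped by default in a given setup, the step's `skip` parameter would presumably also need to be overridden for any mrs_imatch output to be written.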

stscijgbot-jp commented 1 year ago

Comment by Jane Morrison on JIRA:

Howard Bushouse Mihai Cara 

The issue seems to be in mrs_imatch_step.py 

    for im, poly in zip(models, bkg_poly_coef):
        im.meta.background.subtracted = False
        im.meta.background.polynomial_info.append(
            {
                'degree': degree,
                'refpoint': center,
                'coefficients': poly.ravel().tolist(),
                'channel': channel
            }
        )

I have tried to write the information to im.meta.background.polynomial_info differently, but Python says it is defined as a ListNode. I am not sure what a ListNode is; in the core schema, polynomial_info is defined as a list. When polynomial_info is filled in and mrs_imatch.save_results = True, the output header is split into two parts, and the second part is written as the extra SCI extension with no data, just the second part of the header information.

I am unclear what a ListNode is. Can we just write the polynomial_info values as floats? Is there a reason they need to be written as shown above?

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

Putting this on hold, at least temporarily, until the MIRI team determines whether the step is actually needed or useful. Right now it's being skipped by default, by a pars ref file.