spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope
https://jwst-pipeline.readthedocs.io/en/latest/

Combine S200A1 and S200A2 spectra in level 3 products #7701

Closed stscijgbot-jp closed 1 year ago

stscijgbot-jp commented 1 year ago

Issue JP-3233 was created on JIRA by Jeff Valenti:

**Request**: For fixed slit spectroscopy with NIRSpec H gratings, enhance the association generator so that spec3 will combine exposures with the target in the S200A1 and S200A2 slits. From a data quality perspective, this will double the number of exposures being combined, substantially improving outlier rejection. From a usability perspective, the user will get one product with complete wavelength coverage, rather than two products each with a detector gap in a different wavelength range. Depending on the implementation, spec2 may be changed to encode in SOURCEID the slit containing the target and the slit being extracted. This would help users identify which level 3 products contain the target and may simplify association logic at level 3.

**Context**: For fixed slit spectroscopy with NIRSpec H gratings, JDox recommends taking exposures with the target in S200A1 and S200A2, if the observer wants complete wavelength coverage. APT provides an "S200A1 and S200A2" option for this purpose.

stscijgbot-jp commented 1 year ago

Comment by Jeff Valenti on JIRA:

Observation 3 of program 2288 is a good test case. There are 24 exposures: 3 gratings (G140H, G235H, G395H) x 2 slits (S200A1, S200A2) x 2 dither positions along the slit x 2 detectors (NRS1, NRS2). The target is an L7 brown dwarf with molecular bands and lines. The data are public.
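The exposure count above can be reproduced by enumerating the combinations. This is a minimal sketch; the variable names are illustrative and not taken from the pipeline.

```python
from itertools import product

gratings = ["G140H", "G235H", "G395H"]
slits = ["S200A1", "S200A2"]
dithers = [1, 2]  # two dither positions along the slit
detectors = ["NRS1", "NRS2"]

# Every grating x slit x dither x detector combination in observation 3
exposures = list(product(gratings, slits, dithers, detectors))
print(len(exposures))

# Members that a combined level 3 product for a single grating would draw on
per_grating = [e for e in exposures if e[0] == "G235H"]
print(len(per_grating))
```

Under this breakdown, a product that combines both slits for one grating has eight members, compared with four when each slit is combined separately.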

Three attached figures ([^g140h.pdf], [^g235h.pdf], [^g395h.pdf]) illustrate current level 3 products for the science target for slit S200A1 (top panel) and slit S200A2 (middle panel). The bottom panel shows the new level 3 product for the science target that combines S200A1 and S200A2. In the new association product, the detector gap is filled and there are fewer outliers.

Calwebb_spec3 generated new level 3 products (g140h_s00001_x1d.fits, g235h_s00001_x1d.fits, g395h_s00001_x1d.fits) as specified in hand-crafted association files ([^g140h_asn.json], [^g235h_asn.json], [^g395h_asn.json]). These association files assume all exposures of the science target have the same SOURCEID (see below). This test used CAL_VER=1.10.2, CRDS_CTX=jwst_1077.pmap, and PRD_VER=PRDOPSSOC-060 to process uncal data produced by SDP_VER=2022_5c. Log files ([^g140h_asn.log], [^g235h_asn.log], [^g395h_asn.log]) are attached.
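For readers unfamiliar with association files, the sketch below builds a minimal level 3 association that lists S200A1 and S200A2 exposures as members of one product. It is not a copy of the attached files: the member file names, the product name, and the `asn_id` value are placeholders, and a real association typically carries additional fields.

```python
import json


def make_spec3_asn(product_name, expnames, asn_id="a3001"):
    """Build a minimal level 3 association dictionary (illustrative only)."""
    return {
        "asn_type": "spec3",
        "asn_id": asn_id,  # placeholder identifier
        "products": [
            {
                "name": product_name,
                "members": [
                    {"expname": name, "exptype": "science"} for name in expnames
                ],
            }
        ],
    }


# Hypothetical member names: exposures from both slits go into one product
members = [
    "example_s200a1_dither1_nrs1_cal.fits",
    "example_s200a1_dither2_nrs1_cal.fits",
    "example_s200a2_dither1_nrs1_cal.fits",
    "example_s200a2_dither2_nrs1_cal.fits",
]
asn = make_spec3_asn("g235h", members)
print(json.dumps(asn, indent=2))
```

Listing members from both slits in a single product only has the intended effect if the science target carries the same SOURCEID in every member, which is the assumption stated above.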

Currently, calwebb_spec3 creates filenames that include the name of the slit containing the science target (s200a1 or s200a2). Slit name should be dropped from the filename, once calwebb_spec3 produces level 3 products that combine exposures from both slits.

**Suggested implementation**: Currently, calwebb_spec2 sets SOURCEID in extension headers based on the slit being extracted, not the slit containing the science target. Thus, SOURCEID=1 when the science target is in S200A1 and SOURCEID=2 when the target is in S200A2. This complicates association logic and is confusing for archive users. The attached Python code ([^fix_source.py]) modifies SOURCEID in members needed by the new association. The suggested SOURCEID nomenclature is:

For example, when the science target is in S200A2 (SLITNUM=2), the extracted spectrum for S200A1 (SLITNUM=1) would be labelled source 21, instead of source 1. See [^fix_source.txt] for an explicit list of all SOURCEID changes for this test case. The main virtue of this approach is that SOURCEID maps to the same sky region for point sources and overlapping sky regions for extended sources. Currently, SOURCEID=1 for S200A1 is not the same sky region as SOURCEID=1 for S200A2.
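The numbering in this example can be sketched as a small function. This is an illustration of the suggested scheme, not the contents of the attached fix_source.py; the function name is hypothetical, and the slit numbers for S400A1, S1600A1, and S200B1 follow the slit ordering quoted later in this thread.

```python
# Slit numbers as used in this thread (S200A1 = 1, S200A2 = 2, ...)
SLITNUM = {"S200A1": 1, "S200A2": 2, "S400A1": 3, "S1600A1": 4, "S200B1": 5}


def suggested_sourceid(target_slit, extracted_slit):
    """Return the suggested SOURCEID for a spectrum extracted from one slit.

    The science target is always source 1. Any other slit gets a two-digit
    number: first digit is the slit holding the target, second digit is the
    slit being extracted.
    """
    if target_slit == extracted_slit:
        return 1
    return 10 * SLITNUM[target_slit] + SLITNUM[extracted_slit]


print(suggested_sourceid("S200A2", "S200A1"))
print(suggested_sourceid("S200A2", "S200A2"))
```

With the target in S200A2, the S200A1 extraction becomes source 21, while the target itself is source 1 regardless of which slit it is in.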

**Additional suggestion #1**: In x1d files produced by calwebb_spec3, SLTNAME in the EXTRACT1D extension header should document the slit being extracted, not the slit containing the science target. Currently, the slit containing the science target is in FXD_SLIT in the primary header and in SLTNAME in the EXTRACT1D extension header. The name of the slit being extracted is not documented in x1d files produced by calwebb_spec3.

**Additional suggestion #2**: In MAST, recommended products should only include extracted spectra of the science target (SOURCEID=1 in the suggested approach above), which is what the vast majority of science users want. Do not include extracted spectra for apertures that do not contain the science target. Currently at level 3, there are six x1d files that contain the science target (two per grating) and 24 that do not. With the suggested implementation above, there would be three x1d files that contain the science target (one per grating) and 24 that do not.

stscijgbot-jp commented 1 year ago

Comment by Jeff Valenti on JIRA:

James Muzerolle, Anton Koekemoer, Alicia Canipe, Jonathan Eisenhamer, Rosa Diaz - Added you as watchers. Added link to this ticket from Status of Associations and Level 3 Data Products page.

stscijgbot-jp commented 1 year ago

Comment by Anton Koekemoer on JIRA:

Thanks, Jeff, for the clear and detailed description. I've also added Greg Sloan, as well as Cheryl Pavlovsky and Stephan Birkmann (the two current NIRSpec reps on the CalWG).

There are indeed a few pieces to this, including whether or not any changes might be needed on the algorithm side for combining the data from the two slits (which we can arrange to discuss in the CalWG if needed), also possible impacts on header keywords or other metadata (which may flow onto DMSWG), as well as updating the ASN generator, along with whether changes are needed on the MAST side. 

To start things off, could the NIRSpec team indicate please their perspective on the level of criticality for this, by setting the "INS Team Criticality" label on this ticket to the appropriate level (nrs_low/ med/ high/ critical), along with their thoughts on the pipeline code/ algorithm side, in terms of whether or not any changes might be needed? That would be helpful for getting things started on the CalWG side at least, in terms of planning the next steps.

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

What kind of linkage, if any, is provided in the APT when an observer uses this recommended strategy of observing the same target in both the S200A1 and S200A2 slits? Are they at least assigned the same association candidate ID? This information is needed in order to modify the association generator rules so that they properly recognize and associate the S200A1 exposures with the S200A2 exposures.

stscijgbot-jp commented 1 year ago

Comment by Jeff Valenti on JIRA:

Howard Bushouse - When the observer selects the Slit = "S200A1 and S200A2" in APT, the resulting exposures are all part of the same observation. In the example above, the exposures that should be combined are all part of observation 3 and hence members of association_candidate_id "o003". For additional context, see the attached APT screenshot:

[Attached screenshot: apt_2288.png]

The challenge here is that calwebb_spec2 produces 2D/1D spectra for up to five NIRSpec fixed slits, labeling the results by a source number (e.g., s0002) that seems to depend on slit name, rather than region on the sky. One of the sources is the target and the remaining sources are typically empty sky. Source numbering is consistent for dithers in a particular slit, but not for pointings in different slits. In the example above, the science target is s0001 when the target is in S200A1 and s0002 when the target is in S200A2. As demonstrated above, including exposures with different fixed slits as members of the same association will do the right thing, if the science target has a consistent source number for both slits. Other implementations are possible.

stscijgbot-jp commented 1 year ago

Comment by Anton Koekemoer on JIRA:

Cheryl Pavlovsky, Stephan Birkmann, James Muzerolle: would you be able to assign this ticket an "Impact" and "Urgency" rating (1 through 4), with associated text in the "Criticality Rationale" field?

stscijgbot-jp commented 1 year ago

Comment by Elena Manjavacas on JIRA:

As pipeline lead for the NIRSpec FS mode, I will also be looking into this issue. I may have questions for some of you.

stscijgbot-jp commented 1 year ago

Comment by Anton Koekemoer on JIRA:

Thanks Elena Manjavacas! This ticket was also discussed at our recent JWST Cal WG Meeting 2023-08-01; please let me know if you're not able to access that. You can also touch base with Kenneth MacDonald, since Howard Bushouse reported that he's the one working on the necessary ASN rule updates.

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

Jeff Valenti I've been doing some thinking and exploring regarding your proposed handling of SLTNAME vs. SOURCEID in the individual and combined (level 3) products. I like the proposed change for assigning SOURCEID, which for all the non-primary slits amounts to encoding the primary slit number in the first digit of the SOURCEID and the individual secondary slit number in the second digit of the ID, e.g.

SOURCEID = 13 indicates S200A1 was the primary slit and S400A1 is the slit from which the data were obtained

SOURCEID = 35 indicates S400A1 was primary, S200B1 is the slit for these data

So for the case where the primary target was placed in both the S200A1 and S200A2 slits within a single observation, the background data from slit S400A1, for example, would end up with SOURCEID's of 13 and 23. Meanwhile both sets of data for the primary target would have the same SOURCEID=1 (always a single digit).
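Read in the other direction, a SOURCEID under this scheme can be split back into its two slits. The sketch below is only an illustration of the encoding described here; the function name is hypothetical and this is not pipeline code.

```python
SLIT_BY_NUM = {1: "S200A1", 2: "S200A2", 3: "S400A1", 4: "S1600A1", 5: "S200B1"}


def decode_sourceid(sourceid):
    """Return (primary_slit, extracted_slit) for a proposed-scheme SOURCEID.

    SOURCEID 1 is the science target; the slit is not encoded in that case,
    so (None, None) is returned.
    """
    if sourceid == 1:
        return (None, None)
    primary, extracted = divmod(sourceid, 10)
    if (
        primary not in SLIT_BY_NUM
        or extracted not in SLIT_BY_NUM
        or primary == extracted  # data from the primary slit are source 1
    ):
        raise ValueError(f"not a valid SOURCEID in this scheme: {sourceid}")
    return (SLIT_BY_NUM[primary], SLIT_BY_NUM[extracted])


print(decode_sourceid(13))
print(decode_sourceid(23))
```

Both 13 and 23 decode to data from S400A1; they differ only in which slit held the primary target, which is why they map to different regions of sky and stay separate products.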

Now on to the full names for combined level 3 products. After making the change in the ASN rules to combine the data from the S200A1 and S200A2 slits into a single product, the resulting names for all the level 3 products would be of the form:

 jw02288-o003_s000xx_nirspec_f170lp-g235h-s200a1-s200a2.fits

where the primary target has a source_id field of "s00001" (always source 1), and the data from the other slits have source_id fields of "s00013", "s00014", "s00015", "s00023", "s00024", and "s00025".

For the primary target, with source_id="s00001", the inclusion of both slit names ("s200a1-s200a2") seems appropriate, because it indicates that the data in the combined product came from both of those slits (that's the way we always construct the list of optical elements in the combined product name, by including all elements that contributed to the product).

But for all the non-primary slits it strikes me as strange and confusing to have the primary slit name appear at all in the level 3 product names, even in the old (current) scheme. Sources 3, 4, and 5 (in the old scheme) or 13, 14, 15, 23, 24, 25 (in the new scheme) do not have anything to do with either the s200a1 or s200a2 slits. The data for those sources come from slits s400a1, s1600a1, and s200b1. So perhaps we should drop the primary slit name from combined level 3 products for NRS FS mode altogether, even for cases that don't involve the primary target being in both s200a1 and s200a2? Of course we'd need to check with archive/CAOM/MAST folks to see how this kind of change might affect their subsystems.

stscijgbot-jp commented 1 year ago

Comment by Jeff Valenti on JIRA:

Howard Bushouse - For the science target only, I agree that the filename should include the names of both slits that were combined, i.e.:

jw02288-o003_s00001_nirspec_f170lp-g235h-s200a1-s200a2_<product_type>.fits

No other products (and hence filenames) should combine data from multiple slits. No other products (and hence filenames) should combine data obtained with the science target in different slits. There is no overlap in the patches of sky covered by the background slits, when the science target is in different slits.

There will be products that combine dithers within each background slit, when the science target is in one of the apertures. I think the filenames should be of the form:

jw02288-o003_s00012_nirspec_f170lp-g235h-s200a2_<product_type>.fits
jw02288-o003_s00013_nirspec_f170lp-g235h-s400a1_<product_type>.fits
jw02288-o003_s00014_nirspec_f170lp-g235h-s1600a1_<product_type>.fits
jw02288-o003_s00015_nirspec_f170lp-g235h-s200b1_<product_type>.fits
jw02288-o003_s00021_nirspec_f170lp-g235h-s200a1_<product_type>.fits
jw02288-o003_s00023_nirspec_f170lp-g235h-s400a1_<product_type>.fits
jw02288-o003_s00024_nirspec_f170lp-g235h-s1600a1_<product_type>.fits
jw02288-o003_s00025_nirspec_f170lp-g235h-s200b1_<product_type>.fits

The filenames above for data from non-primary slits include the name of the slit being extracted (corresponding to the last digit of the source number), but not the name of the primary slit containing the science target. I agree with you that the file name should contain the name of the slit(s) being extracted, not the name of the slit containing the source. 
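The naming pattern in the list above can be expressed as a short formatting helper. This is a sketch of the pattern only, with a hypothetical function name and arguments; it does not reproduce how Spec3Pipeline builds names.

```python
def level3_product_name(program, obs, sourceid, filt, grating, slits, suffix):
    """Format a level 3 fixed-slit product name following the pattern above.

    `slits` lists the slit(s) the data were extracted from: both S200A1 and
    S200A2 for the combined science target, one slit otherwise.
    """
    slit_part = "-".join(s.lower() for s in slits)
    return (
        f"jw{program:05d}-o{obs:03d}_s{sourceid:05d}_nirspec_"
        f"{filt.lower()}-{grating.lower()}-{slit_part}_{suffix}.fits"
    )


# Science target combined from both slits
print(level3_product_name(2288, 3, 1, "F170LP", "G235H", ["S200A1", "S200A2"], "x1d"))
# Background spectrum from S400A1 while the target was in S200A1
print(level3_product_name(2288, 3, 13, "F170LP", "G235H", ["S400A1"], "x1d"))
```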

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

I've implemented a hack to the Spec3Pipeline routine that builds output file names for each of the source-based products for NIRSpec fixed-slit mode that I believe does what you're asking for. For example, the list of x1d products is now:

```
jw02288-o003_s00001_nirspec_f100lp-g140h-s200a1-s200a2_x1d.fits
jw02288-o003_s00021_nirspec_f100lp-g140h-s200a1_x1d.fits
jw02288-o003_s00012_nirspec_f100lp-g140h-s200a2_x1d.fits
jw02288-o003_s00013_nirspec_f100lp-g140h-s400a1_x1d.fits
jw02288-o003_s00023_nirspec_f100lp-g140h-s400a1_x1d.fits
jw02288-o003_s00014_nirspec_f100lp-g140h-s1600a1_x1d.fits
jw02288-o003_s00024_nirspec_f100lp-g140h-s1600a1_x1d.fits
jw02288-o003_s00015_nirspec_f100lp-g140h-s200b1_x1d.fits
jw02288-o003_s00025_nirspec_f100lp-g140h-s200b1_x1d.fits
```
The file names for all the other level-3 products are the same, with just a different product type suffix (cal, crf, s2d). Note that files 2 and 3 in that list are for data from the s200a1 and s200a2 slits when they do NOT have the target in them (i.e. background sky). Note too that the only time we get a single digit source_id is for the source that's in the primary slit, which is always now source_id=1. All other source_id's are double-digit of some kind, with the first digit corresponding to the primary slit in use and the second digit corresponding to the slit from which the data were obtained. So you'll never have source_id's of 11, 22, 33, 44, or 55, because those would all be cases of the data being extracted from the primary slit and hence they would always have source_id=1.

With these changes the source_id and slit_name fields in the output products are a bit redundant with one another, but (IMHO) I think it's OK to leave them both in the product name.
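Because the two fields are redundant, they can be used to cross-check each other. The sketch below parses a product name of the form listed above and checks that the last digit of the source_id matches the slit name; the regular expression and function name are assumptions made for this illustration.

```python
import re

SLITNUM = {"s200a1": 1, "s200a2": 2, "s400a1": 3, "s1600a1": 4, "s200b1": 5}

# _s<source_id>_nirspec_<filter>-<grating>-<slit>[-<slit>]_<suffix>.fits
NAME_PATTERN = re.compile(
    r"_s(?P<source>\d{5})_nirspec_[^-]+-[^-]+-(?P<slits>[a-z0-9-]+)"
    r"_(?P<suffix>[a-z0-9]+)\.fits$"
)


def name_is_consistent(filename):
    """Check that the source_id and slit name fields agree with each other."""
    match = NAME_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"unrecognized product name: {filename}")
    source = int(match.group("source"))
    slits = match.group("slits").split("-")
    if source == 1:
        # Science target: one or more known slits may have contributed
        return all(s in SLITNUM for s in slits)
    # Other sources: exactly one slit, matching the last source_id digit
    return len(slits) == 1 and SLITNUM.get(slits[0]) == source % 10


print(name_is_consistent("jw02288-o003_s00021_nirspec_f100lp-g140h-s200a1_x1d.fits"))
```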

What do others, especially the NIRSpec fixed-slit aficionados, think of this scheme?

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

Note that this change to the way we create level-3 product names for NRS fixed-slit observations means that all level-3 NRS fixed-slit product names will change, such that they will no longer just contain the name of the primary slit, but rather contain the name of the particular slit from which the data were extracted. Hence there are going to be many dead/stale old product names in the archive that'll be superseded (but not overwritten) by the new naming scheme.

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

James Muzerolle Elena Manjavacas Stephan Birkmann as NIRSpec reps, what are your opinions regarding the proposed change in product naming for all fixed-slit observations/exposures?  Briefly, the current FS products all have the name of the primary slit in them, regardless of which slit data were extracted from. For example, for an exposure that uses S400A1 as the slit containing the target, but a large enough subarray is used so that data exists for all 5 slits, the Level 3 product names currently look like:

```
jw02288-o003_s00001_nirspec_f170lp-g235h-s400a1_x1d.fits
jw02288-o003_s00002_nirspec_f170lp-g235h-s400a1_x1d.fits
jw02288-o003_s00003_nirspec_f170lp-g235h-s400a1_x1d.fits
jw02288-o003_s00004_nirspec_f170lp-g235h-s400a1_x1d.fits
jw02288-o003_s00005_nirspec_f170lp-g235h-s400a1_x1d.fits
```
such that the only way to know which source/slit is which is by knowing the hardwired correlation between source_id and slits, which is:
```
s00001 = source in S200A1
s00002 = source in S200A2
s00003 = source in S400A1
s00004 = source in S1600A1
s00005 = source in S200B1
```
In the new scheme proposed here, the Level 3 product names would look like (again assuming the target is in S400A1):
```
jw02288-o003_s00031_nirspec_f170lp-g235h-s200a1_x1d.fits
jw02288-o003_s00032_nirspec_f170lp-g235h-s200a2_x1d.fits
jw02288-o003_s00001_nirspec_f170lp-g235h-s400a1_x1d.fits
jw02288-o003_s00034_nirspec_f170lp-g235h-s1600a1_x1d.fits
jw02288-o003_s00035_nirspec_f170lp-g235h-s200b1_x1d.fits
```
such that each product name clearly indicates which slit the data were extracted from, and the primary target always has source_id = 00001 (the one in the S400A1 slit).

stscijgbot-jp commented 1 year ago

Comment by Elena Manjavacas on JIRA:

I approve this fix as NIRSpec FS mode lead.

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

Fixed by #7879

stscijgbot-jp commented 1 year ago

Comment by Emily Wislowski on JIRA:

I was doing some testing and noticed that in the combined S200A1+S200A2_cal.fits files, the FXD_SLIT header keyword was still only 'S200A1'. My guess is that it's probably like that intentionally, but just wanted to double check. Could someone confirm?

Other than that, everything tested (the updated header keywords, L3 product naming, and combination of L3 S200A1+S200A2 products) looked good / worked as expected.

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

Emily Wislowski Yes, it's difficult, or in some cases impossible, to come up with reasonable values for keywords in combined products where the inputs have multiple values. For some numerical quantities, like exposure times, it's simple enough to just take the sum of the inputs and use that to populate the output combined value, but for keywords like this that indicate individual names of items there often isn't a good way to indicate a combined value (e.g. "S200A1+S200A2" would not be an allowed value for the keyword). So in those cases it just defaults to the value coming from the first input in the list of combined datasets. The HDRTAB table extension that appears in all combined products contains a list of all keyword values from all inputs, hence there is some traceability there.
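The behaviour described here, summing some numerical keywords and falling back to the first input for name-like keywords, can be illustrated with a toy example. This is not the pipeline's header-combining code; the function, the rule names, and the header values are made up for the illustration, and EFFEXPTM simply stands in for an exposure-time keyword.

```python
def combine_keyword(values, rule):
    """Combine one keyword across inputs using a named rule."""
    if rule == "sum":
        return sum(values)
    if rule == "first":
        return values[0]
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical header values for two inputs being combined
inputs = [
    {"FXD_SLIT": "S200A1", "EFFEXPTM": 100.0},
    {"FXD_SLIT": "S200A2", "EFFEXPTM": 100.0},
]
rules = {"FXD_SLIT": "first", "EFFEXPTM": "sum"}

combined = {
    key: combine_keyword([hdr[key] for hdr in inputs], rule)
    for key, rule in rules.items()
}

# One row per input, in the spirit of the HDRTAB extension, keeps every value
header_table = [dict(hdr) for hdr in inputs]

print(combined)
print(header_table)
```

The combined header reports only the first slit name, while the per-input table still records that the second input used S200A2.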

stscijgbot-jp commented 1 year ago

Comment by Emily Wislowski on JIRA:

Howard Bushouse Great, then everything is tested and working as expected from the NIRSpec side.