spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope
https://jwst-pipeline.readthedocs.io/en/latest/

MIRI LRS spectral extraction: improved automated aperture placement #7719

Closed stscijgbot-jp closed 9 months ago

stscijgbot-jp commented 1 year ago

Issue JP-3021 was created on JIRA by Sarah Kendrew:

The extract1d() step currently uses coordinate information to place the extraction aperture in the array. We have found this to be very error prone - due to the small size of the slit, even small errors in coordinate registration (which are pretty common) can cause the aperture to be placed in the wrong position. This can be fixed by re-running the extract1d() step, but it means the automated products in MAST are often of poor quality. For our team internally, it slows down our calibration work if we always have to re-run this step manually with custom parameters. 

The placement of the target is however generally very accurate, and we feel that a default strategy of placing the aperture at the nominal positions would overall yield better results than the current approach. There would still be cases where manual re-extraction is required, but overall the success rate would be much higher.

I have attached a flow diagram outlining the different use cases for LRS and am happy to discuss further.

 

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

Given that the use of the ALONG_SLIT_NOD dither pattern for MIRI LRS slit always results in the target being at the same location for each value of PATT_NUM (1 or 2), this could be addressed by simply delivering separate extract1d reference files to CRDS for each PATT_NUM value and setting appropriate xstart/xstop values for each nod position. That avoids hardwiring the extraction positions into the pipeline code itself. This would of course require updates to the selection criteria for extract1d ref files for MIRI.

stscijgbot-jp commented 1 year ago

Comment by Sarah Kendrew on JIRA:

thanks Howard Bushouse I had typed a long comment on this ticket before the break, then forgot to post it and now it's gone. But yes, this was my proposed solution as well. The only issue I see with it is that the coordinates are different depending on whether we extract from the s2d or the cal product, and in stage 2 or stage 3. We can probably just cater for the rectified product (ie the s2d file), which is the default. So we should at minimum provide 3 new json files (nod 1, nod 2, and L3 merged product), do you agree?

For slitless we don't run resample_spec so the coordinates should be the same for both executions of extract_1d. 

 

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

I agree that selecting different ref files based on whether the product is a cal or s2d file would be tricky and hence I also agree that just sticking with values that are appropriate for the s2d is the best bet, at least for now. Would have to give some thought to what selection criteria to use to distinguish a Stage 2 s2d from a Stage 3 s2d. Will take a look at a Stage 3 s2d header to see what, if anything, unique is in there that could be used as a selector.

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

It looks to me like the only obvious way for the reference file selectors for the extract_1d step to tell when it's been given a combined s2d image as input during level-3 processing (as opposed to a single resampled s2d image during level-2b) is by keying off the value of the NDRIZ keyword in the header of the file. When it doesn't exist or has a value < 2, it's a single image, while values >=2 are combined and should result in a different extract1d ref file being selected (one that puts the extraction aperture at the center of the image).

Of course the CRDS pre-fetch that's done at the time the calwebb_spec3 pipeline is instantiated won't get the right selection, because the inputs at that time are still single, unresampled images. But at the time the extract_1d step is executed in the calwebb_spec3 pipeline, another call is made to CRDS to select an appropriate extract1d ref file and that fetch will use the meta data from the combined/resampled image that's being used as input to the step, so at that point it should select the proper ref file to be used.

stscijgbot-jp commented 1 year ago

Comment by Greg Sloan on JIRA:

What puzzles me is that at the spec3 stage, when extracting from a rectified spectral image (s2d file), the image looks fine, with a positive spectrum right down the middle and two half-strength negatives to either side. And yet, the extraction is still wrong. So the s2d image is built correctly, and yet we're not extracting properly.

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

Greg Sloan When you say "the extraction is still wrong", what exactly is wrong with it?  Dropouts or spikes from weird pixel values included in the extraction aperture, incorrect placement of the extraction aperture causing some of the background (negative) traces to be included, or what?

stscijgbot-jp commented 1 year ago

Comment by Greg Sloan on JIRA:

Howard Bushouse, good question. My bad for not providing documentation.

[Image: hd180609_pid1536_obs130.png]

Above: What we got from the automated extraction (in red), vs. what we expect (in blue).

stscijgbot-jp commented 1 year ago

Comment by Greg Sloan on JIRA:

Sarah Kendrew produced the following extraction manually from the same data:

[Image: sp_hd180609.png]

Other standard stars observed by the LRS produce similar results. The detailed differences at longer wavelength (i.e. the red excess in the extraction) are probably not related to this issue, and we are investigating.

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

So now we just need to know what Sarah Kendrew did manually to produce that better extraction. Was it just forcing extract_1d to place the extraction aperture at a different location than what the automated processing used? If so, then the "wrongness" of the automated results is just that it picked up signal from the wrong parts of the combined/resampled image, instead of being nicely centered on the positive trace and completely excluding the negative traces.

stscijgbot-jp commented 1 year ago

Comment by Greg Sloan on JIRA:

Yes, Sarah Kendrew just forced the extraction to specific columns. I'll let her weigh in with her thinking on what is leading the pipeline to pick the wrong aperture.

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

I'm guessing it's just the fact that it's trying to use the target RA/Dec to compute an appropriate x/y location in the image (i.e. the "use_source_posn" option) and the RA/Dec values are just off a bit due to inaccuracies in pointing (or older data that haven't had the benefit of updates to the WCS transforms).

stscijgbot-jp commented 1 year ago

Comment by Sarah Kendrew on JIRA:

My extraction method was to take the L3 resampled, dither-combined data from MAST (the s2d file; i.e. everything default up to there). Then I performed the spectral extraction on that with a custom file, contents copied below, and with use_source_posn set to False. Very simple, 7-px aperture, forced at the xstart/xstop coordinates. Agree with Howard Bushouse on the likely cause - though it would be easy enough to remove the override, run the L3 extraction again, and look at the output information to see what offset it is trying to apply.

 

{
    "reftype": "EXTRACT1D",
    "instrument": "MIRI",
    "telescope": "JWST",
    "exp_type": "MIR_LRS-FIXEDSLIT|MIR_LRS-SLITLESS",
    "pedigree": "GROUND",
    "descrip": "MIRI LRS extraction params for ground testing",
    "author": "H.Bushouse",
    "history": "Second draft 2016-Nov-11",
    "useafter": "2001-01-01T00:00:00",
    "apertures": [
        {
            "id": "MIR_LRS-FIXEDSLIT",
            "region_type": "target",
            "bkg_order": 0,
            "dispaxis": 2,
            "xstart": 27,
            "xstop": 34
        },
        {
            "id": "MIR_LRS-SLITLESS",
            "region_type": "target",
            "bkg_order": 0,
            "dispaxis": 2,
            "extract_width": 11,
            "nod2_offset": 0
        }
    ]
}
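For anyone wanting to reproduce the manual re-extraction described above, here is a minimal sketch in Python. It writes the quoted reference file to disk and then re-runs the step with the aperture forced to the file's columns. It assumes the jwst package is installed and that `Extract1dStep.call` accepts the `override_extract1d` and `use_source_posn` arguments as used here; the function names `write_ref_file` and `run_forced_extraction` and any file names are hypothetical.

```python
import json
from pathlib import Path

# Contents copied from the reference file quoted above.
custom_ref = {
    "reftype": "EXTRACT1D",
    "instrument": "MIRI",
    "telescope": "JWST",
    "exp_type": "MIR_LRS-FIXEDSLIT|MIR_LRS-SLITLESS",
    "pedigree": "GROUND",
    "descrip": "MIRI LRS extraction params for ground testing",
    "author": "H.Bushouse",
    "history": "Second draft 2016-Nov-11",
    "useafter": "2001-01-01T00:00:00",
    "apertures": [
        {
            "id": "MIR_LRS-FIXEDSLIT",
            "region_type": "target",
            "bkg_order": 0,
            "dispaxis": 2,
            "xstart": 27,
            "xstop": 34,
        },
        {
            "id": "MIR_LRS-SLITLESS",
            "region_type": "target",
            "bkg_order": 0,
            "dispaxis": 2,
            "extract_width": 11,
            "nod2_offset": 0,
        },
    ],
}


def write_ref_file(path):
    """Write the custom extract1d reference file as JSON and return its path."""
    path = Path(path)
    path.write_text(json.dumps(custom_ref, indent=4))
    return path


def run_forced_extraction(s2d_file, ref_file):
    """Re-run extract_1d with the aperture forced to the ref-file columns.

    Assumption: the jwst package is installed and the step accepts these
    keyword arguments. The import is done lazily so that the rest of this
    sketch can be used without jwst.
    """
    from jwst.extract_1d import Extract1dStep

    return Extract1dStep.call(
        str(s2d_file),
        override_extract1d=str(ref_file),  # use the custom file, not the CRDS one
        use_source_posn=False,             # do not shift the aperture using RA/Dec
    )
```

A call would look like `run_forced_extraction("my_l3_s2d.fits", write_ref_file("custom_extract1d.json"))`, where both file names are placeholders.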

stscijgbot-jp commented 1 year ago

Comment by Sarah Kendrew on JIRA:

I think the best solution right now, for the sake of progress, is to make a new json file with extraction parameters that are applicable to L3 slit and slitless products, and get that ingested into CRDS for default pipeline operation - superseding entirely the one that is currently there. Is it possible/difficult to set the use_source_posn parameter to False by default?  Is that in code, or does that require a parameter reference file update as well?

That would produce better quality L3 products. If we want better L2 products as well, I'd have to make 2 additional json files for L2, one for each nod position. Can the pipeline differentiate between the nods from the metadata? Can the extract_1d step pick different json files depending on whether it's a L2 or L3 run?

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

Sarah Kendrew I just did some testing and have found that the updates included in the attached new extract1d ref file will do what you want for L3 (calwebb_spec3) data, where the source spectrum is centered in the s2d image. Note the addition of the "use_source_posn" setting in the ref file itself. Also, I've removed the obsolete "nod2_offset" params.

Unfortunately I discovered during testing that we (SCSB) need to make 2 small tweaks in the extract_1d step code itself to make this work properly. Right now it unfortunately ignores the setting of "use_source_posn" in the ref file and always sets it to True for LRS-FIXEDSLIT (as well as some other modes). So we need to fix that in the code to allow the ref file to control it. Even after those changes, a user could still override by specifying "use_source_posn=True" on the command line, because command line param values always take precedence over ref file values. Code fixes are being tracked in JP-3095.

The only thing that concerns me about installing this new extract1d ref file is the mess it's going to make for L2 extracted spectra. Those hardwired xstart/xstop values will cause extraction to happen where there's almost no signal at all in nodded L2 images. But to fix that will require updates to the extract1d selection criteria for MIRI LRS.

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

Whoops, discovered a minor formatting error in the updated extract1d ref file I attached yesterday. The value of the "use_source_posn" item should be given in ALL LOWERCASE letters (i.e. false instead of False), so that it's properly interpreted by json parsers as a boolean value and not a string value. I've deleted the bad file and attached a corrected one.
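The lowercase requirement can be checked with Python's standard json module. This is only an illustration of how a JSON parser treats the three spellings; the comment above does not say which form the deleted file used, and the way the pipeline itself reads the value is not shown here.

```python
import json

# Lowercase `false` is a JSON boolean and parses to Python's False.
good = json.loads('{"use_source_posn": false}')

# A quoted "False" is a JSON string. Any non-empty string is truthy in
# Python, so a plain truth test would treat it as "on".
quoted = json.loads('{"use_source_posn": "False"}')

# A bare capitalized False is rejected by Python's JSON parser.
try:
    json.loads('{"use_source_posn": False}')
    bare_is_valid = True
except json.JSONDecodeError:
    bare_is_valid = False

print(good["use_source_posn"] is False)  # True
print(bool(quoted["use_source_posn"]))   # True
print(bare_is_valid)                     # False
```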

stscijgbot-jp commented 1 year ago

Comment by Sarah Kendrew on JIRA:

thanks Howard Bushouse this is great. I have been setting the use_source_posn in the step parameters, rather than in the reference file, and that seems to work. but it would be nice to be able to set that in the reference file, so that is a useful update. would you be able to also update the block in the json for slitless mode? here the situation is simpler as we don't dither, so the extraction region for both L2 and L3 can be the same. 

I would be fine to put this in for L3 extraction - we can work on sorting out L2 extraction. do you think we should create additional json files for L2 extraction?

if the delivery needs to come from the MIRI team, I can sort out the tickets for CRDS for that. 

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

Sarah Kendrew I've attached a new updated extract1d reference file that has both slit and slitless sections updated. I also updated the meta data at the top. If you're OK with all that, you can go ahead and submit to CRDS. Of course you may need to warn people that when it gets applied during spec2 processing of fixed-slit exposures, it's going to completely miss the spectral trace and hence the exposure-level x1d products from calwebb_spec2 will be full of nothing but background signal. On the other hand, it may not be any worse than what's currently coming out of spec2, because of the problems with applying use_source_posn to the resampled data (it too misses the trace). But you/we really should work on getting extract1d ref files that work for exposure-level products too.

stscijgbot-jp commented 1 year ago

Comment by Sarah Kendrew on JIRA:

Howard Bushouse looks good - could we change the slitless block to also use xstart/xstop limits rather than the extraction width, just so it's consistent with what we have for slit? the peak column is 36 for slitless so it could be something like 32 to 40 (and adjusting for pixel counting convention; what does this file use?)

How much effort is it to update the extract_1d() code to take separate reference files for level 2 extraction and is there an elegant way of providing an aperture for each nod position, or would that require 2 additional json files?

 

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

Sarah Kendrew The xstart/stop values in the extract1d json file are 0-indexed. So assuming that the slitless peak is centered at col 36 (also 0-indexed; I see it at 37 in ds9, which is 1-indexed), then xstart=32 and xstop=40 would center the aperture at 36 and use +-4 pixels on either side of that (the limits are inclusive), for a total width of 9 pixels. Does that sound good?
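The arithmetic in the comment above, written out as a small Python sketch. The helper names are hypothetical; the only conventions used are the ones stated in the comment (0-indexed columns, inclusive limits, ds9 being 1-indexed).

```python
def aperture_width(xstart, xstop):
    """Number of columns covered by inclusive limits xstart..xstop."""
    return xstop - xstart + 1


def aperture_center(xstart, xstop):
    """Midpoint of the aperture in 0-indexed column coordinates."""
    return (xstart + xstop) / 2


def to_one_indexed(col):
    """Convert a 0-indexed column to the 1-indexed value shown by ds9."""
    return col + 1


# Proposed slitless limits: 9 columns centered on column 36.
print(aperture_width(32, 40))   # 9
print(aperture_center(32, 40))  # 36.0
print(to_one_indexed(36))       # 37

# Fixed-slit limits from the custom file: 8 columns, so the midpoint
# falls between two columns rather than on one.
print(aperture_width(27, 34))   # 8
print(aperture_center(27, 34))  # 30.5
```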

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

As far as setting hardwired extraction locations for level-2 fixed-slit data is concerned, the only way it can be done from reference files alone (i.e. no code changes) is to expand the extract1d ref file selection criteria to include the keywords "NDRIZ" and "PATT_NUM". The NDRIZ value would distinguish between level-2 and level-3 data (NDRIZ=1 for level-2, NDRIZ>=2 for level-3), and then the PATT_NUM values would distinguish between the two nod positions, which need to use different locations in level-2 products. Unfortunately it won't work to base the selection on PATT_NUM alone, because the meta data for the combined level-3 product is just a copy of the first input image, so it can be either 1 or 2, even though the image is actually composed of data from both. The only way to know that it's combined data is via NDRIZ>1.
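The selection logic described above can be sketched in Python as follows. This is not CRDS selection-rule syntax and not pipeline code; it only shows the order of the checks. The header is modeled as a plain dict, and the function name and the returned labels ("combined", "nod1", "nod2") are hypothetical.

```python
def select_lrs_extract1d_key(header):
    """Return a label for which fixed-slit extract1d ref file would apply.

    header: dict of header keywords for the s2d product being extracted.
    """
    # NDRIZ must be checked first: a combined level-3 product still carries
    # a PATT_NUM copied from its first input, so PATT_NUM alone is ambiguous.
    # A missing NDRIZ is treated as a single image, as discussed earlier.
    ndriz = header.get("NDRIZ", 1)
    if ndriz >= 2:
        return "combined"

    patt_num = header.get("PATT_NUM")
    if patt_num == 1:
        return "nod1"
    if patt_num == 2:
        return "nod2"
    raise ValueError(f"unexpected PATT_NUM for a single exposure: {patt_num!r}")


print(select_lrs_extract1d_key({"NDRIZ": 2, "PATT_NUM": 1}))  # combined
print(select_lrs_extract1d_key({"NDRIZ": 1, "PATT_NUM": 2}))  # nod2
print(select_lrs_extract1d_key({"PATT_NUM": 1}))              # nod1
```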

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

The ultimate solution of course is to implement an empirical trace finding routine that finds the correct center on its own, without the need for ref file info.

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

Sarah Kendrew Yesterday Greg Sloan mentioned to me that extraction apertures should always be a multiple of 4 columns wide, so that there's equal contributions from the amplifier-dependent noise in each column. Can you verify that? If so, the latest range we've set up for fixed-slit is OK (27-34, inclusive, so 8 columns), but the slitless range of 32-40, inclusive, is 9 columns and hence may need to be reduced by 1.

stscijgbot-jp commented 1 year ago

Comment by Greg Sloan on JIRA:

My mention of the need for extraction windows with widths in multiples of 4 is from Jeroen Bouwman. He noticed that the detector read-out in groups of 4 leads to more noise in extracted LRS slitless spectra when the extraction window is not 8 pixels wide. I would expect that the same is true for the slit, but am not so clear how well the effect has been measured.

stscijgbot-jp commented 1 year ago

Comment by Sarah Kendrew on JIRA:

I have not seen the data on that myself but I trust Jeroen & it sounds like a reasonable thing to do. I think perhaps best to increase the aperture for slitless to 12 rather than reduce to 8. 

stscijgbot-jp commented 1 year ago

Comment by Greg Sloan on JIRA:

I agree that 8 pix width is good for the slit. For slitless, we're not as constrained by the negative spectra, and so 12 pix would be fine.

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

Greg Sloan Sarah Kendrew I've attached yet another updated version of the LRS extract1d ref file, which now has the slitless aperture a total of 12 pixels in width, and the fixed slit remains at 8. So both are now multiples of 4, but of course that also means each of them has the trace off-center by 1 column (because it's an even number of columns).

This should work well for all slitless products, and for level-3 combined fixed-slit products. It won't work that great for individual level-2b fixed-slit images, but those results are not really useful for much anyway.

So if you're happy with this, I suggest you go ahead and submit to CRDS, at your convenience.

stscijgbot-jp commented 1 year ago

Comment by Greg Sloan on JIRA:

Created JIRA ticket JWST-MIRI-327 to submit JSON file to CRDS

stscijgbot-jp commented 1 year ago

Comment by Sarah Kendrew on JIRA:

Thanks all! 

stscijgbot-jp commented 1 year ago

Comment by Misty Cracraft on JIRA:

Checking on the status of this ticket. The extract_1d parameter reference file has been delivered and should now be in use in OPS for build 9.2. This will help level 3 files (combined), but it was noted that it will make a mess of level 2 data (single extractions). Is there more work on this ticket to help level 2 images, or do something more automated in the code to find the proper trace rather than rely on parameters being set in the parameter reference files? Just trying to figure out if there is ongoing work here, or if this is ready to be tested by INS? Or is it in some state where the first workaround is ready for testing, but there is still more work to be done?

stscijgbot-jp commented 1 year ago

Comment by Greg Sloan on JIRA:

For the spectral extractions at level 2, do we have a keyword that identifies if we're in the Nod 1 or Nod 2 position? I think this is a code change, but using that to reference the right reference file and providing reference files with extraction apertures for both would solve our problem. (Or providing different extensions of the same reference file could also work.)

The algorithm Misty is referring to is commonly called a source-finder, and we'll need such a step for PSF-based extractions. For the basic spectral extraction, we're not that sensitive to the precise position of the target, and Nod 1 vs. Nod 2 should be sufficient in most cases.

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

Greg Sloan Yes, there's the PATT_NUM keyword that indicates which position of the dither/nod pattern is being used for a given exposure, so the level 2 files have values of 1 and 2. However, the level 3 combined files have a primary header that is populated by just copying all the meta data from the first exposure in the list being combined, so they too have PATT_NUM and you never know if it'll be 1 or 2. Even if you did know which value it would always be set to for the combined images, it still makes it look like a single exposure at one position or the other. So it won't help in terms of selecting a proper reference file. The only indicator that I've found so far to differentiate between a single exposure and a combined exposure is the NDRIZ keyword (how many images have been drizzled together). This has been discussed above as a new selection criterion for LRS extract1d ref files.

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

What's really needed here is for someone from INS and/or MIRI to just go ahead and formally and officially say "SCSB should implement a routine to find the center of the spectral trace", using some method like collapsing the image data along the dispersion direction and finding the resulting peak in the cross-dispersion direction. This has been mentioned in casual ways during many previous discussions, but has never made its way to being a formal request. So if someone would just create a ticket that says that and gets a priority assigned ... something may finally happen to actually do it.
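A minimal pure-Python sketch of the "collapse and find the peak" idea mentioned above. It is an illustration only, not the routine later implemented in the pipeline. The function name is hypothetical, the image is a plain list of rows, and it is assumed here that dispaxis=2 means dispersion along y (as in the reference file quoted earlier). Handling of NaNs, bad pixels, and sub-pixel centering is left out.

```python
def find_trace_column(image, dispaxis=2):
    """Return the 0-indexed cross-dispersion position of the brightest trace.

    image: list of rows (each a list of floats), indexed as image[y][x].
    dispaxis: 1 if dispersion runs along x, 2 if it runs along y.
    """
    if dispaxis == 2:
        # Dispersion along y: sum down each column to get a profile along x.
        profile = [sum(col) for col in zip(*image)]
    elif dispaxis == 1:
        # Dispersion along x: sum along each row to get a profile along y.
        profile = [sum(row) for row in image]
    else:
        raise ValueError("dispaxis must be 1 or 2")
    # Index of the largest value in the collapsed profile.
    return max(range(len(profile)), key=profile.__getitem__)


# Synthetic combined image: a positive trace at column 30 with two
# half-strength negative traces on either side.
n_rows, n_cols = 50, 60
image = [[0.0] * n_cols for _ in range(n_rows)]
for row in image:
    row[30] = 10.0
    row[20] = -5.0
    row[40] = -5.0

print(find_trace_column(image))  # 30
```

Taking the maximum of the summed profile picks the positive trace and ignores the negative ones, which is the behavior wanted for nodded, combined LRS slit data.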

stscijgbot-jp commented 1 year ago

Comment by Sarah Kendrew on JIRA:

That sounds like a good idea Howard Bushouse. I do think this particular ticket can be closed as we achieved the goal of producing much better level 3 extracted products in MAST. (I have checked a few reprocessed datasets using the MAST quicklook tools and confirm they look good!). 

stscijgbot-jp commented 1 year ago

Comment by Anton Koekemoer on JIRA:

Just following up on this - glad that it sounds like this ticket can be closed, so as the next step, can someone (Sarah Kendrew or Greg Sloan?) point us please to the new JP ticket (and create it, if it doesn't yet exist) which proposes the algorithm improvement summarized above? ie (quoting, but feel free to adjust the wording):

"implement a routine to find the center of the spectral trace", using some method like collapsing the image data along the dispersion direction and finding the resulting peak in the cross-dispersion direction

If you could post/link that new JP ticket here please then this ticket can be closed. Thanks in advance!

stscijgbot-jp commented 11 months ago

Comment by Howard Bushouse on JIRA:

An update has been included in #7796 for the LRS fixed-slit mode, which places the extract_1d aperture at the known dither offsets used for each nod of an LRS observation.

stscijgbot-jp commented 9 months ago

Comment by Misty Cracraft on JIRA:

Is this still an 'In progress' ticket, or should it be set to 'Ready for Testing'? Has the requested ticket been created so this ticket can be closed? Greg Sloan  Sarah Kendrew 

stscijgbot-jp commented 9 months ago

Comment by Sarah Kendrew on JIRA:

Misty Cracraft I believe the core issue of the ticket was resolved and an additional improvement was made in pull request 7796 linked by Howard Bushouse above (comment 19 sep 23). So this ticket can be closed. I reported on the test for the additional improvement in ticket JP-3244 today.  There are other potential source-locating schemes and extraction improvements possible, but I think those should then require a separate ticket & discussion and that is for Greg Sloan to decide.

stscijgbot-jp commented 9 months ago

Comment by Howard Bushouse on JIRA:

Agreed. This ticket resulted in the changes to place the extraction apertures at the known nod positions within an LRS slit observation. The additional ticket referred to in a few comments was JP-3244, which requested the implementation of a routine to empirically find the center of the actual spectral trace. And that too has now been implemented.