spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope
https://jwst-pipeline.readthedocs.io/en/latest/
Other
559 stars 165 forks source link

Outlier rejection algorithm for IFU mode needs updating #4616

Closed stscijgbot closed 3 years ago

stscijgbot commented 4 years ago

Issue JP-1313 was created by David Law:

Ongoing testing of outlier rejection for the MIRI MRS has uncovered a few issues.  The direct bugs are addressed in https://jira.stsci.edu/browse/JP-1287

However, testing also suggests more generally that the algorithm in use needs to be revised.

The as-implemented approach follows the imaging philosophy and constructs a cube from each input exposure, combines the cubes together via a median algorithm, blots the resulting cube back to the 2d detector pixels, and somehow looks for any pixels in which the data and blot differ by more than some value times the error extension.  This currently isn't detecting 300-sigma outliers that I'm introducing into the data though, and I'm unclear whether this is because much of the code centers around 'growing' the current cosmic ray mask or because of some other issue.

More generally though, this method of using the ERR extension to compare the SCI and BLOT data is likely to run into problems for the IFU data because of how undersampled the IFUs are.  Because the scene is not well sampled (factor of two or worse) the flux from an unresolved point source has strong pixel-phase dependence, meaning that the SCI and BLOT data can differ by much more than anticipated based on the formal ERR alone.

A workaround for this is to instead combine the individual data cubes using the biweight mean (robust to outliers for the low number of 4 exposures typical for most MRS cases) to compute both the mean and the scale (roughly speaking, the rms) of the composite cube.  By blotting back both the mean and the rms to the 2d detector data the blotted rms gives a better representation of the true variability expected from sampling phase effects, and can be used to detect and flag real outliers with less flagging of real point sources.

In a test branch of the pipeline, implementing the above algorithm successfully detects and flags artificial outliers introduced into the data and improves the quality of the resulting data cubes.  Optimization of the rejection parameters will still be needed on a mode-specific basis, informed by commissioning data.

stscijgbot commented 4 years ago

Comment by David Law: [~koekemoe] [~kgordon]

stscijgbot commented 4 years ago

Comment by Karl Gordon: The undersampling of the IFU data is an issue.  Is the thought that for the IFUs this issue is mainly spatial undersampling or is the spectral understampling an issue too?

Do I understand the proposed solution is to compute the rms from the cube stack directly and not use the propagated uncertainties?  This is a reasonable solution, but may have challenges when there is not many dithers.  For example, how will the algorithm work for only 2 dithers?

A similar issue as for the imaging data and the spatial undersampling of the PSF.  The solution for the imaging is to include an "acceptable" fractional deviation that is set by the pixel undersampling of the PSF.  This acceptable fractional deviation is straightforward to compute given knowledge of the PSF.  This acceptable fractional deviation would be used in addition to the propagated uncertainties to ensure that real deviations due the pixel sampling from the PSF are not flagged as outliers.  Could a similar acceptable fractional deviation be computed using the known spectral profile for an unresolved line?

stscijgbot commented 4 years ago

Comment by David Law: Spatial undersampling is known to be an issue (e.g., JWST-STScI-005587) driving the current dithering scheme.  Spectral undersampling is known to affect studies quantifying the spectral LSF, and I'd expect would similarly affect blotting of unresolved spectral features.  The detailed impact of dithering on spectral sampling is under investigation (https://jira.stsci.edu/browse/JWSTMIRI-56).

Regarding the RMS: the algorithm will not work to detect outliers for 2 dithers, but neither will any other algorithm.  When given two input points the biweight scale returns that both points are within '1sigma' of the midpoint.  Better redundancy against detector pixel artifacts is one of the motivations for 4-point dithers.

It may be possible to construct a different 'effective RMS' to use for outlier comparisons via different techniques.  One that I explored briefly was quadrature combination of ERR and blotted biweight scale, although this did not obviously improve performance relative to just using the blotted biweight scale.  An 'acceptable' fractional deviation (applied to the blotted science image?) may also be a reasonable way to approach this problem as that would contain similar information to the blotted biweight scale.  Simulations once [https://github.com/spacetelescope/jwst/pull/4547] is merged should be able to quantify this.

stscijgbot commented 4 years ago

Comment by David Law: Looking into this further now; it seems that there was some discussion at a Cal WG meeting as well (see comments on [https://outerspace.stsci.edu/pages/viewpage.action?spaceKey=JWSTCC&title=Vanilla+Outlier+Detection]) Specifically, the discussion of the motivation behind many of the drizzle-pac steps is useful to see.

I think one of the low-hanging problems with the current outlier rejection may be that the code appears to be setting the 'JUMP_DET' flag when outliers are detected. That doesn't seem like a useful flag to set; after all, many pixels likely have JUMP_DET set already when cosmic rays were detected in the ramp and rejected appropriately. Cube build in particular ignores the JUMP_DET flag as it is not informative at this stage, therefore by definition anything found by outlier rejection won't affect the final data products. [~jdavies] [~bushouse] any idea why this particular flag is selected by line 20 of outlier_detection.py ?

For test purposes I'm going to replace this with DO_NOT_USE and evaluate the performance.

stscijgbot commented 4 years ago

Comment by Howard Bushouse: The setting of JUMP_DET is a holdover to the HST pipeline roots of this routine and is in the process of getting updated in JP-787. The changes will definitely include the setting of DO_NOT_USE, which will then get used in subsequent steps such as resample. It's an open question as to whether an informational flag similar to JUMP_DET should also get set for pixels deemed to be outliers by the outlier_detection step (as opposed to the jump step). A different/new flag value could be invented for this, but unfortunately we're already right at the 32 bit limit of defined flags. So if we want to create new ones we either need to bump the data type of the DQ array from uint32 to uint64, or we need to do some spring cleaning of the existing definitions, many of which don't ever actually get used by anyone or anything. The existing definitions can be seen at [https://github.com/spacetelescope/jwst/blob/master/jwst/datamodels/dqflags.py#L1]

stscijgbot commented 4 years ago

Comment by David Law: Thanks Howard: based on some new tests adding artificial outliers into simulated MRS point source data I think that we don't need any significant overhaul of this step at the present time.  [~koekemoe] this means no need to discuss this in the Cal WG.  The two changes required for the MRS are to use scl1=scl2=2.4 in the outlier rejection parameter file and to have outlier rejection set DO_NOT_USE insted of JUMP_DET.

[~cracraft] the new parameter values will need to be folded into our parameter reference file discussion, and ticket https://jira.stsci.edu/browse/JP-787 should be added to the MIRI ticket priorization list as this is the real issue at hand (and it is currently not tracked on the INS ticket prioritization page).

stscijgbot commented 4 years ago

Comment by David Law: The good news: the recent merge of [https://github.com/spacetelescope/jwst/pull/4726] means that outlier detection is now detecting and properly masking simulated outliers from the MRS data cubes.

The bad news: With just this change in place the actual performance of the pipeline is temporarily worse, with the outlier rejection routine flagging large parts of every point source for rejection.

This will be resolved by using scl1=scl2=2.4 as discussed above, but I'm not sure what the most rapid way to get this updated is.  In the longer term there are ongoing discussions about providing instrument parameter reference files, but these are perhaps a longer-lead item that don't get the master branch performing well by default quickly.

[~bushouse] a stopgap solution would be to add a single line of code into the MRS-specific part of outlier rejection that forces use of reasonable values.  I've coded and tested this quick fix (https://github.com/drlaw1558/jwst/tree/mrs_outlier), should I put in a PR for this while the parameter reference file discussion is ongoing?

stscijgbot commented 4 years ago

Comment by Howard Bushouse: In the absence of instrument-specific parameter reference files I think you're proposed 1-line change for MIRI MRS is the best and only way to go. Please submit a PR against the master branch. If someone happens to have a more elegant idea of how to do it, it may show up in the PR reviews.

stscijgbot commented 4 years ago

Comment by David Law: Done: https://github.com/spacetelescope/jwst/pull/4778

stscijgbot commented 3 years ago

Comment by Howard Bushouse: Given that an overhaul of CRDS parameter reference file handling has been completed and hence teams are now free to deliver param ref files for each step, any remaining tweaking of outlier_detection params for MIRI MRS (and NIRSpec IFU?) can now be done by creating and delivering a suitable param ref file. [~dlaw] do you want to tackle that? Or are you satisfied with the other updates that have already been implemented?

stscijgbot commented 3 years ago

Comment by David Law: [~bushouse] I think it would be good to create a parameter reference file for this so that we can remove the hard-coded stopgap solution that we put in place last Spring.  I know [~cracraft] has been looking at this lately so we'll follow up.

Generally I think this ticket is now rather misleading; if you've no objection I'm going to close it and open a new ticket to remove the stopgap code that can be done once the reference file is available.

stscijgbot commented 3 years ago

Comment by Howard Bushouse: Yes, please resolve this ticket and create a new one for the param ref file work and code (un)update.

stscijgbot commented 3 years ago

Comment by David Law: Ok- closing this ticket.  The algorithm in use seems ok for now with the ability to identify 'acceptable' deviation  fractions.  Any further issues will be in new, more specific tickets.