spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope
https://jwst-pipeline.readthedocs.io/en/latest/
Other
557 stars 164 forks source link

Implement IVM weighting in resample step #800

Closed larrybradley closed 3 years ago

larrybradley commented 7 years ago

This is a reminder that IVM (inverse variance map) weighting is needed to produce photometric errors in the source catalog step. Currently the wht_type 'IVM' option is disabled:

https://github.com/STScI-JWST/jwst/blob/master/jwst/resample/resample_spec.py#L506

hbushouse commented 6 years ago

B7.2 updates to ramp_fit calculation of error and variance values should (theoretically) allow for use of what will now hopefully be useful ERR values for use in IVM weighting. Related to updates being made to outlier_detection to use ERR values in noise model.

stscijgbot-jp commented 1 year ago

This issue is tracked on JIRA as JP-1657.

stscijgbot-jp commented 1 year ago

Comment by James Davies [X] on JIRA:

Adding a link to JP-1737 as it depends on how we handle weighting and errors in resample.

stscijgbot-jp commented 1 year ago

Comment by James Davies [X] on JIRA:

I'm working on this ticket right now, using inverse variance weighting for resampling multiple images/spectra together.

Currently, we just weight by exposure time of each input image, but of course that gives the same weight for almost every pixel in an input image, assuming there's not a DO_NOT_USE DQ flagging that pixel.

Based on the discussion in DMSWG notes and the needs for error arrays in source_catalog for imaging (see https://jira.stsci.edu/browse/JP-1611), the weights for each input image should be 1 / var_rnoise where var_rnoise is the read noise variance from ramp fitting. I.e. we should ignore the flat field and poisson variance when weighting. Correct?

This will create an output weight map WHT that could be used in source_catalog. But there's discussion in the DMSWG notes that the poisson variance should also be added to the mix. Any equation for this? And what about flat field variance?

Does it make sense to populate the ERR array with this new, massaged IVM-derived error array so it is of general use and source_catalog can just use it out of the box?

In drizzlepac there are no input variances, so it is done like the following:

https://github.com/spacetelescope/drizzlepac/blob/1b9b72dcea89cdf39ae1ea6b038da9313b743bbe/drizzlepac/imageObject.py#L762-L773

But of course this is not necessarily optimized for multi-read detectors (same for ACS and WFC3), and each input image does not have separate variances to use from ramp fitting. The input variances are being guestimated from input reffiles.

I would appreciate some input on this, particularly from Karl Gordon, Anton Koekemoer or anyone else who's interested in using error arrays in resampled imaging or spectral data.

I've added all the interested watchers on error arrays in imaging and spectral modes to this ticket, to get some feedback.

stscijgbot-jp commented 1 year ago

Comment by Karl Gordon on JIRA:

Correct - ignore the photon and flat noise.  Only read noise.  FYI, read noise is a good "proxy" for the true exposure time of a ramp (including the impact of ramp jumps) hence this is why it is a good weight here.

Generating a noise image that is appropriate for use in the source detection is separate as you say.  I feel like we may have discussed this already, but am not sure.  Maybe a weighted average of the flatfield, photonnoise, and readnoise variances where the weight is the same as the surface brightness image making 3 variance mosaics?  And these could be summed to get a total variance?  Michael Regan I feel like we have discussed this so - what are your thoughts?

stscijgbot-jp commented 1 year ago

Comment by Anton Koekemoer on JIRA:

Thanks for the update. It's correct that that the inverse variance would now come from the "readnoise" (especially as per the notes from Cal WG meeting 2021-01-05), and not include Poisson terms from objects in the exposures, at least not for weighting during drizzling.

One item that would be helpful to clarify is which (non-Poisson) error terms are included in the "readnoise" array (eg you mention flatfield), I think Michael Regan  mentioned that it includes other error terms beyond just the readnoise (eg, the total rms associated with the up-the-ramp fit), but it would be helpful for him or Karl Gordon  to confirm here.

[Edit - as Karl Gordon  now confirmed above, the "readnoise" array is effectively a good proxy for overall (non-Poisson) noise as it includes the impact of ramp jumps.]

In terms of Poisson noise from objects in the exposures, indeed that is not used in the weighting when drizzling – doing so can introduce biases (since the counts vary in each exposure), which is one of the reasons why using the "ERR" arrays in HST is generally discouraged in astrodrizzle ("We advise extreme caution when selecting the 'ERR' option, since the nature of this weighting scheme can introduce photometric discrepancies in sharp unresolved sources" as per https://drizzlepac.readthedocs.io/en/latest/astrodrizzle.html]).

So the general idea is to produce the drizzled images using weight arrays that capture all the other error terms except for the Poisson noise from objects in the exposures, then have the cataloging include the Poisson noise as part of its error calculation for each source.

The algorithm on the cataloging side, to then calculate the total error, is relatively straightforward (as previously discussed with Larry Bradley) - for each object, it would start with the aperture that was used to calculate the object's flux from the drizzled science image, apply that aperture to the drizzled weight image (which is on the same pixel scale), and calculate the sum of the two variance terms (from the non-Poisson weight image, and from the Poisson terms) to arrive at the overall variance, hence derive the rms error on the catalog flux measurement that was calculated in the same aperture.

I will note here though that, in order to enable the catalog step to include Poisson noise, it would be helpful to discuss how to pass information to it about  about the Poisson noise in each output drizzled pixel (eg, with HST data we'd use the drizzled image which is generally in units of electrons/s, and multiply that by an exposure weighting map which can take into account exposure time lost due to CR hits, saturation or other effects that can impact the up-the-ramp samples) – this does not need to delay implementing the IVM weighting in this ticket, but would be discussed rather as part of ticket JP-1611

 

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

In the interest of full disclosure and because Anton Koekemoer had a question about what all goes into the "var_rnoise" array as JWST exposures go through the pipeline, it does not include any other variances determined downstream (e.g. flatfield variance). It simply gets scaled appropriately as it goes through each subsequent step, such as getting divided by the square of the flatfield correction values, and the square of the pathloss correction values, etc., in order to keep it "in sync" with the SCI values as they get various corrections applied.

stscijgbot-jp commented 1 year ago

Comment by Kevin Volk on JIRA:

Is it correct to use the "read noise" if one estimates the (root mean square?) variance about the ramp fit as an error term?  Or an I misunderstanding the discussion above about the error terms?  One would think that the read noise contribution is included in the noise about the linear fit.

Since a "read noise" reference file was requested in the case of NIRISS we used the estimate of the read noise as a single value over the while array, taken from the work by Chris Willott analyzing the noise properties of the NIRISS CV3 dark ramps.  This value is not total noise, just a read noise sub-component, although it is the largest contribution to the estimated noise early in the ramp when the dark signal is small.  It seems that this should be changed to be the total noise term given the discussion about jump detection.

stscijgbot-jp commented 1 year ago

Comment by Karl Gordon on JIRA:

The read noise term in this case is the uncertainty on the ramp fit using the read noise reference file to get the value of the read noise for each frame.  Hence the read noise term should be seen as the component of the ramp fit uncertainty that just includes impact of read noise.  Thus, the full ramp fit uncertainty is unc^2 = readnoiseunc^2 + photonnoise^2.  As the readnoiseunc term scales with the number of frames included in the fit, if the fit was broken by a CR, etc. it more accurately reflects the "effective" exposure time.  

As already stated by Anton Koekemoer , using the photon noise is a known issue that imposes a systematic bias in any mosaicking weighting scheme.  As such, we can't use the variance of the ramp fit as it will include both read and photon noise.  In addition, it is not a good estimate of the ramp fit uncertainty due to the correlated photon noise making the variance a poor indication of the ramp fit uncertainty.

The ramp fit readnoise and photonnoise uncertainties are calculated based on a noise model that explicitly accounts for the uncorrelated read and correlated photon noise behaviors.

stscijgbot-jp commented 1 year ago

Comment by Kevin Volk on JIRA:

Thank you for the clarification, Karl.  That makes it all clear now.

stscijgbot-jp commented 1 year ago

Comment by James Davies [X] on JIRA:

Thanks Karl and others. This is helpful.

A further question. Is there any scientific benefit to keeping exptime scaling available as an alternate weighting method in resample?

My thoughts are to remove it. As it doesn't allow the computation of an error array, and it is technically the wrong thing to do for multi-read, up-the-ramp data.

Removing it would also make the code easier to maintain. Use a single weighting scheme that is correct for JWST data.

Thoughts?

stscijgbot-jp commented 1 year ago

Comment by Karl Gordon on JIRA:

I'm for removing it for the reasons you list.

stscijgbot-jp commented 1 year ago

Comment by Anton Koekemoer on JIRA:

There are benefits to keeping exptime weighting (even if we default to IVM in the operational pipeline) since there are use cases that benefit from generating these offline, if not in the operational pipeline (and yes, ideally we'd account for pixels that don't have the full up-the-ramp exposure time, but this can be lower priority).

But also could James Davies [X] please confirm that IVM will be calculated using not only VAR_RNOISE, but also including the variance from sky noise, ie in the same way as the current algorithms do on-the-fly in drizzlepac, which I thought was the plan to adopt here? (note that drizzlepac also adds dark current, which is perhaps negligible for our instruments here, but sky level is likely to be dominant in many cases). I ask because it seems to me that VAR_RNOISE excludes any Poisson terms at all, including those from background sky (but let me know if this is not the case).

stscijgbot-jp commented 1 year ago

Comment by James Davies [X] on JIRA:

Thanks for the input Karl and Anton.

Anton, when you say there are uses for exposure time maps offline, do you mean there's use for combing data by exposure time weighting, or that having an exposure time map would be useful, as those are (or could be) two separate things.

1) Should we be weighting by exptime for science data?

2) Should we produce some output that gives the rough exposure time of the output resampled image or mosaic?

I think #⁠2 above can be done without actually doing #⁠1, as all one needs is the context image (CON) which we already produce, the exposure time for each image referenced in the CON "layer" and the resampled DQ mask that each corresponds to.

stscijgbot-jp commented 1 year ago

Comment by James Davies [X] on JIRA:

As for implementation, I see the current plan as having 3 parts:

1) Do the resampling by weighting via inverse var_rnoise, and writing out that raw weighting to the WHT extension.

2) Create an ERR extension based on the WHT from #⁠1 above and resampled poisson and flat variances. I'm not clear on how these get combined or scaled relative to one another. Note that the WHT will not be normalized. An equation would be helpful here.

3) Have source_catalog and extract_1d to use this ERR array. This final step is a separate ticket (JP-1611), and perhaps that ticket also encompasses #⁠2 above. Regardless, #⁠3 is dependent on doing #⁠1 and #⁠2 correctly.

I already have a PR for #⁠1 above, this discussion here will frame what is to be done in #⁠2 I think. I hope.

stscijgbot-jp commented 1 year ago

Comment by Anton Koekemoer on JIRA:

Thanks for the responses, James.

Regarding exposure time, would it be possible to leave that as is for now? Just to make sure we get the IVM implemented at this point.

When you refer to creating an ERR extension, are you referring to the level 3 products? (since I think the level 2 products are intended to remain as is).

Following drizzlepac, the IVM would be created using VAR_RNOISE, also combining sky level (and dark current, though this is less dominant) – ie, using just VAR_RNOISE to calculate IVM wouldn't give what is needed here (is this what you meant by needing the equation?).

I'd happy to schedule a meeting to perhaps clarify some of various items for all of us, if that would help.

Thanks again!

stscijgbot-jp commented 1 year ago

Comment by Anton Koekemoer on JIRA:

Scheduled this issue for discussion at JWST Cal WG Meeting 2021-03-02

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

Fixed by #5738

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

Alicia Canipe assigning to you to find a designated INS tester or close the ticket.

stscijgbot-jp commented 1 year ago

Comment by Misty Cracraft on JIRA:

This seems to have problems as demonstrated in JP-2256. Probably can't close this until that one is resolved.

 

stscijgbot-jp commented 1 year ago

Comment by Misty Cracraft on JIRA:

Ticket JP-2256 has been closed, but I don't know if this ivm weighting method has been tested to the satisfaction of INS yet. Some of the people involved in both this ticket and the linked issue are Mattia Libralato Karl Gordon and Kevin Volk . Has anyone tested IVM weighting in resample and does it meet expectations?

stscijgbot-jp commented 1 year ago

Comment by Kevin Volk on JIRA:

I have run a number of NIRISS on-sky image sets through the pipeline to the end of level 3 and I see the IVM weighting being used, I believe.  I have not had the time to determine whether the uncertainties that result are good or bad.  Qualitatively it looks like the uncertainties from the pipeline are too large (as when I compare two observations in dithered observation and look at the change in signal for an object between the two cases).  But I have not looked into this quantitatively.