keflavich opened 2 years ago
Pixels that have one or more groups of the original ramp go into saturation will naturally have higher read-noise variance (VAR_RNOISE), due to the use of fewer groups in computing the slope during ramp fitting. Those larger VAR_RNOISE values in turn lead to larger ERR values (because ERR is the combination of all variance types). The weighting used in the resample step, when the IVM method is selected, is based solely on the inverse of VAR_RNOISE. Hence any pixels in the resampled image that had some contribution from saturated pixels will have lower WHT values, and correspondingly higher values in the resampled ERR and VAR_RNOISE arrays. If a pixel was completely saturated (i.e., no good groups from which to calculate a slope), it will be completely rejected during resampling and hence the CON array will have a lower value, because fewer input images contributed to the output pixel.
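For reference, those relationships can be checked directly on a resampled product. This is a minimal sketch, assuming the standard ERR/WHT/VAR_* extensions are present in the _i2d file; the filename is only a placeholder:

import numpy as np
from astropy.io import fits

fn = 'jw01234-o001_t001_nircam_f212n_i2d.fits'  # placeholder _i2d filename
err = fits.getdata(fn, ext=('ERR', 1))
var_rnoise = fits.getdata(fn, ext=('VAR_RNOISE', 1))
var_poisson = fits.getdata(fn, ext=('VAR_POISSON', 1))
var_flat = fits.getdata(fn, ext=('VAR_FLAT', 1))
wht = fits.getdata(fn, ext=('WHT', 1))

# ERR should be the quadrature sum of the individual variance components
print(np.nanmedian(err / np.sqrt(var_rnoise + var_poisson + var_flat)))  # ~1

# with IVM weighting, WHT tracks 1/VAR_RNOISE, so pixels with saturated
# contributions (inflated VAR_RNOISE) should show depressed WHT
good = np.isfinite(var_rnoise) & (var_rnoise > 0) & (wht > 0)
print(np.corrcoef(np.log(wht[good]), -np.log(var_rnoise[good]))[0, 1])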
That's useful information, but it's still not quite enough to reliably get all saturated pixels.
I tried using that information with the following snippet (which I arrived at after a lot of iteration):
import numpy as np
import scipy.ndimage
from astropy.io import fits

# fn is the filename of the resampled _i2d.fits product
data = fits.getdata(fn, ext=('SCI', 1))
weight = fits.getdata(fn, ext=('WHT', 1))
err = fits.getdata(fn, ext=('ERR', 1))
datamax = np.nanmax(data)
# treat all negative values in the data as saturated
# (this only works if there are no negative regions caused by 1/f noise)
data[scipy.ndimage.binary_dilation(scipy.ndimage.binary_erosion(data < 0))] = datamax
# then ID all regions with both zero weight and zero error
labels, nlabels = scipy.ndimage.label((weight == 0) & (err == 0))
# count the number of pixels in each labeled region
lsums = scipy.ndimage.sum_labels(np.ones_like(weight), labels, range(nlabels + 1))
# exclude the all-zero edges; the [1:] drops label 0, which is the background of
# pixels with weight != 0
edge_indices = np.arange(nlabels + 1)[lsums > 550][1:]
edgemask = np.isin(labels, edge_indices)
# deal with there being some unsampled regions near the edge that aren't on the edge
edgemask = scipy.ndimage.binary_dilation(edgemask, iterations=15)
labels[edgemask] = 0
# the rest are our saturated pixels
msk = labels != 0
msk = scipy.ndimage.binary_dilation(msk, iterations=2)
data[msk] = datamax
I had also tried masking based on WHT and CON, but found that they did not reliably point to saturated pixels; they more often identified good pixels with fewer contributions simply because of dead pixels.
Note in this example that VAR_RNOISE is NaN at the center, with lower values around it. That doesn't seem to match the idea that it should be higher.
Can you explain why you're particularly interested in tracking saturated pixels in the resampled/combined images, as opposed to, say, those pixels that had jumps detected? The ramp fitting process should compute a reliable slope for a pixel that had some saturation, and if it's completely saturated it'll be completely thrown out.
There are a few reasons I want to ID saturated stars: (1) I'd like to treat them differently in fitting. Saturated star fluxes may be recoverable by fitting the extended PSF, or by using the zero-frame data. The degree of saturation has also been useful in determining how much area to mask around them to exclude from fitting (both for fitting the extended background and for PSF photometry). (2) Aesthetics - I want to make multicolor images that look good. The saturated stars are the most prominent features, and saturated pixels, plus those surrounding them, badly mar the resulting image because different pixels tend to saturate in different filters (medium vs. narrow filters).
Maybe I am interested in pixels that have jumps too? But if these are just pixels affected by cosmic rays in one exposure, say, I don't care about that - slightly increased noise isn't a problem.
It's difficult to track bit flags in the resampled product because the resampled pixels contain contributions from multiple input pixels, and there's no way to distribute a bit flag via the drizzle algorithm.
That said, you could make a boolean image (or zeros and ones) from the pixels where saturation prevented there being any good reads in the up-the-ramp fitting. These pixels will be flagged DO_NOT_USE and SATURATED. So create this boolean mask for each input image (_cal.fits) and then resample those masks to the exact same output grid as your _i2d.fits combined image. Then any output pixel with a value > 0 will have been affected by a DO_NOT_USE, SATURATED input pixel.
Still, as @hbushouse mentions above, such a pixel may still have some good, non-saturated flux contribution from the other dither positions, especially if the core of the star's PSF is undersampled and lands on the edge between two pixels as opposed to being centered in a single pixel. But outlier_detection might be flagging the good ones in this case anyway, so you may want to turn that off while fiddling with this.
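A minimal sketch of the first half of that suggestion, i.e. building the per-exposure boolean mask from the DQ array of a _cal.fits file, assuming the jwst datamodels/dqflags API (the filename is a placeholder); the resulting map would then be placed into a copy of the model's data array and run through the resample step onto the same output grid:

from jwst import datamodels
from jwst.datamodels import dqflags

calfile = 'jw01234001001_01101_00001_nrcb1_cal.fits'  # placeholder filename
model = datamodels.open(calfile)

# pixels with no usable groups up the ramp are flagged both DO_NOT_USE and SATURATED
satbits = dqflags.pixel['DO_NOT_USE'] | dqflags.pixel['SATURATED']
fully_saturated = (model.dq & satbits) == satbits

# cast to float so the mask can be resampled like an image; any output pixel
# that ends up > 0 had a fully-saturated input contribution
satmap = fully_saturated.astype(float)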
I was kind of expecting there to be a mask layer that records the number of good (i.e., not SATURATED and not DO_NOT_USE) pixels from cal files contributing to a given pixel. That is similar to your suggestion, but maybe more useful.
I probably don't want to use any pixels (in the final i2d image) that have a contribution from, say, 1-2 cal images out of 24 observations.
The contribution of input pixels to a given pixel in the output image is always modulated by the associated weight map for each input image, so the weight map can give you this information even though it is an IVM weight map: where the IVM weighting is small, there's not much contribution. Since every input pixel's contribution to each output pixel is fractional (due to geometry and weighting), this might be the best approach.
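For example, a rough cut along those lines (the 0.2 factor is arbitrary and would need tuning; fn is the _i2d.fits file, as in the earlier snippet):

import numpy as np
from astropy.io import fits

wht = fits.getdata(fn, ext=('WHT', 1))
# flag output pixels whose weight is well below the typical nonzero weight,
# i.e. pixels that received little effective contribution from the inputs
low_contrib = wht < 0.2 * np.nanmedian(wht[wht > 0])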
If you care about the number of _cal.fits images contributing to an output pixel, then the context (CON) image will give you this.
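If an explicit count of contributing exposures is what's wanted, something like this should work on the CON extension, assuming it is the usual drizzle-style context image with one bit per input exposure (possibly split over several 32-bit planes):

import numpy as np
from astropy.io import fits

# view the stored int32 planes as unsigned so the sign bit counts correctly
con = np.asarray(fits.getdata(fn, ext=('CON', 1)), dtype=np.uint32)
if con.ndim == 2:  # a single 32-bit plane
    con = con[np.newaxis, :, :]

# each set bit marks one input image contributing to that output pixel, so the
# population count gives the number of contributing _cal.fits exposures
ncontrib = np.zeros(con.shape[1:], dtype=int)
for plane in con:
    for bit in range(32):
        ncontrib += (plane >> bit) & 1

# e.g. output pixels with contributions from at most 2 of the input images
few_inputs = ncontrib <= 2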
For HST ACS images one would often look at an exposure-time map of the output resampled image, but for multi-read IR data the effective exposure time varies per pixel and, again, is not a very accurate proxy for how one should weight any particular pixel. Hence the inverse-variance weighting that is used for JWST. We take advantage of the fact that we've correctly propagated errors through the whole pipeline up to this point (at least for the shot noise, read noise, and flatfield noise contributions).
The final l3 data products presently don't indicate which pixels have contributions from one or more saturated pixels. I'd like to have a clear way to identify these pixels. Is it possible that information could be included in an extension, or somehow integrated into the error maps or the CON extension?
See also JWST helpdesk ticket INC0181623.