spacetelescope / drizzlepac

AstroDrizzle for HST images.
https://drizzlepac.readthedocs.io
BSD 3-Clause "New" or "Revised" License
50 stars 38 forks source link

HAP: Test the segment catalog source position update which uses the Gaussian kernel #1782

Open stscijgbot-hstdp opened 3 months ago

stscijgbot-hstdp commented 3 months ago

Issue HLA-1244 was created on JIRA by Rick White:

The segment catalog source positions are systematically offset from the correct positions for many HAP-SVM catalogs. We identified this problem by comparing the segment and point catalog positions for the same image. Offsets as large as 4 pixels are found for ACS/WFC and WFC3/UVIS. These offsets are not rare: 31% of ACS/WFC and WFC3/UVIS images have segment positions that are shifted by more than 0.5 pixels, and 5% have shifts greater than 1 pixel.

These shifts are systematic: the great majority of sources in the image are shifted by the same amount in the same direction. The shifts are much larger than expected from random errors, and they are not random.

The shifts are also damaging. The point catalog apertures are correctly centered on stars, but the segment catalog apertures are offset from the stellar centers. As a result, the segment catalog magnitudes are systematically too faint. Shifts around 1 pixel result in MagAper2 magnitude errors of about 0.03 magnitudes (which is very noticeable), and the largest shifts of 3.5 to 4 pixels result in magnitude errors of greater than 1 mag (enormous errors).

Because the MagAper1 and MagAper2 magnitudes are used to compute the concentration index, the CI values are also very bad in these images. The CI values are typically much too large, which affect both the classification of sources as stellar and extended, and also prevents flagging of cosmic rays (which should have small CI values). As a result, cosmic rays are often not flagged in the segment catalogs (whereas they do get flagged correctly in the point catalogs).

This problem is a blocker for the Hubble Source Catalog development, and greatly degrades the usefulness of the HAP source lists. Unlike most other issues we have encountered, there is no way to recover from the incorrect photometry. We would have to discard nearly half of the HAP catalogs to create a database with systematic photometric errors smaller than 0.01 mag.

The comments will give some specific examples and will demonstrate the problem. I do not have a plausible cause to suggest for this bug. My only thought is that it could occur in the photutils package that generates the catalogs, simply because I do not see any obvious way that an error in the HAP catalog generation software could cause this issue.

stscijgbot-hstdp commented 3 months ago

Comment by Rick White on JIRA:

Here is a sample of 7 ACS images with large shifts between the point and segment catalogs. The shift column gives the median offset in pixels between the segment and point catalogs. The nmatches column is the number of matching sources between those catalogs. The display link loads the image in the HLA image display and overlays the HAP segment and point catalogs.

|| datasetname || shift || nmatches || display || | hst_10574_05_acs_wfc_total_j9dt05 | 2.58 | 121 | display | | hst_12209_01_acs_wfc_total_jbiv01 | 3.21 | 97 | display | | hst_15279_0w_acs_wfc_total_jdga0w | 3.09 | 183 | display | | hst_15936_22_acs_wfc_total_je4m22 | 2.18 | 264 | display | | hst_15944_02_acs_wfc_total_je4o02 | 2.34 | 54 | display | | hst_9401_26_acs_wfc_total_j8fs26 | 2.78 | 335 | display | | hst_9984_u9_acs_wfc_total_j8mbu9 | 3.73 | 365 | display |

Only one of these images has more than one filter in the visit: hst_9401_26_acs_wfc_total_j8fs26 has 2 filters, f475w and f850lp. The shifts are almost identical for the two filters in that case (as expected since both filter catalogs have positions determined in the total image).

There are 331 known ACS/WFC catalogs and 210 known WFC3/UVIS catalogs with shifts of > 2 pixels, similar to those in the table. Those counts are restricted to images with at least N>20 matches between the point and segment catalogs (those should be fairly secure matches). Large pixel shifts are rarer in WFC3/IR images (possibly because the pixel size is much larger), but there are 32 separation > 2 pixel cases known for WFC3/IR.

stscijgbot-hstdp commented 3 months ago

Comment by Rick White on JIRA:

The attached plots and images show information on the sample images.

!shifted_cats.png|width=80%! Scatter plots showing x, y offsets in pixels for each of the 7 images in the table. The expectation is that the positions should agree, putting them at 0,0 in the plots. There are offsets for each of the images, where the positions are systematically shifted by the same amount (up to 3.7 pixels). The circle has a radius of 5 pixels. The inset legend gives the number of matching sources, and the orange + marks the median offset. Note that while the shift is systematic for each source, it is not in the same x,y direction or offset. Different images have different shifts.

!hst_9984_u9_acs_wfc_total_j8mbu9.png|width=80%! The above image is a screenshot from the image display showing the center of the image and the segment (blue) and point (red) sources. The image is the last one in the table, which has the largest shift. The systematic offsets between the segment and point sources is apparent: the segment (blue) markers are shifted toward the lower right. And it is also obvious that the point (red) sources are correctly centered on the stars, while the segment sources are off center. The segment source positions are incorrect.

!shifted_cats-mags.png|width=80%! This plot shows the magnitude difference for MagAper2} in the {{segment and point catalogs. The segment MagAper2 minus point MagAper2 is plotted on the y-axis, and the x-axis is the median positional separation in pixels. Points above zero in dm (as they all are) are fainter in the segment catalog. If this were some kind of random effect, the errors would be scattered around zero with both positive and negative dm values. Instead, the segment magnitude is always fainter. That is more evidence that the segment positions are wrong, and the segment aperture is not correctly centered on the source. The line in this plot is a by-eye fit of a cubic polynomial to the magnitude offset versus separation.

!shifted_cats-mags2.png|width=80%! The systematic nature off the magnitude errors is clear for individual sources. This plot shows the dm magnitude difference versus positional offsets for all the matched sources in these fields. There is noise in the positional measurements. Sources that wind up with a segment position closer to the point position have a smaller magnitude difference. Note the large scale on the y-axis; the largest shifts (around 4.5 pixels) have magnitude errors of 3 mag, which is more than a factor of 10 fainter than the point measurement! These errors can be extremely large.

stscijgbot-hstdp commented 3 months ago

Comment by Rick White on JIRA:

The problems are further confirmed by a systematic comparison of all the HAP point and segment catalogs. Steve Lubow matched the point and segment catalogs for all the data in the HSC development database. There are more than 59,000 filter catalogs that have at least 20 matches between point and segment sources. The resulting distribution is shown below.

!shifted_cats-sephist2.png|width=80%!

The top row shows a histogram of the offsets between the point and segment images. The top left panel uses a linear y_ scale, and the top right panel is the same data but with a log _y scale (which shows the tail toward large shifts better). Separate curves are plotted for the 3 instruments used in the HSC: ACS/WFC (blue), WFC3/UVIS (green), and WFC3/IR (orange). There are differences between the instruments, which we assume are due to a combination of differences in pixel size (0.04 arcsec for WFC3/UVIS, 0.05 arcsec for ACS/WFC, 0.12 arcsec for WFC3/IR) and of the size of the PSF and the apertures used.

The WFC3/IR distribution is definitely more compact, with fewer catalogs having large separations. The ACS/WFC distribution is broader than the WFC3/UVIS distribution, but for large shifts (> 1 pixel) those two instruments look pretty similar. The bottom row of plots makes that comparison easier to see. It shows the cumulative fraction of images with shifts greater than the given separation, again with linear and log versions. By plotting the fraction rather than the counts, we take out differences due to the number of visits. All the curves are by definition unity at separation zero and go to zero at large separations.

The ACS/WFC and WFC3/UVIS (blue and green) curves are very similar in the log plot; some of the differences in the top plots are due to simply having more ACS catalogs than WFC3/UVIS catalogs. Again, the WFC3/IR shifts are smaller, but the problem still appears.

Steve Lubow has carried out a similar cross-match using the HLA catalogs that were utilized in the construction of HSCv3. He found that the typical offset between the catalogs is very small, < 0.05 pixels. That is 10 times better than the HAP point-segment offsets. Large offsets are very rare in the HLA source lists. That is the behavior that we expected (and assumed, until we discovered this issue).

stscijgbot-hstdp commented 3 months ago

Comment by Steve Goldman on JIRA:

I just noticed that, for the point source catalogs, the .reg and .ecsv files have a slight offsets.

stscijgbot-hstdp commented 2 months ago

Comment by Steve Goldman on JIRA:

Looking at that first example (j9dt05), it looks like the segmentation catalog "mask" values are being smeared in the same direction that we are seeing a discrepancy between the catalogs (see uploaded screenshots).

If we can figure out why the blobs are being smeared, we should be able to figure out why the segmentation catalog is offset.  

stscijgbot-hstdp commented 2 months ago

Comment by Rick White on JIRA:

Steve Goldman That may be helpful! Do the "mask" values show the pixels that are included in the segment map, indicating the boundaries of the sources? I was thinking it might be useful to look at those. The detection regions may be extended along the directions of the diffraction spikes (or perhaps along the directions where there are charge transfer effects).

If the source center is determined by the geometric center of the segment regions, that would skew it in the direction where the segment is extended. That would be fixed by using the flux-weighted center position rather than the geometric center of the source island. (I've looked at the code in haputils but have not figured out how the centroid gets computed, so that could be a silly suggestion.)

stscijgbot-hstdp commented 2 months ago

Comment by Michele De La Pena on JIRA:

The .reg and .ecsv catalogs are offset by "1" as the .reg diagnostic files X- and Y-coordinates are meant only for display with ds9 which is a one-based system.  However, the .ecsv X- and Y-coordinates are on a 0-based system as noted in the header portion of the *.ecsv file.

stscijgbot-hstdp commented 2 months ago

Comment by Michele De La Pena on JIRA:

Steve Goldman Rick White 

Just my two cents with regard to the offsets potentially being due to poor CTE correction being done in the CALXXX pipelines (i.e., residual flux causing an offset in the computation of the centroid in a particular direction)...

Unless there is something I have not considered, I would find it amazing that the coordinate error should be so consistent in length (offset amount) over the entire DRC image (both chips) as CTE losses should be worse where the pixels have been shifted more during readout.  I would also expect some variation in CTE trail length as a function of intensity of the original object.  Finally, the hst_9984_u9_acs_wfc_total_j8mbu9 screenshot shows a portion of the two ACS chips near the chip gap.  The parallel (Y) CTE correction is in opposite directions on the two chips, yet the offsets depicted here are in the same direction.

stscijgbot-hstdp commented 2 months ago

Comment by Rick White on JIRA:

I agree with Michele De La Pena regarding some kind of CTE effect. Not only does the consistency argue against that, but also the HLA catalogs (from the same data) don't show anything like this. And that test case (j8mbu9) is from an early ACS observation shortly after it went into orbit. The time-dependent CTE effects should be smallest for those early observations.

I think the funny extended regions in the mask that Steve Goldman found are very likely to be part of the explanation, but that just changes the questions to (1) why are the segments extended like that, and (2) why do the extended segments lead to large shifts? Still it seems like knowing it is related to the stretched segment regions will help track down the cause.

stscijgbot-hstdp commented 2 months ago

Comment by Rick White on JIRA:

Steve Goldman Michele De La Pena I just updated my innerspace page regarding the RickerWavelet2D kernel with more info that is relevant to this ticket. See the new section on "Definition and use of the convolved image" and the updated discussion in the "Can this fix the segment shift problem?" section.

For the Gaussian kernel, which is what was used for most (maybe all) of the worst images, the convolved image that is used to determine the source centroids is not background-subtracted! That is an error. The photutils documentation emphasizes that it must be background-subtracted to get accurate positions. Because it is not background-subtracted, the positions for faint sources (which means most sources in the image) will be shifted toward the geometrical center of the segment rather than located at the flux-weighted centroid as desired.

I believe that this is an important underlying issue that is causing the problem in this ticket. If that is correct, making the changes described on the innerspace page (which Michele is working on now) will fix this problem as well.

stscijgbot-hstdp commented 2 months ago

Comment by Rick White on JIRA:

Steve Goldman Michele De La Pena

One more comment: I made another update on the innerspace page that shows that only the Gaussian kernel catalogs have the position errors described in this ticket. That is exactly what is expected if the failure to subtract the background when using the Gaussian kernel is to blame. I now am very confident the changes described on that page will fix the problems in this ticket.

See the new figure in the "Can this fix the segment shift problem?" section.

stscijgbot-hstdp commented 1 month ago

Comment by Michele De La Pena on JIRA:

From Rick White

I just had a thought about ticket https://jira.stsci.edu/browse/HLA-1244 regarding the offset positions for segment catalogs.  In a comment to that ticket (with some attached images), Steve pointed out that the segmentation map has islands that are smeared out in one direction.  I think that issue, combined with the lack of background subtraction for the Gaussian kernel, leads to the shifts that we see.  So Michele's changes should make that much better.But I think we still do not understand why the segments are stretched out that way. Here's my thought: I wonder whether the code (inherited from Warren) that creates a kernel by finding a reference star could be the source of the problem?  If it picks a reference star that has diffraction spikes (or maybe bleeding), that could lead to a convolution kernel that is asymmetrical and is stretched out in one direction.  Then stars in the convolved image would be extended in one direction, which would lead to the segment map islands also being extended.

I'm worried about this effect.  I think it would be helpful for debugging if the kernel that is actually used got saved in a file so we can look at it.  It would also be helpful to save the segmentation map (which I think may already be possible during debugging).  I still have the feeling that trying to use a star image rather than a computed Gaussian kernel is dangerous and might lead to some problems and inconsistencies in the catalogs.

stscijgbot-hstdp commented 1 month ago

Comment by Michele De La Pena on JIRA:

Rick White 

Based on your statement of 23 May 2024, I processed the j8mbu9 dataset with the majority of the segmentation updates in the code.  The segmentation images still shows smearing, but your speculation regarding the convolution kernel that is asymmetrical and is stretched out in one direction was a fine speculation as can be seen by the image of the kernel.  Please see the DRC, the segmentation image, and the kernel image attached. The DRC and segmentation images depict the same image portion.

!Screen Shot 2024-05-23 at 2.16.45 PM.png!

!Screen Shot 2024-05-23 at 2.16.57 PM.png!

!Screen Shot 2024-05-23 at 1.47.00 PM.png!

stscijgbot-hstdp commented 1 month ago

Comment by Michele De La Pena on JIRA:

I short-circuited the decision to use a "custom" kernel based upon the actual data (SCI) which allowed the use of a Gaussian2DKernel for {}j8mbu9{}.  The image below shows a portion of the segmentation map using the Gaussian2DKernel on the left and the "custom" kernel on the right.  You can see the segments not elongated when using the Gaussian2DKernel.

!Screen Shot 2024-05-23 at 3.41.37 PM.png!

stscijgbot-hstdp commented 1 month ago

Comment by Rick White on JIRA:

Michele De La Pena I definitely do confirm your conclusions! I have attached a couple more images that show the results. The first is a surface plot of the kernel, which shows that the lumps that are extending the image are nearly as large as the central peak. !kernel.png|width=50%!

The second image is a section of the image with the edges of the segments overplotted. It blinks between the image and the image+segments: !blink.gif|width=100%! The impact of the poor kernel on the segments is clear. Some sources just get smeared off toward the lower right, but there are even cases where the source position is shifted entirely off the object, so that the source is not located on the star at all. An example of that is in the top of the image about halfway across.

Your fix of turning off the custom kernels sounds good to me. I'm even wondering whether we should turn them off when this function is called from the astrometry code. It seems to me that these problems will also happen when the astrometry catalogs are created, and they are likely to lead to bad astrometric solutions.

stscijgbot-hstdp commented 1 month ago

Comment by Michele De La Pena on JIRA:

This is how the Gaussian kernel is instantiated for the various detectors:

   kernel = build_gaussian_kernel(fwhm, npixels=source_box)

fwhm is either the default of 3.0 or derived from the TWEAK_FWHMPSF variable in the detector configuration files (e.g., instrument_detector_catalog_generation_all.json) and the platescale.    fwhm = TWEAK_FWHMPSF/detector_plate_scale

npixels is either the default of 7 or a passed value of "source box".  At this time the default is used for generating the kernel used in producing the output source catalogs.

TWEAK_FWHMPSF     ACS (hrc: 0.073, sbc: 0.065, wfc: 0.076)     WFC3 (ir: 0.14, uvis: 0.076)     WFPC2 (pc: 0.14)

plate scale     ACS (hrc: 0.027, sbc: 0.032, wfc: 0.049)     WFC3 (ir: 0.1283, uvis: 0.0396)     WFPC2 (pc: 0.0455)

stscijgbot-hstdp commented 1 month ago

Comment by Michele De La Pena on JIRA:

Steve Goldman 

As it appears this source of the problem is the custom PSF kernel which should be resolved by HLA-1257, this ticket should switch to testing all of the datasets originally listed in this ticket once the code is checked in.

stscijgbot-hstdp commented 1 month ago

Comment by Rick White on JIRA:

Thanks Michele De La Pena, that is exactly the info I needed to see.

I think the FWHM values in arcsec look OK except for WFPC2. The value 0.14 arcsec might be OK for the WFPC2/WF chips but is too big for the PC chips. I think (when we eventually make WFPC2 catalogs) we will get better results using a value of 0.1 arcsec. (But that is not an important change for now since we are not making WFPC2 catalogs yet.)

I created kernels by calling the Gaussian2DKernel and RickerWavelet2DKernel functions without the x_size and y_size parameters. That gives what astropy considers the default (minimum) sizes for those kernels. Here is a table:

           fwhm  pixel  fwhm  kernel           
inst     arcsec arcsec   pix RW2D G2D rw2d_size
acs_hrc   0.073 0.0270  2.70  15   11    15
acs_sbc   0.065 0.0320  2.03  11    7    23
acs_wfc   0.076 0.0490  1.55   9    7    15
wfc3_ir   0.140 0.1283  1.09   7    5    11
wfc3_uvis 0.076 0.0396  1.92  11    7    15
wfpc2_pc  0.140 0.0455  3.08  17   11    --
wfpc2_pc  0.100 0.0455  2.20  13    9    -- (modified FWHM)

The columns in the table are:

It is OK if the rw2d_size value is larger than the RW2D value that is required for the kernel. That makes the convolution take a little longer, but that really does not take much computing compared with the rest of the code. The rw2d_size in the config files is fine without change.

Note there is no rw2d_size for the WFPC2 catalogs. I include 2 lines for WFPC2, one using FWHM = 0.14 arcsec and one using FWHM = 0.1 arcsec. For the first case we would want rw2d_size to be large (>= 17), and for the smaller FWHM rw2d_size = 15, which is commonly used for the other instruments, would be fine.

The only significant issues I found is that there are a couple of cameras where a default size of 7 pixels for the Gaussian2DKernel is not big enough.

An easy way to make this change would be simply to change the value for the kernel size parameter to 11 for all instruments. That would save the effort of adding another configuration parameter. 11 for the Gaussian kernel would be big enough for all the cameras.

Otherwise I think this parameters are all correct.

stscijgbot-hstdp commented 1 day ago

Comment by Steve Goldman on JIRA:

I compared the catalogs for the 7 datasets that you mentioned using the most recent version of the Drizzlepac main branch.

 

I find that the point and segment catalogs do not show significant systemic offsets in the pixel values and very small systemic offsets (<0.03") in the RA/Dec positions (images attached).

 

Figures are point source catalog positions minus the segmentation catalog positions, in the image plane and RA/Dec. 

stscijgbot-hstdp commented 1 day ago

Comment by Michele De La Pena on JIRA:

Steve's analysis validates further the fix implemented in HLA-1257 did fix the systematic offsets originally detected in the named datasets.