spacetelescope / drizzlepac

AstroDrizzle for HST images.
https://drizzlepac.readthedocs.io
BSD 3-Clause "New" or "Revised" License
51 stars 38 forks source link

Drizzlepac/HAP: Check WFPC2 detection thresholds for alignment #1663

Open stscijgbot-hstdp opened 1 year ago

stscijgbot-hstdp commented 1 year ago

Issue HLA-1112 was created on JIRA by Jennifer Mack:

Comment from Rick White in team_burgundy:

 

For image hst_5369_qr_wfpc2_wfpc2_u26kqr in the 2023-09-07 test sample, there are about 25 Gaia stars available for matching, but only about 10 of them (mostly saturated) appear in the final segment catalog. 

I wonder whether the detection threshold issues in the WFPC2 catalogs could be the culprit in getting good Gaia alignments.  If the threshold is too shallow (by a lot) then the Gaia stars do not get detected, and as a result the images do not get aligned. 

It would be worthwhile to have a look at this particular visit to see why it does not get aligned to Gaia;  I think the fraction of WFPC2 datasets that do get aligned will be very small if even this one does not succeed.

stscijgbot-hstdp commented 1 year ago

Comment by Rick White on JIRA:

The other image in the 2023-09-07 sample, hst_8230_04_wfpc2_wfpc2_f814w_u5c504, shows a similar effect. It has only 4 Gaia stars overlapping the image, but they all should be easily detected. The segment catalog has zero sources, and the image is not aligned with Gaia.

stscijgbot-hstdp commented 1 year ago

Comment by Steve Goldman on JIRA:

Keeping notes here:

For U26KQR:

Median background: 0.0017260612221434712 Median RMS background: 0.4666666666666667 .

.

. Matching 2410 saturated pixels with 42 catalog sources

Matched source table and image attached. All of the matches are on the two saturated sources in in the center of the PC chip.

There only appears to be the single frame for the dataset and considerable cosmic ray contamination. It's unclear if there is a general purpose fix for this, but i'll take a look at the other dataset mentioned (u5c504). I'm guessing that the background is computed over the whole image, but perhaps there is a way to calculate it without the PC if there is a large discrepancy between the background for the PC and other chips. 

stscijgbot-hstdp commented 1 year ago

Comment by Jennifer Mack on JIRA:

The PC background is often different from the WF chips, possibly due to internal reflections for some targets.  

For SVM drizzle products with more than one input frame, the CR-rejection in the PC can also be bad (i.e. the parameters were not optimized for this chip and may give very noisy sky values.

Can we try to the alignment using only the WF chips?  That might give better results.   Attached is the HLA version of this image with the Gaia sources overplotted for reference.

stscijgbot-hstdp commented 1 year ago

Comment by Steve Goldman on JIRA:

This is the SVM image for u5c504 showing some pretty obvious streaks (attached). I will try to align this dataset using just the WF chips. 

stscijgbot-hstdp commented 1 year ago

Comment by Jennifer Mack on JIRA:

I think what you are calling streaks are nan values where there are bad pixel flags.  I didn't think there were enough of those to affect the centroids of stars...

Attached a slightly different binning of same image u5c504 in which the bad columns don't look so bad. I definitely agree PC noise is a lot higher than the WF.

 

 

stscijgbot-hstdp commented 1 year ago

Comment by Jennifer Mack on JIRA:

Attached shows the HLA preview of u5c504 with some large circles added by me to highlight the different catalogs...

It says 7 gaia sources, but I only see 4 in the image (blue) and a bunch outside the footprint. If one of these was sigma-clipped, I can see why it wouldn't get a Gaia fit...

On the other hand, there are tons of nice GSC242 sources, so it should have matched to those.

Note that the code found only 1 point source and 0 extended (segment) which are the source catalogs.   That suggests to me the larger noise in the PC could be preventing the code from finding objects in the WF, if a single threshold is used for all 4 chips.   We could either try ignoring PC for alignment or use different detection thresholds for the PC and WF.

stscijgbot-hstdp commented 1 year ago

Comment by Steve Goldman on JIRA:

Aligning and doing the catalog creation on only the WF chips for u5c504 still results in only one source (detected twice) being identified in the point source catalog. The source and the resulting SVM image are attached. 

stscijgbot-hstdp commented 1 year ago

Comment by Steve Goldman on JIRA:

I'm guessing then that this is more of a thresholding issue.

Also please let me know if you would prefer me to upload things of refer to them differently. I guessing it gets confusing uploading a bunch of images and referring to them in the chat. 

stscijgbot-hstdp commented 1 year ago

Comment by Jennifer Mack on JIRA:

Thanks, Steve. 

I found these alignment parameters in the json file here: https://github.com/spacetelescope/drizzlepac/blob/master/drizzlepac/pars/hap_pars/svm_parameters/wfpc2/wfpc2/wfpc2_wfpc2_catalog_generation_all.json

    "generate_source_catalogs":       {         "box_size":13,         "win_size":3,         "nsigma":3.0,         "centering_mode": "starfind",         "bkg_estimator": "MedianBackground",         "rms_estimator": "StdBackgroundRMS",         "num_sources": 250,         "deblend": false,         "fwhmpsf": 0.13,

And I found these catalog parameters in the json file here :[https://github.com/spacetelescope/drizzlepac/blob/master/drizzlepac/pars/hap_pars/svm_parameters/wfpc2/wfpc2/wfpc2_wfpc2_catalog_generation_all.json]

    "dao": {         "bkgsig_sf": 4.0,         "kernel_sd_aspect_ratio": 0.8,         "TWEAK_FWHMPSF": 0.14,         "TWEAK_THRESHOLD": 3.0,         "bthresh": 5.0

It looks like both use threshold values of 3.0 (e.g. nsigma and TWEAK_THRESHOLD).  Typically when a normal threshold like 3.0 doesn't work, it suggests that the estimate of the background RMS is off.   In the case that it is overestimated, one can get around this by lowering the threshold but thats a kludge.   It would be better to look at the "StdBackgroundRMS" values to see if the are reasonable.  If not, we should fix that and not lower the thresholds.

 

Rick White may be able to comment on the correct threshold values to use to improve the point and segment source catalogs for this dataset... currently the HLA preview shows only one point source with the above parameters.

https://hla.stsci.edu/cgi-bin/display?image=/HAP/results_2023-09-07/u5c504/hst_8230_04_wfpc2_wfpc2_f814w_u5c504_drz.fits

 

!image-2023-09-14-18-51-02-926.png|width=622,height=64!

 

 

stscijgbot-hstdp commented 1 year ago

Comment by Rick White on JIRA:

I don't know what the appropriate change is to the threshold, but it seems to me Jenn is probably correct that there is something wrong with the RMS calculation. You could try just arbitrarily reducing the threshold, but it would be safer to get the rms right (in case whatever is causing it to be off is different for other images).

In case it is useful, here is a link to the HLA version of the image with the SourceExtractor version of the catalog overlaid. The HLA catalogs have 67 sources for the SourceExtractor (= segment) catalog and 59 sources for the DAOphot (= point) catalog. For comparison, the current HAP version has 0 segment sources and 1 point source.

I'm guessing that if we can get the thresholds set properly for the final catalogs, that will also identify the problem for the alignment catalog that is preventing Gaia from being matched.

stscijgbot-hstdp commented 1 year ago

Comment by Steve Goldman on JIRA:

Okay, so moving forward I will try to compare the StdBackgroundRMS of u5c504 with other more normal programs. If the backgrounds are off, and I think we confirmed they are, I can try to see if the CR rejection for this program is different than other programs. Unless Jen, you have some other suggestions for moving forward. 

stscijgbot-hstdp commented 1 year ago

Comment by Steve Goldman on JIRA:

Attached is the background data ('bkg_image_data.fits') being used, calculated by photutils.background.Background2D. It looks like while the average value in the image seems to be around 0.7, the average background mean is 0.99, likely due to some areas with a higher background.

 

Correction that is for one chip, the chips are all looking really different

stscijgbot-hstdp commented 1 year ago

Comment by Steve Goldman on JIRA:

It seems that Chip 1 is the issue, with a background (0.7) far lower than the the rest (~5)

stscijgbot-hstdp commented 1 year ago

Comment by Rick White on JIRA:

I'm looking at the u5c504 images. I confirm that the background is lower on the PC (chip 1), although it is not quite as big a difference as the numbers might make it seem.  For the 4 chips (looking at image u5c50401r) the backgrounds are 0.7, 5.0, 4.8, 5.0. However since the PC pixels have 1/4 the area of the WF pixels, we expect the PC values to be 4x smaller if the background per area is the same in the chips. If you multiply the chip 1 background by 4, it is about 2.9. So it is a factor of 0.6 lower than the background in the other 3 chips. That is significant but not a huge difference.

The rms in the background subtracted images for the 4 chips is not very different. I like to use the 1.48 times median of the absolute value as a robust estimator for the rms (it is much less affected by CRs and objects in the image than the rms). Here are the values for the 4 chips:

|| Chip || 1.48 * MAD || | 0 | 1.44 | | 1 | 1.42 | | 2 | 1.44 | | 3 | 1.47 |

So the noise in the images at the chip level is nearly identical.

Drizzling the images to the 0.05 arcsec pixel scale will change that though. In regions covered by the WF chips, the noise will be highly correlated between neighboring pixels because the input pixels are being expanded into 2x2 output pixels. In regions covered by the PC chip the noise is much less correlated since the pixel sizes match.

I don't know how the rms noise gets estimated for the source detection. If it just used the overall distribution of pixel values, I think the noise would be similar in the PC and WF regions. But if it uses the distribution of (say) the difference of adjacent pixels, the noise estimate will be much lower in the WF region than in the PC region. I imagine Jennifer Mack knows where the rms noise estimate comes from? Steve Goldman Do you have the rms estimates used for object detection in the drizzled images or in the various chips?

stscijgbot-hstdp commented 1 year ago

Comment by Rick White on JIRA:

From the photutils background documentation, it sounds like the StdBackgroundRMS algorithm uses a sigma-clipped RMS (if sigma-clipping is turned on). It might be a useful experiment to try using the MADStdBackgroundRMS background estimator, which will be less affected by noisy pixels at the edges of the chip in the shadowed area. That should give a good result with or without sigma-clipping.

stscijgbot-hstdp commented 1 year ago

Comment by Steve Goldman on JIRA:

Thanks for doing all of this Rick, I will give the MADStdBackgroundRMS method a go and report back what I find. 

stscijgbot-hstdp commented 1 year ago

Comment by Jennifer Mack on JIRA:

Hi Steve, Adding a few comments from our conversation this afternoon:

1.) I'm concerned that the sky images (and stats) may be not using the DQ arrays to mask the edges of the chips (e.g. Bad pixels flagged as DQ=2 in the FLT DQ arrays). See attached figure u5c504_sky for each chip (in their proper relative position but not at their proper orientation on the sky. Compare to the drizzled image at right, where these regions are removed when combining chips.

2.) For initial testing, let's just use the WFPC2 drizzled frames (not the FLT frames) for alignment and see if you can get this to work.  The easiest test would be to lower the threshold from 3.0 to 2.0 and then 1.0 to see if you can get it to align.  Note you can try this for both the alignment catalog and source catalog thresholds.  (This is just a quick fix and could suggests an error with computing the sky rms.   For example, for WFC3/IR, which is highly undersampled like the WF chips, I have to use a crazy low threshold value of 1.0 in tweakreg for it to find sources, whereas for WFC3/UVIS I have to use a crazy high threshold, like 2000.0.)  So the rms calculation may not be a simple fix as one might expect.  Lowering the threshold this much, on the other hand, means we will need to check a larger data sample to be sure it doesn't adversely affect the alignment for an image with lots of stars.

3.) The dataset u5c504 with 2 input frames has only 4 gaia sources in the frame.  (3 are outside the active area). If even one of these is sigma-clipped, you will not get a Gaia fit, but you should get a GSC242 fit with 64 objects.  Note that it might be easier to start testing with hst_5369_qr_wfpc2_wfpc2_f814w_u26kqr_drz.fits which has at least 2 dozen Gaia sources. While it is only a single input frame and will have CR's, I think the algorithm is designed to reject these from the alignment.

4.) It may be that the noise in the PC region of the DRZ image is much higher than in the WF regions. I did notice that the cosmic ray algorithm tends to flag deviant pixels in the sky of the PC.  This is where manually checking the RMS (MADStdBackgroundRMS) in different regions of the DRZ image will be useful.

5.) If you can get the DRZ frames to align to Gaia or GSC242, you can then move on to testing the alignment in the FLT frames, which appears more complicated due to the bad regions between chips.

Feel free to reach out anytime via slack. Jen

stscijgbot-hstdp commented 1 year ago

Comment by Jennifer Mack on JIRA:

P.S.  I did a quick check of the background RMS by hand for hst_5369_qr_wfpc2_wfpc2_f814w_u26kqr_drz_sci.fits (which has an identical SCI array as the standard product u26kqr01t_drw.fits)

See my ugly PyRAF results in the attached u26kqr_RMS_check.png which has stats and image histogram in different regions.

It appears that the mean sky value in the PC is about 0.002 e/s which is about the same as in the WF chips. This is by design... a uniform surface brightness source such as the sky will be the same regardless of the plate scale.

The sky RMS on the other hand is a factor of 4 higher in the PC (0.002 e/s) versus WF2 or WF3 (6e-4 e/s).  This factor of 4 is approx equal to the ratio of the plate scale squared, e.g. (0.0996/0.04554)**2=4.78

So this could be important in the source detection for DRZ frames.  (I didnt check the *flw.fits stats)

stscijgbot-hstdp commented 1 year ago

Comment by Steve Goldman on JIRA:

Working on the program with only one frame (u26kqr) I was able to try MADStdBackgroundRMS, but there was no change in the calculated background for any of the chips. I then tried different sigma levels for the gaia dr3 align and that did change the number of detected sources from the default 3sigma (22 sources) to 1sigma(250 sources).

 

When these sources are crossed matched with the gaia catalog (62 sources) it still finds 0 matches. I'll continue to look for reasons why it isn't finding any matches.

I'm guessing it's a bug in the code but is it possible that the a priori solution is off? 

stscijgbot-hstdp commented 1 year ago

Comment by Rick White on JIRA:

Hmm, good question about the a priori solution being off. Do you actually see an a priori solution for the u26kqr image? When I look at the HLA version of the image, I believe there is not any HLA astrometric shift from the HSC. However, in the (very) old HLA data processing, the images were matched to whatever catalogs were available in 2009 and there were corrections applied to the image. That shift happened in the image processing and was (I think) not included in any HLA shift that is in the astrometry database.

This makes me wonder what would happen if there were an HLA shift, however? The HLA data is drizzled to a pixel scale of 0.1 arcsec, unlike the HAP data. So it seems like it could be possible that if there is an attempt to apply a shift from the HLA or HSC, there could be an error that comes from the change in plate scale?

My guess is there is nothing crazy about the HAP WCS though, because if you look at the HAP image with Gaia sources things look generally OK. There is a shift of about 0.5 arcsec in RA and 0.8 arcsec in Dec, but that is going to be a pretty average shift for this data.

stscijgbot-hstdp commented 1 year ago

Comment by Steve Goldman on JIRA:

I looked at the reference catalog positions and image catalog positions (attached) during matching for one of the chips. It looks like there are some matchable sources regardless of whether a lower background threshold is used. I see maybe 8 sources that should be matched, but are not. I don't know whether they are too far away from each other, or there is some other check that is rejecting the fit, but it looks like this should be working. 

stscijgbot-hstdp commented 1 year ago

Comment by Jennifer Mack on JIRA:

Did you try aligning the DRZ frame yet?   The FLT/FLW frame will be likely be more difficult.

It's unlikely an issue with the a priori solution. But to test this you can increase the searchrad parameter in from the default (1.0") to something larger (3.0").

 

stscijgbot-hstdp commented 1 year ago

Comment by Steve Goldman on JIRA:

Sorry I may be confused. The alignment and catalogs that I showed were created during the drizzlling of the C0M files. Is that what you mean by aligning the DRZ?

stscijgbot-hstdp commented 1 year ago

Comment by Jennifer Mack on JIRA:

There is a portion of the code that uses the drizzled frame and try to get alignment.  (I'm not sure if it's in the alignment code for standard products or if its in the SVM code, but I did find this: https://drizzlepac.readthedocs.io/en/latest/reference/runsinglehap.html

(Note that the source catalogs for photometry also use the drizzled images as input.)

I would start with u26kqr01t_drw.fits for _drz.fits file which is produced from drizzling together the 4 science extensions of  u26kqr01t_flw.fits (_flt.fits), which is originally computed from on the u26kqr01t_c0m.fits/_c1m.fits files. Doing it chip by chip is harder, especially since the PC has a smaller area on the sky.

To get good alignment, its typically easier use the drizzled frame (which is not only larger, but also has cosmic-rays removed).  The software propagates any WCS solution from the drizzled frame back to the *flw.fits files (I'm not sure where it is on the code but Michele could help with this part.)

Ultimately we don't care as much if the standard products don't align, but we really want the advanced products (e.g. hst_5369_qr_wfpc2_wfpc2_f814w_u26kqr_drz.fits) to get a posteriori solutions. 

(I can't recall if we deliver the corresponding hst_5369_qr_wfpc2_wfpc2_f814w_u26kqr_flt.fits files, but if so, the WCS from the DRZ frame would be propagated back to that file as well.)

stscijgbot-hstdp commented 1 year ago

Comment by Steve Goldman on JIRA:

So I put the failing alignment of the DRZ file aside and looked at the SVM/catalog creation for "u26kqr". It looks like the aligning fails with every catalog including 2mass. It then tries to identify sources and it does this correctly for the extended source catalog, but then only identifies a couple sources within the PC for the point source catalog. 

 

I spoke with warren and michele about this and we thought that it might be from the weighting during the fit process. We recently fixed the weighting to work correctly  (HLA-1059and thought that this might be related. The weighting, before it was fixed, was: np.sqrt(ref_catalog['RA_error') 2 + ref_catalog['DEC_error'] 2) and now it is 1 divide by the previous calculation. 

Since before it weighed noisier (and likely fainter) sources higher, perhaps now it is trying to weigh the fits of the brightest gaia sources with saturated HST sources. To be honest, I'm still trying to figure out if the weighting is being used, but, this might be some food for thought. 

stscijgbot-hstdp commented 1 year ago

Comment by Rick White on JIRA:

I can see how that could be contributing to the problem. I guess if the algorithm that decides whether a fit is acceptable uses the rms between the reference positions and the fitted HAP positions, the fit might get rejected if it overweights the saturated stars and winds up with a poorer fit for the fainter stars.

A couple of thoughts about this. Making the weights proportional to the Gaia errors (instead of 1/error) is a pretty risky way of trying to get lower weights for saturated stars. I'm guessing that was not the intent of the original version of the weights, but if it was then eek! The right approach would be to include in the weights an error estimate for the position of the HAP source too. So then the weight would be 1/sqrt(gaia_error\*\*2 + HAP_error\*\*2) A guess at the HAP error is about 0.1 times the pixel size for most sources (so 0.005 arcsec = 5 mas for ACS/WFC and WFC3/UVIS). For saturated sources the error will be larger by a factor of a few.

Simply including a value for HAP_error would help quite a bit at avoiding overweighting the very brightest Gaia stars. Effectively it puts a lower bound on the error, so all Gaia stars with small errors will get about the same weight (= 1/HAP_error). That will keep the weighting from trying to fit only the super-bright stars.

I'm a bit skeptical this is the actual problem with getting a fit in this case, but it does seem like a good idea to modify the weights to avoid exaggerated values for the brightest Gaia stars.

stscijgbot-hstdp commented 1 year ago

Comment by Steve Goldman on JIRA:

Alright, I can add a ticket for improving the weighting logic.

I did some more digging and is seems that for the identification of sources for SVM catalog creation, the code isn't identifying any sources outside of the PC. This is before the code flags them for various reasons (saturated, extended, hot pixels, etc.) After the flagging, only one remains. 

I'm still getting use to the code, but I'll try to look more at why it can't find any sources outside of the PC area. I'm guessing it might still be due to the background?

stscijgbot-hstdp commented 1 year ago

Comment by Jennifer Mack on JIRA:

My suspicion is that since the noise in the PC is 4x higher than in the WF chips, a 3-sigma detection threshold won't find sources in the WF.  Can you try lowering the threshold again (this time testing with the drz.fits and not the c0m.fits files) and seeing if this helps identify more sources?

stscijgbot-hstdp commented 1 year ago

Comment by Steve Goldman on JIRA:

Sorry Jenn. I'm still getting confused about the step in the process you're referring to. 

When you say "testing with the drz.fits" do you mean doing an alignment using u26kqr01t_drw.fits as an input? 

I'm still getting use to the steps. The code seems to do source identification several times.

stscijgbot-hstdp commented 1 year ago

Comment by Jennifer Mack on JIRA:

I'd start with the SVM code which (I believe) uses the u*drw.fits as input for alignment (if not in the first pass, then in a later pass).

Since the PC is noisier, it might be throwing off the RMS calculation for the full detector u*drw.fits image.  One way to get around this is to just lower the threshold and see if the SVM code is able to get alignment.  If so, we can decide how to proceed from there.

If that doesn't work, then it could be an error weighting issue described by Rick: "very bright saturated stars to be weighted too much, which is causing trouble for the fit."

 

stscijgbot-hstdp commented 1 year ago

Comment by Steve Goldman on JIRA:

Okay, understood. Thanks for clarifying that. I'll give that a try. 

stscijgbot-hstdp commented 1 year ago

Comment by Steve Goldman on JIRA:

So I had a chat with Michele and Warren and I’ve learned some things.

 

 

 

 

 

stscijgbot-hstdp commented 1 year ago

Comment by Jennifer Mack on JIRA:

Thanks Steve.   This is helpful information. Two questions:

When it uses the FLT for alignment, does it use the DQ arrays to mask bad pixels (i.e. the edges of each chip) before alignment?

When doing the alignment, does it make source catalogs for all 4 chips and then do the alignment? Or does it try to align the PC chip first? 

 

stscijgbot-hstdp commented 1 year ago

Comment by Jennifer Mack on JIRA:

P.S. u5c504 should be getting a GSC242 FIT with > 60 sources in the fov (but only one in the PC chip)  in the attachment below you can see the tiny red circles are systematically offset from the real sources. 

!image-2023-10-05-14-43-49-094.png!

stscijgbot-hstdp commented 1 year ago

Comment by Steve Goldman on JIRA:

From my understanding it does the alignment on the sources it finds in all of chips at the same time. I know that because I asked the same question to warren yesterday. I'm not sure how it stores the flags for bad pixels and CRs, but I know that it is a step before the alignment. I can check if its stored in the DQ array. 

The problem step that gets me stuck is during the call to daophot. Our internal docs also refers to it as a black box. I know that there should be matches, and I can see that with the input catalogs (attached). 

stscijgbot-hstdp commented 1 year ago

Comment by Steve Goldman on JIRA:

The offset is interesting. This is something Warren mentioned about the difference between our results:

Specifically, the CRVAL1 and CRVAL2  keyword differences are very large.  In fact, the RA/Dec of the center pixel of the PC in my FLT file would land at (402.529, 378.116) in your FLT file, not (400,400).  This would definitely keep the fitting routine from working, since all the sources from the PC would be offset by this difference (over 22 pixels) relative to the sources from the other chips.  Now it is just a matter of working out what has crept into your environment (or the pipeline?) to cause this offset.

stscijgbot-hstdp commented 1 year ago

Comment by Jennifer Mack on JIRA:

A 25 pixel offset is >1" offset, which I believe is the default value for the tweakreg (tweakwcs) parameter searchrad.

https://drizzlepac.readthedocs.io/en/deployment/tweakreg.html

Can you try increasing this to 2" and see if it has any effect?

stscijgbot-hstdp commented 1 year ago

Comment by Steve Goldman on JIRA:

Setting the searchrad=500 (the default in 250 in the tangent plane's units, i'm assuming pixels?) for matching_2dhist_fit() in align_utils.py still fails at getting more than 1 match. 

Also, the function (match_2d_hist_fit) description is the following:

If the PC chip only has one identified source, maybe it is messing up when it is applying the fit to the PC?

I'm still working on testing versions of the dependencies to see if that is the issue. 

stscijgbot-hstdp commented 1 year ago

Comment by Jennifer Mack on JIRA:

Wow, that's a huge search radius if that's in pixels. It must not be because 250 pix *0.05"/pix is 12" which would be insane. 

Could it be in milliarcsec (e.g. 0.250") ?  That would actually be too small in most cases.  For WFPC2 era data, pointing errors should be between 1-2".

Thanks for looking in to this.... I will stay tuned.  

Also, please let me know what you find about using the DQ arrays for masking before source detection.  I'm glad to hear it fits sources from all chips as a single group. :)