Disable the code that changes HAP catalog flag values for saturated/faint sources

stscijgbot-hstdp commented 11 months ago

Issue HLA-1164 was created on JIRA by Rick White:

There are sources in the HAP catalogs that are marked as saturated (Flags = 4 or 5) even though they are below the 5-sigma detection threshold. Those sources are expected to have the Flags bit set for value 8 (source too faint). So their flags should be 12 or 13 instead of 4 or 5.
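As a quick illustration of the arithmetic above, here is a minimal sketch that treats Flags as a bit mask. The constant names and the `decode` helper are made up for this example and are not drizzlepac identifiers; only the bit values 4 and 8 come from the description.

```python
# Bit values taken from the issue text; the names are illustrative only.
SATURATED = 4
TOO_FAINT = 8


def decode(flags):
    """Return the power-of-two bit values that are set in an integer flag."""
    return [1 << i for i in range(flags.bit_length()) if flags & (1 << i)]


# Flags 4 and 5 lack the "too faint" bit; 12 and 13 are the same values with it set.
for value in (4, 5, 12, 13):
    print(value, decode(value), "faint bit set:", bool(value & TOO_FAINT))
```

Setting bit 8 on a flag of 4 or 5 gives 12 or 13, which is the value the description expects for a saturated source below the detection threshold.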

This apparent conflict in the flag values was created deliberately by the flag4and8_hunter_killer() function in the haputils/hla_flag_filters.py module. That function is called from the hla_saturation_flags() function after the saturation flags have been added, with the explicit goal of removing the "too faint" flag 8 for sources marked as saturated.
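The effect described above can be sketched as follows. This is not the code in haputils/hla_flag_filters.py; `clear_faint_if_saturated` is a hypothetical stand-in that only reproduces the described outcome (bit 8 removed wherever bit 4 is set) on a plain list of integer flags.

```python
SATURATED = 4  # bit value from the issue text
TOO_FAINT = 8  # bit value from the issue text


def clear_faint_if_saturated(flags):
    """Hypothetical stand-in: drop the 'too faint' bit for saturated sources."""
    return [f & ~TOO_FAINT if f & SATURATED else f for f in flags]


before = [12, 13, 8, 4, 0]
after = clear_faint_if_saturated(before)
print(after)
```

In this toy input, 12 and 13 become 4 and 5, while the faint-only source (8) and the unflagged source (0) are untouched. Removing the call leaves the 12 and 13 values in place.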

That approach is inherited from the HLA catalog flagging, where it was added in May 2015. For the HLA catalogs, the goal was to help with flagging spurious sources found near very bright objects using the "swarm" flag. Sometimes those extremely bright objects are missing from the catalogs, which makes it impossible for the swarm flagging to work.

A detailed investigation (see this innerspace page) found that the vast majority of the sources this algorithm unflags are in fact spurious junk objects that do not belong in the catalog. And in fact, the swarm flagging algorithm is not even implemented for the HAP catalogs. The swarm flagging code is commented out.

Changing the flags using the flag4and8_hunter_killer function is of at most limited benefit to the catalog even when the swarm filter is turned on. It has zero benefit without the swarm filter. And including those sources is reducing the HAP catalog quality.

The call to the flag4and8_hunter_killer function should be removed from the HAP code.

stscijgbot-hstdp commented 11 months ago

Comment by Steve Goldman on JIRA:

Thanks Rick. I've removed the line and am in the process of testing the repercussions. Is there a dataset that you can think of that would work well for testing this?

Also, if the change is successful, do you see any reason to keep the code for the flag4and8_hunter_killer function? It will always be saved somewhere as part of the GitHub repository history.

stscijgbot-hstdp commented 11 months ago

Comment by Rick White on JIRA:

Steve Goldman The innerspace page that is linked in the description has a list of datasets that can be used for testing. Here are a few examples pulled from that page:

| imageName | nbad | display | Useful? | comment |
|---|---|---|---|---|
| hst_9442_04_acs_wfc_f625w_j6lp04 | 805 | display | n | Has a few bright stars but extra sources are not on them |
| hst_12254_06_acs_wfc_f625w_jbhf06 | 560 | display | n | Has a few bright stars but extra sources are not on them |
| hst_15232_06_wfc3_uvis_f814w_idla06 | 489 | display | n | Has a few bright stars but extra sources are not on them |
| hst_11711_01_wfc3_uvis_f600lp_ib2i01 | 256 | display | n | Has a few bright stars but extra sources are not on them; lots of magerr < 0 sources |
| hst_10146_02_acs_wfc_f606w_j90a02 | 721 | display | y | Many bright stars; most extra sources are bad but some are on bright stars |

The Useful? column indicates whether the sources restored by flag4and8_hunter_killer could conceivably be helpful. I included one field where that column is y. But I think for all of these catalogs the results should improve.

Also note that the hst_11711_01_wfc3_uvis_f600lp_ib2i01 catalog has a bunch of the magerr < 0 sources (fixed by HLA-1161), so it should have a lot of changes with the current version of the code.

I think it would be fine to eliminate the hunter_killer function completely once things are confirmed to be working correctly. I don't see much value in keeping around a big block of unused code, and like you say it is still there in git.

stscijgbot-hstdp commented 11 months ago

Comment by Steve Goldman on JIRA:

Awesome. I'll take a look at the magerr < 0 change as well to make sure things are working as expected. Thanks Rick!

stscijgbot-hstdp commented 11 months ago

Comment by Steve Goldman on JIRA:

I actually wasn't able to test the magerr < 0 because that code still hasn't been merged and I'm on another branch.

But the removal of the hunter-killer function seems to have had the desired effect. I get around 110 fewer sources with Flag < 8 and 110 more with Flag = 12, 13, or 28 in the hst_10146_02_acs_wfc_f606w_j90a02 point source catalog. Each of those appears to be in the wings of diffraction spikes (image above).
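A before/after comparison like the one above could be tallied along these lines. This is a sketch on made-up flag lists, not the actual catalog columns; `summarize` is a hypothetical helper. It relies on the fact that 12, 13, and 28 all have both bit 4 and bit 8 set.

```python
def summarize(flags):
    """Count sources with Flag < 8 and sources with bits 4 and 8 both set."""
    below_8 = sum(1 for f in flags if f < 8)
    saturated_and_faint = sum(1 for f in flags if (f & 12) == 12)
    return below_8, saturated_and_faint


# Toy data only: the same five sources with and without the flag rewrite.
old_flags = [4, 5, 4, 0, 8]
new_flags = [12, 13, 4, 0, 8]

old_counts = summarize(old_flags)
new_counts = summarize(new_flags)
print("Flag < 8 change:", new_counts[0] - old_counts[0])
print("Saturated and faint change:", new_counts[1] - old_counts[1])
```

In the toy data the two counts move by equal and opposite amounts, which is the same pattern as the roughly 110-source shift reported for the real catalog.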

stscijgbot-hstdp commented 11 months ago

Comment by Rick White on JIRA:

I agree, that looks like the expected effect from this change.

Note that image was the best-case version for keeping the "hunter-killer" approach -- there are a few actual saturated stars that are changed. That's why it got a y in the Useful? column. The other images are even worse, often having the modified sources in blank regions of the image. But I think this is definitely an improvement even for this image.

stscijgbot-hstdp commented 11 months ago

Comment by Steve Goldman on JIRA:

There is one regression test failing ([https://plwishmaster.stsci.edu:8081/blue/organizations/jenkins/RT%2FDrizzlepac-Developers-Pull-Requests/detail/Drizzlepac-Developers-Pull-Requests/33/pipeline/337/])

Column MDRIZSKY data differs in row 1:

a> 7.152131745807893

b> 7.152102753066345

and very small differences in ~34% of pixels in the image. I assume the former is causing the latter.
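To put the size of that MDRIZSKY difference in context, here is a small worked calculation using the two values quoted above. The tolerances shown are arbitrary examples and are not the ones used by the regression test.

```python
import math

a = 7.152131745807893  # value labelled a> in the test output
b = 7.152102753066345  # value labelled b> in the test output

abs_diff = abs(a - b)
rel_diff = abs_diff / abs(a)
print(f"absolute difference: {abs_diff:.3e}")
print(f"relative difference: {rel_diff:.3e}")

# Example tolerances only: the values agree at 1e-5 but not at 1e-7.
print(math.isclose(a, b, rel_tol=1e-5))
print(math.isclose(a, b, rel_tol=1e-7))
```

The relative difference is a few parts per million, so whether the comparison passes depends entirely on how tight the test tolerance is.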

stscijgbot-hstdp commented 11 months ago

Comment by Rick White on JIRA:

Hmm, it is hard to imagine how that could be the result of the catalog flag changes. I assume that MDRIZSKY gets determined early on in the processing before the catalogs are generated.

Is it possible that the same catalog code is being used to generate the catalogs used to align images and exposures to Gaia (or other reference catalogs)? If it were, that could lead to small changes in the alignment and changes in MDRIZSKY as a result. I think that kind of change would lead to widespread changes in the images and catalogs too.

But I would not have thought that the source flagging code would be used in creating the astrometric reference catalogs.

Maybe you can look at the log/trailer files to see whether the Gaia astrometric calibration matches have changed? If there have been changes, I would guess the new catalogs are better than the old ones because some bad sources are weeded out.

stscijgbot-hstdp commented 11 months ago

Comment by Steve Goldman on JIRA:

Thanks Rick. That gives me some much-needed direction.

stscijgbot-hstdp commented 11 months ago

Comment by Steve Goldman on JIRA:

The current truth file for that test has:

WCSNAME: 'undistorted not aligned'

versus the file that I produce:

WCSNAME: 'undistorted a posteriori solution align image-by-image to GAIAEDR3'

This seems like the reason for the difference, so I'm going to go ahead and merge the new code. Thanks, Rick White, for all of your help on this, and also for the detail of your Jira tickets as always.

stscijgbot-hstdp commented 11 months ago

Comment by Steve Goldman on JIRA:

Closed by #1710

spacetelescope / drizzlepac

Disable the code that changes HAP catalog flag values for saturated/faint sources #1706