Open stscijgbot-hstdp opened 4 months ago
Comment by Rick White on JIRA:
Here is the line in hapsequencer.py where `n1_exposure_time` gets computed. The loop that computes `n1_exposure_time` is:
```python
for edp in total_product_obj.edp_list:
    tot_exposure_time += edp.exptime
    if n1_dict[edp.filters]['n'] == 1:
        n1_exposure_time += edp.exptime
        n1_factor += cr_residual
```
A simple approach can fix it: add a test outside the loop so that `n1_exposure_time` is computed only if at least one entry in `n1_dict` has `'n' > 1`:
```python
if max([x['n'] for x in n1_dict.values()]) > 1:
    for edp in total_product_obj.edp_list:
        tot_exposure_time += edp.exptime
        if n1_dict[edp.filters]['n'] == 1:
            n1_exposure_time += edp.exptime
            n1_factor += cr_residual
```
That assumes that the total image is correctly computed by including only the N>1
images if any are available (which is the case for the current code).
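As a standalone sketch, the proposed guard can be exercised with toy stand-ins for the exposure products (the `Exposure` class, the sample exposure values, and the `cr_residual` value here are all hypothetical, not taken from hapsequencer.py):

```python
# Toy stand-ins (hypothetical) for the structures built in hapsequencer.py.
class Exposure:
    def __init__(self, filters, exptime):
        self.filters = filters
        self.exptime = exptime

edp_list = [Exposure('f435w', 360.0), Exposure('f814w', 375.0),
            Exposure('f555w', 225.0), Exposure('f555w', 375.0),
            Exposure('f555w', 375.0)]
n1_dict = {'f435w': {'n': 1, 'texptime': 360.0},
           'f814w': {'n': 1, 'texptime': 375.0},
           'f555w': {'n': 3, 'texptime': 975.0}}
cr_residual = 0.5  # placeholder value

tot_exposure_time = n1_exposure_time = n1_factor = 0.0
# Proposed guard: skip the accumulation entirely unless at least one
# filter has more than one exposure.
if max(x['n'] for x in n1_dict.values()) > 1:
    for edp in edp_list:
        tot_exposure_time += edp.exptime
        if n1_dict[edp.filters]['n'] == 1:
            n1_exposure_time += edp.exptime
            n1_factor += cr_residual

print(tot_exposure_time, n1_exposure_time)  # 1710.0 735.0
```

When every filter has only a single exposure, the guard is False and all three accumulators stay at zero.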
Comment by Michele De La Pena on JIRA:
I am afraid your suggested fix does not work. I admit that I do not quite understand what the goal is at this time, so I am finding it hard to correct the algorithm. In this example there are 5 exposures: 1 x f435w (360s), 1 x f814w (375s), and 3 x f555w (225s, 375s, 375s).
In this case the code which creates the `n1_dict` will create the SAME `n1_dict` on every `filter_product_obj` loop iteration, because the `n1_dict` is created from all of the exposures in `total_product_obj.edp_list`, which is all 5 exposures, and there is only one `total_product_obj` because there is only one detector in play.
```python
for edp in total_product_obj.edp_list:
    if edp.filters not in n1_dict:
        n1_dict[edp.filters] = {'n': 1, 'texptime': edp.exptime}
    else:
        n1_dict[edp.filters]['n'] += 1
        n1_dict[edp.filters]['texptime'] += edp.exptime
```
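Run on the five exposures in this example, the loop above yields one entry per filter. A standalone sketch (plain tuples stand in for the edp objects):

```python
# 1 x f435w (360s), 1 x f814w (375s), 3 x f555w (225s, 375s, 375s)
exposures = [('f435w', 360.0), ('f814w', 375.0),
             ('f555w', 225.0), ('f555w', 375.0), ('f555w', 375.0)]

n1_dict = {}
for filters, exptime in exposures:
    if filters not in n1_dict:
        n1_dict[filters] = {'n': 1, 'texptime': exptime}
    else:
        n1_dict[filters]['n'] += 1
        n1_dict[filters]['texptime'] += exptime

print(n1_dict)
# {'f435w': {'n': 1, 'texptime': 360.0},
#  'f814w': {'n': 1, 'texptime': 375.0},
#  'f555w': {'n': 3, 'texptime': 975.0}}
```

Since the dictionary is keyed only by filter name, every pass over the full `edp_list` reproduces this same result, which is why the computation does not belong inside the per-filter loop.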
Would it be easier if we talked, if you have time? It seems you might be on vacation. In the meantime I will keep studying this code.
Comment by Michele De La Pena on JIRA:
You say:
> Unfortunately, the catalogs for this very good image were rejected by the CR contamination test. The problem is that the test uses a calculation that assumes that all the filters got used to generate the total image. It uses an equation that relies on the total exposure time in filters that have only a single exposure. For the new image, that number should be zero. Instead it is computed as 81.585 secs.

To clarify: since the total image in this dataset used only the three f555w exposures, the total exposure time should be zero? This does not make sense to me. Perhaps the variable names are confusing me.
Comment by Rick White on JIRA:
Michele De La Pena Here is a list of datasets from the regression test that have incorrectly rejected catalogs:
hst_11570_0b_acs_wfc_total_jb1f0b
hst_11570_84_acs_wfc_total_jb1f84
hst_11570_85_acs_wfc_total_jb1f85
Comment by Michele De La Pena on JIRA:
Rick White Robert Swaters I have moved the code which gathers the information for determining the n1_exposure_time, as well as the tot_exposure_time, for the total detection image out of the "filter for loop", where the same computation was incorrectly being repeated. I also made the code consistently use the dictionary created for accumulating the n1_exposure_time, rather than reverting to searching each exposure object for its exposure time again. This part of the algorithm is working fine. I tested using the three datasets @rlw provided above. Note these datasets all contain 3 x f555w, 1 x f435w, and 1 x f814w exposures.
Having said this, there is still an issue I need to check. I did an additional test using one of the datasets above, BUT with only a single exposure from each filter. Since there is only a single exposure per filter, the algorithm will use all three exposures to create the total detection image (ref: HLA-1138). As expected, the resultant drizzled image has cosmic rays present. Here n1_exposure_time = tot_exposure_time = the sum of the exposure times of the constituent images. The catalogs are rejected, for example:
```
2024208204742 INFO src=drizzlepac.haputils.catalog_utils- segment catalog with 13840 good sources out of 13864 total sources : CR threshold = 144000.0
2024208204742 INFO src=drizzlepac.haputils.catalog_utils- segment catalog FAILED CR threshold.
2024208204742 INFO src=drizzlepac.haputils.catalog_utils- aperture catalog with 13557 good sources out of 13564 total sources : CR threshold = 288000.0
```
I will need to modify the rejection function in catalog_utils.py. This is further reinforced by an HRC dataset which I processed. First, I used only 6 images from j96u01 to create a detection image. All is fine. Then I used only three single-exposure images. The n1 and tot exposure times were correctly computed. The image has only one or two objects and is VERY clean. However, the catalogs were rejected, as seen below. This algorithm needs to be updated. Again, the HRC images are fairly clean, short exposures, with a small FOV.
```
2024209213027 INFO src=drizzlepac.haputils.catalog_utils- based on EXPTIME = 214.10410000000002sec for the n=1 filters
2024209213027 INFO src=drizzlepac.haputils.catalog_utils- segment catalog with 3 good sources out of 3 total sources : CR threshold = 32115.615
2024209213027 INFO src=drizzlepac.haputils.catalog_utils- segment catalog FAILED CR threshold.
2024209213027 INFO src=drizzlepac.haputils.catalog_utils- aperture catalog with 4 good sources out of 4 total sources : CR threshold = 64231.23
```
Comments?
Comment by Michele De La Pena on JIRA:
I see the current problem as two issues which might be solved with one good fix.
The `crfactor` is set based upon `catalog_type`:

```python
crfactor = {'aperture': 300, 'segment': 150}  # CRs / hr / 4kx4k pixels
```
I do not know how these numbers were computed, but more importantly, they do not apply to HRC as-is. Also, the threshold is computed as

```python
thresh = crfactor * n1_exposure_time**2 / texptime
```

with the rejection as

```python
if n_sources < thresh and 0 < n_sources:
    self.reject_cats[cat_type] = True
```
For detection images which have multiple exposures per filter, the thresh is zero. As long as these detection images have any found sources, they will NOT be rejected.
For detection images composed of ONLY single-exposure filters, n1_exposure_time = texptime. This means that n_sources must effectively exceed the sum of "expected CRs + real sources". Hmmm. I guess this can be true – just not in my test cases. This means these types of catalogs will probably mostly fail.
As such, the thresh (unless I am crazy) has basically been diluted. A new criterion is necessary.
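The two regimes can be checked numerically. A minimal sketch of the threshold computation quoted above (the `thresh` helper function is mine, not from the code; exposure times are in seconds, matching the logged values):

```python
crfactor = {'aperture': 300, 'segment': 150}  # CRs / hr / 4kx4k pixels

def thresh(cat_type, n1_exposure_time, texptime):
    """Rejection threshold, per the calculation quoted above."""
    return crfactor[cat_type] * n1_exposure_time ** 2 / texptime

# Regime 1: every filter has N>1 exposures, so n1_exposure_time = 0 and
# thresh = 0; any catalog with at least one source passes.
print(thresh('segment', 0.0, 975.0))  # 0.0

# Regime 2: only single-exposure filters, so n1_exposure_time == texptime
# and thresh = crfactor * texptime; reproduces the HRC log values above.
print(thresh('segment', 214.1041, 214.1041))   # ≈ 32115.615
print(thresh('aperture', 214.1041, 214.1041))  # ≈ 64231.23
```

With a threshold of tens of thousands of sources against a 3-source catalog, the clean HRC case above is guaranteed to be rejected.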
Comment by Rick White on JIRA:
Michele De La Pena I was the one who created the criterion for rejection. So I should be able to explain it! The idea is that the number of CRs in the image can be roughly predicted from the exposure time. If all the filters have N=1 exposures, you just add up all the exposure times (that is `n1_exposure_time`). It is true that in that case the `texptime` is the same as `n1_exposure_time`, so the effective equation is

```python
thresh = crfactor * n1_exposure_time
```

The test to reject is `n_sources < thresh`. The idea is that if there are more sources than predicted by `thresh`, that is a sign that most of the sources are not CRs. `thresh` is set to a conservatively high value, so if `n_sources >= thresh` then we can be pretty confident that the catalog is mainly non-CRs and so is good enough to keep. Mostly that happens for short exposures.
The `crfactor` should be adjusted for the physical area of the detector. Since the HRC image is much smaller, you are definitely correct that it ought to use a smaller value. According to the documentation, the camera info for ACS is:

```
ACS/WFC 4096**2 pixels, pixel size (15 um)**2
ACS/HRC 1024**2 pixels, pixel size (21 um)**2
```

So then the `crfactor` for ACS/HRC should be:

```python
crfactor = 150 * ((1024*21)/(4096*15))**2  # = 18.5 for HRC segment
crfactor = 300 * ((1024*21)/(4096*15))**2  # = 37 for HRC point
```
Comment by Rick White on JIRA:
Michele De La Pena For WFPC2, the same approach can be used to get `crfactor`. Here are the numbers:

```
WFPC2 1600**2 pixels, pixel size (15 um)**2
```

So `crfactor` is:

```python
crfactor = 150 * ((1600*15)/(4096*15))**2  # = 23 for WFPC2 segment
crfactor = 300 * ((1600*15)/(4096*15))**2  # = 46 for WFPC2 point
```
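Both the HRC and WFPC2 scalings follow the same pattern, sketched here as a small helper (the function name and signature are mine, not from the code; the reference detector is ACS/WFC, 4096 pixels of 15 um on a side):

```python
def scaled_crfactor(wfc_value, npix, pixsize_um):
    # Scale an ACS/WFC crfactor by the relative physical detector area:
    # (side length in um / WFC side length in um) squared.
    return wfc_value * ((npix * pixsize_um) / (4096 * 15)) ** 2

print(scaled_crfactor(150, 1024, 21))  # ≈ 18.4 (HRC segment)
print(scaled_crfactor(300, 1024, 21))  # ≈ 36.8 (HRC point)
print(scaled_crfactor(150, 1600, 15))  # ≈ 22.9 (WFPC2 segment)
print(scaled_crfactor(300, 1600, 15))  # ≈ 45.8 (WFPC2 point)
```

The values quoted in the comments above are these products rounded.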
Issue HLA-1295 was created on JIRA by Rick White:
There is an error in the logic that rejects catalogs for cosmic ray contamination. An example is image hst_11570_0b_acs_wfc_total_jb1f0b. The current version of the image from the HST public cache is in fact filled with CRs, as can be seen from the HLA interactive display. The visit has single exposures in the f435w and f814w filters plus 3 exposures in the f555w filter. Despite the CRs, this old version of the catalog was accepted (there are enough additional non-CRs to make it considered possible to use).

But the new version of the image in the regression tests does not have CRs. This is a good thing! The CR-contaminated single filters were not included in the total image. I checked the header of the total image and confirmed that it contains only the f555w exposures. Here is the HLA interactive display for the new image, which looks excellent.

Unfortunately, the catalogs for this very good image were rejected by the CR contamination test. The problem is that the test uses a calculation that assumes that all the filters got used to generate the total image. It uses an equation that relies on the total exposure time in filters that have only a single exposure. For the new image, that number should be zero. Instead it is computed as 81.585 secs. Here are some (edited) lines from the hst_11570_0b_acs_wfc_total_jb1f0b_trl.txt trailer file. This means that the catalogs are getting incorrectly rejected for all of these images that now do not include CRs.

This should be pretty simple to fix. Code fix will be discussed in the comments.