Open stscijgbot-hstdp opened 4 months ago
Comment by Rick White on JIRA:
Here is the line in hapsequencer.py where `n1_exposure_time` gets computed. The loop that computes `n1_exposure_time` is:
```python
for edp in total_product_obj.edp_list:
    tot_exposure_time += edp.exptime
    if n1_dict[edp.filters]['n'] == 1:
        n1_exposure_time += edp.exptime
        n1_factor += cr_residual
```
A simple approach can fix it: add a test outside the loop so that `n1_exposure_time` is computed only if at least one entry in `n1_dict` has `'n' > 1`:
```python
if max([x['n'] for x in n1_dict.values()]) > 1:
    for edp in total_product_obj.edp_list:
        tot_exposure_time += edp.exptime
        if n1_dict[edp.filters]['n'] == 1:
            n1_exposure_time += edp.exptime
            n1_factor += cr_residual
```
That assumes that the total image is correctly computed by including only the N>1
images if any are available (which is the case for the current code).
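As a standalone sketch, the proposed guard can be exercised with toy stand-ins for the exposure products (the `Exposure` class, the sample exposure values, and the `cr_residual` value here are all hypothetical, not taken from hapsequencer.py):

```python
# Toy stand-ins (hypothetical) for the structures built in hapsequencer.py.
class Exposure:
    def __init__(self, filters, exptime):
        self.filters = filters
        self.exptime = exptime

edp_list = [Exposure('f435w', 360.0), Exposure('f814w', 375.0),
            Exposure('f555w', 225.0), Exposure('f555w', 375.0),
            Exposure('f555w', 375.0)]
n1_dict = {'f435w': {'n': 1, 'texptime': 360.0},
           'f814w': {'n': 1, 'texptime': 375.0},
           'f555w': {'n': 3, 'texptime': 975.0}}
cr_residual = 0.5  # placeholder value

tot_exposure_time = n1_exposure_time = n1_factor = 0.0
# Proposed guard: skip the accumulation entirely unless at least one
# filter has more than one exposure.
if max(x['n'] for x in n1_dict.values()) > 1:
    for edp in edp_list:
        tot_exposure_time += edp.exptime
        if n1_dict[edp.filters]['n'] == 1:
            n1_exposure_time += edp.exptime
            n1_factor += cr_residual

print(tot_exposure_time, n1_exposure_time)  # 1710.0 735.0
```

When every filter has only a single exposure, the guard is False and all three accumulators stay at zero.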
Comment by Michele De La Pena on JIRA:
I am afraid your suggested fix does not work. I admit that I do not quite understand what the goal is at this time, so I am finding it hard to correct the algorithm. In this example there are 5 exposures: 1 x f435w (360s), 1 x f814w (375s), and 3 x f555w (225s, 375s, 375s).
In this case the code which creates the `n1_dict` will create the SAME `n1_dict` on every `filter_product_obj` loop iteration, because the `n1_dict` is created from all of the exposures in `total_product_obj.edp_list`, which is all 5 exposures, and there is only one `total_product_obj` because there is only one detector in play.
```python
for edp in total_product_obj.edp_list:
    if edp.filters not in n1_dict:
        n1_dict[edp.filters] = {'n': 1, 'texptime': edp.exptime}
    else:
        n1_dict[edp.filters]['n'] += 1
        n1_dict[edp.filters]['texptime'] += edp.exptime
```
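Run on the five exposures in this example, the loop above yields one entry per filter. A standalone sketch (plain tuples stand in for the edp objects):

```python
# 1 x f435w (360s), 1 x f814w (375s), 3 x f555w (225s, 375s, 375s)
exposures = [('f435w', 360.0), ('f814w', 375.0),
             ('f555w', 225.0), ('f555w', 375.0), ('f555w', 375.0)]

n1_dict = {}
for filters, exptime in exposures:
    if filters not in n1_dict:
        n1_dict[filters] = {'n': 1, 'texptime': exptime}
    else:
        n1_dict[filters]['n'] += 1
        n1_dict[filters]['texptime'] += exptime

print(n1_dict)
# {'f435w': {'n': 1, 'texptime': 360.0},
#  'f814w': {'n': 1, 'texptime': 375.0},
#  'f555w': {'n': 3, 'texptime': 975.0}}
```

Since the dictionary is keyed only by filter name, every pass over the full `edp_list` reproduces this same result, which is why the computation does not belong inside the per-filter loop.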
Would it be easier if we talked, if you have time? It seems you might be on vacation. In the meantime I will keep studying this code.
Comment by Michele De La Pena on JIRA:
You say:
> Unfortunately, the catalogs for this very good image were rejected by the CR contamination test. The problem is that the test uses a calculation that assumes that all the filters got used to generate the total image. It uses an equation that relies on the total exposure time in filters that have only a single exposure. For the new image, that number should be zero. Instead it is computed as 81.585 secs.

To clarify: since the total image in this dataset used only the three f555w exposures, the total exposure time should be zero? This does not make sense to me. Perhaps the variable names are confusing me.
Comment by Rick White on JIRA:
Michele De La Pena Here is a list of datasets from the regression test that have incorrectly rejected catalogs:
hst_11570_0b_acs_wfc_total_jb1f0b
hst_11570_84_acs_wfc_total_jb1f84
hst_11570_85_acs_wfc_total_jb1f85
Comment by Michele De La Pena on JIRA:
Rick White Robert Swaters I have moved the code which gathers the information for determining the n1_exposure_time, as well as the tot_exposure_time, for the total detection image out of the "filter for loop", where the same computation was incorrectly being repeated. I also made the code consistently use the dictionary created for accumulating the n1_exposure_time, rather than reverting to searching each exposure object for its exposure time again. This part of the algorithm is working fine. I tested using the three datasets @rlw provided above. Note these datasets all contain 3 x f555w, 1 x f435w, and 1 x f814w exposures.
Having said this, there is still an issue I need to check. I did an additional test using one of the datasets above, BUT with only a single exposure from each filter. Since there is only a single exposure per filter, the algorithm will use all three exposures to create the total detection image (ref: HLA-1138). As expected, the resultant drizzled image has cosmic rays present. Here n1_exposure_time = tot_exposure_time = the sum of the exposure times of the constituent images. The catalogs are rejected, for example:
```
2024208204742 INFO src=drizzlepac.haputils.catalog_utils- segment catalog with 13840 good sources out of 13864 total sources : CR threshold = 144000.0
2024208204742 INFO src=drizzlepac.haputils.catalog_utils- segment catalog FAILED CR threshold.
2024208204742 INFO src=drizzlepac.haputils.catalog_utils- aperture catalog with 13557 good sources out of 13564 total sources : CR threshold = 288000.0
```
I will need to modify the rejection function in catalog_utils.py. This is further reinforced by an HRC dataset which I processed. First, I used only 6 images from j96u01 to create a detection image. All is fine. Then I used only three single-exposure images. The n1 and tot exposure times were correctly computed. The image has only one or two objects and is VERY clean. However, the catalogs were rejected, as seen below. This algorithm needs to be updated. Again, the HRC images are fairly clean, short exposures, with a small FOV.
```
2024209213027 INFO src=drizzlepac.haputils.catalog_utils- based on EXPTIME = 214.10410000000002sec for the n=1 filters
2024209213027 INFO src=drizzlepac.haputils.catalog_utils- segment catalog with 3 good sources out of 3 total sources : CR threshold = 32115.615
2024209213027 INFO src=drizzlepac.haputils.catalog_utils- segment catalog FAILED CR threshold.
2024209213027 INFO src=drizzlepac.haputils.catalog_utils- aperture catalog with 4 good sources out of 4 total sources : CR threshold = 64231.23
```
Comments?
Comment by Michele De La Pena on JIRA:
I see the current problem as two issues which might be solved with one good fix.
The `crfactor` is set based upon `catalog_type`:

```python
crfactor = {'aperture': 300, 'segment': 150}  # CRs / hr / 4kx4k pixels
```
I do not know how these numbers were computed, but more importantly, they do not apply to HRC as-is. Also, the threshold is computed as

```python
thresh = crfactor * n1_exposure_time**2 / texptime
```

with the rejection as

```python
if n_sources < thresh and 0 < n_sources:
    self.reject_cats[cat_type] = True
```
For detection images which have multiple exposures per filter, the thresh is zero. As long as these detection images have any found sources, they will NOT be rejected.
For detection images composed of ONLY single-exposure filters, n1_exposure_time = texptime. This means that n_sources must effectively exceed the sum of "expected CRs + real sources". Hmmm. I guess this can be true – just not in my test cases. This means these types of catalogs will probably mostly fail.
As such, the thresh (unless I am crazy) has basically been diluted. A new criterion is necessary.
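The two regimes can be checked numerically. A minimal sketch of the threshold computation quoted above (the `thresh` helper function is mine, not from the code; exposure times are in seconds, matching the logged values):

```python
crfactor = {'aperture': 300, 'segment': 150}  # CRs / hr / 4kx4k pixels

def thresh(cat_type, n1_exposure_time, texptime):
    """Rejection threshold, per the calculation quoted above."""
    return crfactor[cat_type] * n1_exposure_time ** 2 / texptime

# Regime 1: every filter has N>1 exposures, so n1_exposure_time = 0 and
# thresh = 0; any catalog with at least one source passes.
print(thresh('segment', 0.0, 975.0))  # 0.0

# Regime 2: only single-exposure filters, so n1_exposure_time == texptime
# and thresh = crfactor * texptime; reproduces the HRC log values above.
print(thresh('segment', 214.1041, 214.1041))   # ≈ 32115.615
print(thresh('aperture', 214.1041, 214.1041))  # ≈ 64231.23
```

With a threshold of tens of thousands of sources against a 3-source catalog, the clean HRC case above is guaranteed to be rejected.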
Comment by Rick White on JIRA:
Michele De La Pena I was the one who created the criterion for rejection. So I should be able to explain it! The idea is that the number of CRs in the image can be roughly predicted from the exposure time. If all the filters have N=1 exposures, you just add up all the exposure times (that is `n1_exposure_time`). It is true that in that case the `texptime` is the same as `n1_exposure_time`, so the effective equation is

```python
thresh = crfactor * n1_exposure_time
```

The test to reject is `n_sources < thresh`. The idea is that if there are more sources than predicted by `thresh`, that is a sign that most of the sources are not CRs. `thresh` is set to a conservatively high value, so if `n_sources >= thresh` then we can be pretty confident that the catalog is mainly non-CRs and so is good enough to keep. Mostly that happens for short exposures.
The `crfactor` should be adjusted for the physical area of the detector. Since the HRC image is much smaller, you are definitely correct that it ought to use a smaller value. According to the documentation, the camera info for ACS is:

```
ACS/WFC 4096**2 pixels, pixel size (15 um)**2
ACS/HRC 1024**2 pixels, pixel size (21 um)**2
```

So then the `crfactor` for ACS/HRC should be:

```python
crfactor = 150 * ((1024*21)/(4096*15))**2  # = 18.5 for HRC segment
crfactor = 300 * ((1024*21)/(4096*15))**2  # = 37 for HRC point
```
Comment by Rick White on JIRA:
Michele De La Pena For WFPC2, the same approach can be used to get `crfactor`. Here are the numbers:

```
WFPC2 1600**2 pixels, pixel size (15 um)**2
```

So `crfactor` is:

```python
crfactor = 150 * ((1600*15)/(4096*15))**2  # = 23 for WFPC2 segment
crfactor = 300 * ((1600*15)/(4096*15))**2  # = 46 for WFPC2 point
```
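Both the HRC and WFPC2 scalings follow the same pattern, sketched here as a small helper (the function name and signature are mine, not from the code; the reference detector is ACS/WFC, 4096 pixels of 15 um on a side):

```python
def scaled_crfactor(wfc_value, npix, pixsize_um):
    # Scale an ACS/WFC crfactor by the relative physical detector area:
    # (side length in um / WFC side length in um) squared.
    return wfc_value * ((npix * pixsize_um) / (4096 * 15)) ** 2

print(scaled_crfactor(150, 1024, 21))  # ≈ 18.4 (HRC segment)
print(scaled_crfactor(300, 1024, 21))  # ≈ 36.8 (HRC point)
print(scaled_crfactor(150, 1600, 15))  # ≈ 22.9 (WFPC2 segment)
print(scaled_crfactor(300, 1600, 15))  # ≈ 45.8 (WFPC2 point)
```

The values quoted in the comments above are these products rounded.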
Issue HLA-1295 was created on JIRA by Rick White:
There is an error in the logic that rejects catalogs for cosmic ray contamination. An example is image hst_11570_0b_acs_wfc_total_jb1f0b. The current version of the image from the HST public cache is in fact filled with CRs, as can be seen from the HLA interactive display. The visit has single exposures in the f435w and f814w filters plus 3 exposures in the f555w filter. Despite the CRs, this old version of the catalog was accepted (there are enough additional non-CRs to make it considered possible to use).

But the new version of the image in the regression tests does not have CRs. This is a good thing! The CR-contaminated single filters were not included in the total image. I checked the header of the total image and confirmed that it contains only the f555w exposures. Here is the HLA interactive display for the new image, which looks excellent.

Unfortunately, the catalogs for this very good image were rejected by the CR contamination test. The problem is that the test uses a calculation that assumes that all the filters got used to generate the total image. It uses an equation that relies on the total exposure time in filters that have only a single exposure. For the new image, that number should be zero. Instead it is computed as 81.585 secs. Here are some (edited) lines from the hst_11570_0b_acs_wfc_total_jb1f0b_trl.txt trailer file. This means that the catalogs are getting incorrectly rejected for all of these images that now do not include CRs.

This should be pretty simple to fix. Code fix will be discussed in the comments.