adjust tolerances for coron registration

braingram commented 3 months ago

This PR attempts to reduce the differences for coron3 results when run on different systems.

The changes are:

remove casts from float32 to float64 and back from float64 to float32 in image registration
tighten tolerances for leastsq fit in image registration
relax tolerances for regtest result comparisons

I ran a (limited) jenkins run: https://plwishmaster.stsci.edu:8081/blue/organizations/jenkins/RT%2FJWST-Developers-Pull-Requests/detail/JWST-Developers-Pull-Requests/1666/pipeline and (limited) github actions run: https://github.com/spacetelescope/RegressionTests/actions/runs/10511902948/attempts/1

comparisons of the results using the above tolerances pass. However, the regtests will fail comparison against the current truth files. I suggest that we:

merge this PR (when approved)
okify the regtest results (the jenkins runs should now pass)
run the github actions tests

I believe these should pass although it is possible that follow-up changes will be needed as it's difficult to fully test this PR without generating new truth files. If it's preferable I could generate new truth files, put them in a different location on artifactory and run the jenkins and github actions jobs using those new truth files to judge the changes in this PR before the merge.

Full regtests runs for:

Ignoring all the miri_lrs mtimage failures and the miri_image failure (which all look unrelated) the coron3 tests that fail on jenkins also fail on github actions:

test_nircam_coron3_sci_exp[002-psfalign] 18 different pixels found (0.00% different).
test_nircam_coron3_sci_exp[003-psfalign] 22 different pixels found (0.00% different).
test_nircam_coron3_product[i2d] 30 different pixels found (0.02% different).
test_miri_coron3_sci_exp[4-psfalign] 74000 different pixels found (0.76% different).
test_miri_coron3_sci_exp[4-psfsub] 1813 different pixels found (0.56% different).
test_miri_coron3_sci_exp[5-psfalign] 151570 different pixels found (1.57% different).
test_miri_coron3_sci_exp[5-psfsub] 2384 different pixels found (0.74% different).
test_miri_coron3_product[i2d]40259 different pixels found (46.80% different).

Pulling down these files and comparing them with tolerances matching the ones in this PR show no differences between jenkins and github actions.

Most differences with the truth files are small (<2%) except for coron3_product[i2d] which shows 46.80% different. Looking at the miri i2d file compared to the truth the differences are mostly small

with the largest differences concentrated in the corner of the image (see the few pixels near the upper left):

which corresponds with extreme values present in both the truth:

and jenkins results (with this PR).

Looking at only the central 50-200 x 50-200 pixels the differences are much smaller

but do show some structure.

Checklist for PR authors (skip items if you don't have permissions or they are not applicable)

[x] added entry in CHANGES.rst within the relevant release section
[ ] updated or added relevant tests
[ ] updated relevant documentation
[ ] added relevant milestone
[ ] added relevant label(s)
[ ] ran regression tests, post a link to the Jenkins job below. How to run regression tests on a PR
[ ] All comments are resolved
[ ] Make sure the JIRA ticket is resolved properly

codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 61.76%. Comparing base (9f82b52) to head (26f40c7). Report is 13 commits behind head on master.

Additional details and impacted files

```diff @@ Coverage Diff @@ ## master #8717 +/- ## ========================================== + Coverage 60.75% 61.76% +1.01% ========================================== Files 375 377 +2 Lines 38367 38743 +376 ========================================== + Hits 23308 23929 +621 + Misses 15059 14814 -245 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

stscirij commented 2 months ago

I wonder if it might be preferable to select an image region for the comparison that excludes the very noisy (and scientifically unimportant) edges, that way we can keep the tolerances lower

braingram commented 2 months ago

I wonder if it might be preferable to select an image region for the comparison that excludes the very noisy (and scientifically unimportant) edges, that way we can keep the tolerances lower

I think there are multiple issues here: 1) the leastsq parameters are imprecise (hopefully addressed by this PR) 2) data casting introduces machine differences (hopefully addressed by this PR) 3) reference files don't appear to match expectations of the algorithm (which this PR doesn't address and I think is most relevant to your comment)

The psfmask documentation describes the SCI extension of the reference file:

The values in the SCI array give the mask values to be applied to the images when computing relative shifts. The mask acts as a weighting function when performing Fourier fits. The values range from zero (full weighting) to one (pixel completely masked out).

However, looking at one of the nircam reference files: https://jwst-crds.stsci.edu/browse/jwst_nircam_psfmask_0212.fits the mask looks like:

and the algorithm is giving "full weighting" to what you noted are scientifically unimportant parts of the image (the corners). I agree that there are issues here that likely require attention. Although I'm not quite sure how deep the rabbit hole is as I'm also not sure that the documentation agrees with the algorithm and a 0 value in the mask looks to me like it would completely remove the contribution of the corresponding value from the fit (by multiplying the difference by 0). https://github.com/spacetelescope/jwst/blob/be36c547939b376f9028ff449049d9507039edf9/jwst/coron/imageregistration.py#L78

For the above reasons I attempted to avoid algorithmic changes in this PR.

stscirij commented 2 months ago

Yes, my mistake. I was thinking of the tolerances for the comparisons

braingram commented 2 months ago

Yes, my mistake. I was thinking of the tolerances for the comparisons

Would it be helpful to reword the changelog? Maybe something like:

Tighten tolerances of leastsq fit for psf alignment and loosen comparison tolerances in unit tests.

stscirij commented 2 months ago

Probably not worth the effort, unless you feel the original entry was inadequate

braingram commented 2 months ago

This will need an okified regtest run. I see @tapastro is running one now and I'll try to queue one up after that finishes.

braingram commented 2 months ago

Regtest run for okifying started here: https://plwishmaster.stsci.edu:8081/job/RT/job/JWST/3038/

braingram commented 2 months ago

The coron3 differences in the above linked run were okified. The unrelated differences were skipped.

braingram commented 2 months ago

coron3 differences are no longer present on github actions regtest run: https://github.com/spacetelescope/RegressionTests/actions/runs/10932644886

spacetelescope / jwst

adjust tolerances for coron registration #8717

Codecov Report