spacetelescope / stdatamodels

https://stdatamodels.readthedocs.io

add cache for slow hdu indexing #278

Closed (braingram closed this 6 months ago)

braingram commented 7 months ago

Opening a jwst file with many (~550) extensions (level 2 output for nirspec mos mode) takes ~9 seconds when skip_fits_update=True. Opening the same file with skip_fits_update=False takes longer, ~26 seconds. Surprisingly, a large portion of the added time is spent in calls to get_hdu.

Under cProfile with skip_fits_update=False, opening the file takes ~60 seconds (due to profiling overhead). The profile, rendered with snakeviz, is as follows:

[Screenshot: snakeviz rendering of the full cProfile output]

Zooming in to _load_from_schema reveals ~40 seconds spent in get_hdu:

[Screenshot: snakeviz view zoomed in to _load_from_schema]
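For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of collecting a profile with the standard library's cProfile and pstats. The helper `profile_call` and the workload `slow_lookup` are hypothetical stand-ins, not part of stdatamodels; in practice the profiled callable would be whatever opens the file.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, dump_path=None, **kwargs):
    """Run func under cProfile.

    Returns (result, report), where report is a text summary sorted by
    cumulative time. If dump_path is given, the raw stats are also written
    there so they can be opened in a viewer such as snakeviz.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    if dump_path is not None:
        profiler.dump_stats(dump_path)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


def slow_lookup(n):
    # Hypothetical workload standing in for the slow file open.
    return sum(i * i for i in range(n))


result, report = profile_call(slow_lookup, 100_000)
print(report)
```

Passing `dump_path="open.prof"` and then running `snakeviz open.prof` from a shell gives an interactive view comparable to the screenshots. Profiling overhead inflates wall-clock time, so the profiled totals are best compared with each other rather than with unprofiled runs.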

This PR adds an hdu_cache to skip repeated indexing of the hdulist (each lookup can take 2-3 ms for this file under some conditions). With this PR, opening the same file with skip_fits_update=False (and no profiling) takes 12 seconds, down from ~26, and with skip_fits_update=True it still takes 9 seconds (most of that is spent in asdf.open, as the tree is quite large). Running cProfile with skip_fits_update=False takes 24 seconds, and zooming in to get_hdu reveals 5 seconds spent there (about 20% of the run, down from about 66% without this PR):

[Screenshot: snakeviz view zoomed in to get_hdu with this PR applied]
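The caching idea can be illustrated with a small, self-contained sketch. This is not the code from the PR: `SlowHDUList`, `get_hdu_cached`, and the `(name, version)` cache key are assumptions made for illustration, with a linear scan standing in for the slow hdulist indexing.

```python
class SlowHDUList:
    """Hypothetical stand-in for an HDU list whose keyed lookup is a linear scan."""

    def __init__(self, keys):
        self._keys = list(keys)
        self.lookups = 0  # number of slow scans performed so far

    def __getitem__(self, key):
        self.lookups += 1
        for position, candidate in enumerate(self._keys):
            if candidate == key:
                return f"hdu#{position}"
        raise KeyError(key)


def get_hdu_cached(hdulist, name, version=1, hdu_cache=None):
    """Return the HDU for (name, version), consulting hdu_cache first if one is given."""
    key = (name, version)
    if hdu_cache is not None and key in hdu_cache:
        return hdu_cache[key]
    hdu = hdulist[key]  # the slow path: index into the list
    if hdu_cache is not None:
        hdu_cache[key] = hdu
    return hdu


# Roughly mirror the file in question: ~550 extensions.
hdulist = SlowHDUList(("SCI", v) for v in range(1, 551))
cache = {}
for _ in range(3):
    hdu = get_hdu_cached(hdulist, "SCI", 550, hdu_cache=cache)

print(hdu, hdulist.lookups)  # only the first of the three calls scans the list
```

Such a cache is only valid while the hdulist is unchanged, so in this sketch it is a plain dict owned by the caller and meant to be discarded once loading finishes.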


codecov[bot] commented 7 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 64.95%. Comparing base (4d7c3a6) to head (15618ee). Report is 16 commits behind head on main.

:exclamation: Current head 15618ee differs from the pull request's most recent head ced15e7. Consider uploading reports for commit ced15e7 to get more accurate results.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #278      +/-   ##
==========================================
+ Coverage   64.84%   64.95%   +0.11%
==========================================
  Files         103      104       +1
  Lines        5694     5718      +24
==========================================
+ Hits         3692     3714      +22
- Misses       2002     2004       +2
```


braingram commented 7 months ago

JWST regression tests run with no errors: https://plwishmaster.stsci.edu:8081/blue/organizations/jenkins/RT%2FJWST-Developers-Pull-Requests/detail/JWST-Developers-Pull-Requests/1269/pipeline/199/