pytroll / pyresample

Geospatial image resampling in Python
http://pyresample.readthedocs.org
GNU Lesser General Public License v3.0

Add optional caching to AreaDefinition.get_area_slices #553

Closed djhoese closed 9 months ago

djhoese commented 9 months ago

While profiling some Satpy computations (ABI full disk -> nearest neighbor resampling), I noticed that a decent amount of time at the beginning of processing was spent outside of dask computations, using a single core. After some print-statement debugging, I discovered that Satpy's reduce_data functionality, and specifically AreaDefinition.get_area_slices, was taking the most time. The majority of that time is spent in the polygon intersection operation that determines whether, and where, the two areas intersect.

This PR adds a decorator and a couple of configuration settings for caching the results of AreaDefinition.get_area_slices to on-disk JSON files. In my test case, get_area_slices was taking roughly 10-12 seconds per area definition pair. I was using 2 resolutions of ABI data and one target area, so that was ~22 seconds in total. With caching enabled, that time basically disappears.
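To illustrate the general idea of a decorator that caches results to on-disk JSON files, here is a minimal sketch. It is not the code from this PR: the names `json_disk_cache`, `CACHE_ENABLED`, `CACHE_DIR`, and `slow_area_slices` are hypothetical, and the key scheme (hashing the arguments) is an assumption made for illustration only.

```python
import functools
import hashlib
import json
import tempfile
from pathlib import Path

# Hypothetical settings; the PR's real configuration names are not shown in this thread.
CACHE_ENABLED = True
CACHE_DIR = Path(tempfile.gettempdir()) / "area_slices_cache_sketch"

# Records every real (uncached) call so the effect of caching is observable.
CALLS = []


def json_disk_cache(func):
    """Cache JSON-serializable results of ``func`` in one file per argument set."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not CACHE_ENABLED:
            return func(*args, **kwargs)
        # Build a stable key from the function name and its arguments.
        key_src = json.dumps(
            [func.__qualname__, args, kwargs], sort_keys=True, default=repr
        )
        key = hashlib.sha256(key_src.encode("utf-8")).hexdigest()
        cache_file = CACHE_DIR / f"{func.__name__}_{key}.json"
        if cache_file.is_file():
            # Cache hit: skip the expensive computation entirely.
            return json.loads(cache_file.read_text())
        result = func(*args, **kwargs)
        CACHE_DIR.mkdir(parents=True, exist_ok=True)
        cache_file.write_text(json.dumps(result))
        return result

    return wrapper


@json_disk_cache
def slow_area_slices(src_id, dst_id):
    """Stand-in for an expensive intersection; returns [start, stop] pairs."""
    CALLS.append((src_id, dst_id))
    return {"x": [10, 200], "y": [5, 150]}
```

Calling `slow_area_slices` twice with the same arguments runs the function body once; the second call reads the JSON file instead. Results are stored as lists rather than tuples or slice objects because JSON has no equivalent types, so a real implementation would need to convert back after loading.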

This PR is only a proof of concept at this point, and I will continue to improve it. I just wanted to get the initial commits up on GitHub for others to see.

codecov[bot] commented 9 months ago

Codecov Report

Attention: 11 lines in your changes are missing coverage. Please review.

Comparison is base (6a8afc0) 94.11% compared to head (b0a2579) 94.13%.

| Files | Patch % | Lines |
|---|---|---|
| pyresample/future/geometry/_subset.py | 90.00% | 8 Missing :warning: |
| pyresample/_caching.py | 96.10% | 3 Missing :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #553      +/-   ##
==========================================
+ Coverage   94.11%   94.13%   +0.02%
==========================================
  Files          82       84       +2
  Lines       13078    13188     +110
==========================================
+ Hits        12308    12415     +107
- Misses        770      773       +3
```

| [Flag](https://app.codecov.io/gh/pytroll/pyresample/pull/553/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pytroll) | Coverage Δ | |
|---|---|---|
| [unittests](https://app.codecov.io/gh/pytroll/pyresample/pull/553/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pytroll) | `94.13% <94.71%> (+0.02%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pytroll#carryforward-flags-in-the-pull-request-comment) to find out more.


coveralls commented 9 months ago

Coverage Status

coverage: 93.72% (+0.03%) from 93.69% when pulling b0a2579c68604037988b8256e1577973610a3596 on djhoese:cache-area-slices into 6a8afc0085e0b4269f00991ab79fe1b3766bb817 on pytroll:main.

pnuu commented 9 months ago

Timings using Satpy main and this PR, using gradient_search to load, resample, and save 10 composites (some Day/Night, some normal) from FCI L1c data to a 3868 x 3918 EPSG:3035 area.

reduce_data=True:

As a comparison, with reduce_data=False it takes ~59.7 s (C) to run the same script.

Dask graphs:

A: [image: Screenshot 2023-11-20 at 08-36-33, Bokeh plot]

B: [image: Screenshot 2023-11-20 at 08-36-49, Bokeh plot]

C: [image: Screenshot 2023-11-20 at 08-37-04, Bokeh plot]

mraspaud commented 9 months ago

Thanks @pnuu, this looks good! I'm merging this.

djhoese commented 9 months ago

Thanks for testing @pnuu!