Speed up ConstructReliabilityCalibrationTables

metoppv / improver

IMPROVER is a library of algorithms for meteorological post-processing.

http://improver.readthedocs.io/en/latest/

BSD 3-Clause "New" or "Revised" License


Speed up ConstructReliabilityCalibrationTables #1987

Closed btrotta-bom closed 1 month ago

btrotta-bom commented 3 months ago

Speed up `ConstructReliabilityCalibrationTables`. When tested on a gridded forecast with 7 reliability bins, the speedup is around 35%. The improvement comes from replacing a loop over the reliability bins with `np.searchsorted`. The numpy method uses binary search, so assigning a forecast value to a bin takes O(log n) time, where n is the number of reliability bins, whereas the old method is linear in n. So in general we should expect a speedup (fractional reduction in runtime) of around 1 - log(n)/n. The numpy method is also vectorised, whereas the old code uses a Python loop, which should give a further improvement.
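As a rough illustration of the change described above (not the actual IMPROVER code), the sketch below compares a per-bin loop with a single `np.searchsorted` call. The bin edges, the half-open bin convention, the random stand-in forecast, and both function names are assumptions made for the example.

```python
import numpy as np

# Hypothetical setup: 7 contiguous, equal-width probability bins on [0, 1].
bin_edges = np.linspace(0.0, 1.0, 8)
rng = np.random.default_rng(0)
forecast = rng.random((50, 60))  # stand-in for a gridded probability forecast


def bin_index_loop(forecast, bin_edges):
    """Assign bins with a Python loop: one full pass over the grid per bin."""
    n_bins = len(bin_edges) - 1
    index = np.full(forecast.shape, -1, dtype=np.int64)
    for i in range(n_bins):
        lower, upper = bin_edges[i], bin_edges[i + 1]
        if i == n_bins - 1:
            # Last bin is closed on the right so that 1.0 is included.
            in_bin = (forecast >= lower) & (forecast <= upper)
        else:
            in_bin = (forecast >= lower) & (forecast < upper)
        index[in_bin] = i
    return index


def bin_index_searchsorted(forecast, bin_edges):
    """Assign bins with one vectorised binary search over the interior edges."""
    return np.searchsorted(bin_edges[1:-1], forecast, side="right")


loop_index = bin_index_loop(forecast, bin_edges)
fast_index = bin_index_searchsorted(forecast, bin_edges)
print(np.array_equal(loop_index, fast_index))
```

Searching only the interior edges means every value in [0, 1] receives an index between 0 and 6. Values outside that range would be clipped into the first or last bin rather than left unassigned, which is one way the two approaches can differ.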

I have added a test for the case where the forecast contains NaNs, since handling these is a little trickier than in the old code.
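To show why NaNs need care with this approach, here is a minimal sketch under assumed inputs (the edges and forecast values are made up, and the masking step is one possible treatment, not necessarily the one used in this PR). `np.searchsorted` treats NaN as larger than every finite value, so an unmasked NaN would be counted in the last bin.

```python
import numpy as np

# Hypothetical interior edges defining 4 bins: [0, .25), [.25, .5), [.5, .75), [.75, 1].
interior_edges = np.array([0.25, 0.5, 0.75])
forecast = np.array([0.1, np.nan, 0.6, 0.9])

# NaN sorts after all finite values, so it receives the index of the last bin.
raw_index = np.searchsorted(interior_edges, forecast, side="right")

# One possible treatment: exclude NaN points before counting.
valid = ~np.isnan(forecast)
counts = np.bincount(raw_index[valid], minlength=len(interior_edges) + 1)
print(raw_index, counts)
```

A loop that tests `lower <= value < upper` never matches NaN, because every comparison with NaN is false, so the loop-based version excludes such points without any extra step.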

Testing:

- [x] Ran tests and they passed OK
- [x] Added new tests for the new feature(s)

codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 98.41%. Comparing base (96cdbef) to head (22dad9f). Report is 10 commits behind head on master.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #1987      +/-   ##
==========================================
+ Coverage   98.39%   98.41%   +0.02%
==========================================
  Files         124      126       +2
  Lines       12209    12403     +194
==========================================
+ Hits        12013    12207     +194
  Misses        196      196
```


github-actions[bot] commented 1 month ago

In order to maintain a backlog of relevant PRs, we automatically label them as stale after 60 days of inactivity.

If this PR is still important to you, then please comment on this PR and the stale label will be removed.

Otherwise this PR will be automatically closed in 30 days time.

btrotta-bom commented 1 month ago

this should remain open

btrotta-bom commented 1 month ago

> `np.searchsorted` maps each forecast probability to one of the probability bins and returns the index of the associated bin.
>
> `np.put_along_axis` uses the bin index of each forecast probability to populate `forecast_probabilities` with the associated probability value (setting values outside the bin to 0) and `forecast_counts` with 1 (again setting values outside the bin to 0).

Yes, correct
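The two steps summarised above can be sketched roughly as follows. This is an illustration only: the array shapes, the bin edges, and the use of a leading bin axis are assumptions, and only the names `forecast_probabilities` and `forecast_counts` come from the discussion.

```python
import numpy as np

# Hypothetical interior edges defining 4 probability bins.
interior_edges = np.array([0.25, 0.5, 0.75])
n_bins = len(interior_edges) + 1
forecast = np.array([[0.1, 0.6], [0.9, 0.3]])  # tiny stand-in grid

# Step 1: map every forecast probability to the index of its bin.
bin_index = np.searchsorted(interior_edges, forecast, side="right")

# Step 2: arrays with a leading bin axis, initialised to 0 everywhere.
forecast_probabilities = np.zeros((n_bins,) + forecast.shape)
forecast_counts = np.zeros((n_bins,) + forecast.shape)

# Write each probability (or a 1) into the slice of its own bin only;
# all other bins keep their initial 0 at that grid point.
np.put_along_axis(
    forecast_probabilities, bin_index[np.newaxis], forecast[np.newaxis], axis=0
)
np.put_along_axis(forecast_counts, bin_index[np.newaxis], 1, axis=0)

# Per-bin totals over the grid.
sum_of_probabilities = forecast_probabilities.sum(axis=(1, 2))
counts = forecast_counts.sum(axis=(1, 2))
print(counts, sum_of_probabilities)
```

Because each grid point is written into exactly one bin slice, summing `forecast_counts` over the bin axis gives 1 at every point.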