pySTEPS / pysteps

Python framework for short-term ensemble prediction systems.
https://pysteps.github.io/
BSD 3-Clause "New" or "Revised" License

Place ensemble member number determination for blending inside forecast loop to prevent out of memory issues #273

Closed: RubenImhoff closed this 2 years ago

RubenImhoff commented 2 years ago

In the recently posted release, the blending code determines which NWP models will be combined with which nowcast ensemble members. It does this all at once and creates a variable with dimensions [n_ens_members, n_timesteps, n_cascade_levels, y, x]. For a large number of time steps and ensemble members, this variable can become too big to handle.

To overcome this issue, this PR tries to implement this procedure within the forecast loop (so per time step instead of all at once), which greatly reduces the memory requirements.
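To give a feel for the scale, here is a rough back-of-the-envelope sketch of the size of a dense array with the dimensions above, compared with holding a single time step. The sizes (20 members, 72 time steps, 8 cascade levels, a 700 x 700 grid) and the 8-byte element size are made-up illustration values, not numbers from pysteps or from this PR, and `dense_nbytes` is a hypothetical helper.

```python
from math import prod


def dense_nbytes(shape, itemsize=8):
    """Bytes needed by a dense array of `shape` with `itemsize` bytes per element."""
    return prod(shape) * itemsize


# Hypothetical sizes, chosen only for illustration (assumption: float64 data).
n_ens_members, n_timesteps, n_cascade_levels, ny, nx = 20, 72, 8, 700, 700

# Everything determined at once: one array spanning all time steps.
all_at_once = dense_nbytes((n_ens_members, n_timesteps, n_cascade_levels, ny, nx))

# Inside the forecast loop: only the current time step is held.
per_step = dense_nbytes((n_ens_members, n_cascade_levels, ny, nx))

print(f"all time steps at once: {all_at_once / 1e9:.1f} GB")
print(f"one time step at a time: {per_step / 1e9:.2f} GB")
```

Under these assumptions the per-step footprint is smaller by a factor of `n_timesteps`, which is why moving the determination into the loop helps most for long forecasts.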

codecov[bot] commented 2 years ago

Codecov Report

Merging #273 (a6fb7da) into master (2dbd3de) will increase coverage by 0.09%. The diff coverage is 89.47%.

@@            Coverage Diff             @@
##           master     #273      +/-   ##
==========================================
+ Coverage   82.23%   82.32%   +0.09%     
==========================================
  Files         158      158              
  Lines       12117    12130      +13     
==========================================
+ Hits         9964     9986      +22     
+ Misses       2153     2144       -9     
| Flag | Coverage Δ |
|---|---|
| unit_tests | 82.32% <89.47%> (+0.09%) :arrow_up: |


| Impacted Files | Coverage Δ |
|---|---|
| pysteps/tests/test_blending_steps.py | 99.12% <ø> (ø) |
| pysteps/blending/steps.py | 83.73% <89.47%> (-0.17%) :arrow_down: |
| pysteps/tests/test_exporters.py | 100.00% <0.00%> (ø) |
| pysteps/io/exporters.py | 55.12% <0.00%> (+2.77%) :arrow_up: |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 2dbd3de...a6fb7da.

RubenImhoff commented 2 years ago

> Are there critical parts where you would need more detailed feedback?

Not really; all my own tests seem fine so far. The only minor concern is that it may slow down the forecast loop a bit (but probably not significantly).

> One small detail: can you maybe provide a more specific title for this PR?

Will do so!

dnerini commented 2 years ago

One last thought: since you mention the memory usage, would you care to look into memory profiling for a single run? Using memory_profiler would be as easy as running

mprof run python <your example script>.py
mprof plot

With this you could easily compare the impact of your new version.
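If installing memory_profiler is not an option, a similar before/after comparison can be sketched with Python's standard-library `tracemalloc`. This is a substitute technique, not what `mprof` does: it reports the peak of Python-traced allocations rather than a process memory timeline. The two workloads below are toy stand-ins (plain `bytearray` buffers) for the old and new approach, not the pysteps blending code, and all function names are hypothetical.

```python
import tracemalloc


def peak_bytes(func, *args):
    """Run func(*args) and return the peak traced allocation in bytes."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def all_at_once(n_timesteps, step_bytes):
    # Toy stand-in for the old approach: every time step is held simultaneously.
    steps = [bytearray(step_bytes) for _ in range(n_timesteps)]
    return len(steps)


def per_timestep(n_timesteps, step_bytes):
    # Toy stand-in for the new approach: one time step is allocated, used, freed.
    count = 0
    for _ in range(n_timesteps):
        step = bytearray(step_bytes)
        count += 1
        del step  # release before the next iteration allocates again
    return count


peak_old = peak_bytes(all_at_once, 50, 1_000_000)
peak_new = peak_bytes(per_timestep, 50, 1_000_000)
print(f"all at once: {peak_old / 1e6:.1f} MB peak")
print(f"per time step: {peak_new / 1e6:.1f} MB peak")
```

For a real run, the same `peak_bytes` wrapper could be put around the actual forecast call on each branch, keeping in mind that only allocations visible to `tracemalloc` are counted.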

RubenImhoff commented 2 years ago

New code: image

Old code: out-of-memory error; it requested >100 GB according to the command line (I doubt it was that much, but definitely too much for my laptop).