Using the previous `xr.map_blocks` method for computing forcings doesn't work either. It appears to run, but stalls for about a minute at ~40% and eats all my memory.
```python
# Build a template dataset with the structure the forcing computation will return.
t = surface_fields.coarsen({"xu_ocean": options.factor, "yu_ocean": options.factor}, boundary="trim").mean()
t2 = t.copy()
t2 = t2.rename({"usurf": "S_x", "vsurf": "S_y"})
t = xr.merge((t, t2))

# Apply the forcing computation block-wise; `template` tells xarray what the output looks like.
forcings = xr.map_blocks(lambda x: lib.compute_forcings_and_coarsen_cm2_6(x, grid, options.factor), surface_fields, template=t)
```
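For what it's worth, `map_blocks` with a `template` only builds the lazy task graph; the actual work (and the memory blow-up) happens once the result is computed or written out. A minimal sketch of how I'm materialising the result (the output path here is hypothetical):

```python
# `forcings` is still a lazy Dask-backed Dataset at this point.
# Computing it (or writing it to disk) is what actually runs the per-block computation.
forcings = forcings.compute()
# forcings.to_zarr("forcings.zarr")  # hypothetical output path
```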
This has to be some Dask scheduling issue. An xarray doc page gives some tips: https://xarray.pydata.org/en/v0.9.3/dask.html. It has me wondering whether some small change to how the array computations are built is breaking performance. (It would explain the seemingly random breakage.)
With more fiddling (refreshing dependencies, using 1 Dask worker, paranoid chunking tweaks), I've successfully generated forcings using `map_blocks`. Indeed, `map_blocks` is usually there to ensure good Dask scheduling, rather than obstructing it like we thought in #47. `--ntimes 50` over the paper's spatial domain, with `dask.config.set(num_workers=1)`, peaked at ~4 GB RAM -- much more reasonable.
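For reference, a minimal sketch of how the worker limit is applied (names carried over from the snippet above):

```python
import dask

# Limit the local scheduler to a single worker, so block computations run
# one at a time instead of one per detected core.
dask.config.set(num_workers=1)

forcings = forcings.compute()
```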
Dask seems to scale its workers with the number of detected cores. My poor laptop then gets hammered by 16 parallel threads, each apparently needing ~4 GB RAM (?). This would explain the crashing on CSD3: adding cores adds RAM, but it also adds more parallel workers, so Dask keeps OOMing itself.
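If we ever want parallelism back, one option I haven't tried (so treat this purely as a sketch) is the distributed scheduler, which lets us bound both the worker count and the memory per worker instead of letting Dask spawn one thread per detected core:

```python
from dask.distributed import Client, LocalCluster

# Hypothetical alternative (untested here): an explicit local cluster with a
# fixed number of workers and a per-worker memory budget.
cluster = LocalCluster(n_workers=2, threads_per_worker=1, memory_limit="4GB")
client = Client(cluster)
```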
Fixed in #97 by returning to `map_blocks` and controlling Dask workers. (Maybe limiting Dask workers is the only thing that matters...?)
I still get surprising memory usage. `--ntimes 100` uses double the peak memory of `--ntimes 50`, though my understanding is that it shouldn't really take much more. Same scaling factor for both.
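To pin this down it would help to measure memory over the run rather than eyeballing it; a sketch using Dask's built-in resource profiler (works with the local schedulers used here):

```python
from dask.diagnostics import ResourceProfiler

# Sample CPU and memory usage while the forcings are computed, so the
# --ntimes 50 vs --ntimes 100 peaks can be compared directly.
with ResourceProfiler(dt=1) as rprof:
    forcings.compute()

peak_mb = max(r.mem for r in rprof.results)
print(f"peak memory: {peak_mb:.0f} MB")
```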
Closing as fixed, though not at the source. Tracking unexpected memory usage/scaling in #113 .
Using the #97 branch, I can't get through even 50 time points of forcing generation without OOMing.
On my machine with ~20 GB RAM available:

- `--ntimes 50` gets about 15% through and OOMs
- `--ntimes 25` executes, with peak memory usage ~15 GB