Using the previous `xr.map_blocks` method for computing forcings doesn't work either. It appears to run, but stalls for about a minute at ~40% and eats all my memory.
```python
# Build a template dataset with the structure the forcing computation will return.
t = surface_fields.coarsen({"xu_ocean": options.factor, "yu_ocean": options.factor}, boundary="trim").mean()
t2 = t.copy()
t2 = t2.rename({"usurf": "S_x", "vsurf": "S_y"})
t = xr.merge((t, t2))

# Apply the forcing computation block-wise; `template` tells xarray what the output looks like.
forcings = xr.map_blocks(lambda x: lib.compute_forcings_and_coarsen_cm2_6(x, grid, options.factor), surface_fields, template=t)
```
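For what it's worth, `map_blocks` with a `template` only builds the lazy task graph; the actual work (and the memory blow-up) happens once the result is computed or written out. A minimal sketch of how I'm materialising the result (the output path here is hypothetical):

```python
# `forcings` is still a lazy Dask-backed Dataset at this point.
# Computing it (or writing it to disk) is what actually runs the per-block computation.
forcings = forcings.compute()
# forcings.to_zarr("forcings.zarr")  # hypothetical output path
```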
This has to be some Dask scheduling issue. An xarray doc page gives some tips: https://xarray.pydata.org/en/v0.9.3/dask.html. It has me wondering whether some small change to how the array computations are built is breaking performance. (It would explain the seemingly random breakage.)
With more fiddling (refreshing dependencies, using 1 Dask worker, paranoid chunking tweaks), I've successfully generated forcings using `map_blocks`. Indeed, `map_blocks` is usually there to ensure good Dask scheduling, rather than obstructing it like we thought in #47. `--ntimes 50` over the paper's spatial domain, with `dask.config.set(num_workers=1)`, peaked at ~4 GB RAM -- much more reasonable.
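For reference, a minimal sketch of how the worker limit is applied (names carried over from the snippet above):

```python
import dask

# Limit the local scheduler to a single worker, so block computations run
# one at a time instead of one per detected core.
dask.config.set(num_workers=1)

forcings = forcings.compute()
```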
Dask seems to scale its workers with the number of detected cores. My poor laptop then gets hammered by 16 parallel threads, each apparently needing ~4 GB RAM (?). This would explain the crashing on CSD3: adding cores adds RAM, but it also adds more parallel workers, so Dask keeps OOMing itself.
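If we ever want parallelism back, one option I haven't tried (so treat this purely as a sketch) is the distributed scheduler, which lets us bound both the worker count and the memory per worker instead of letting Dask spawn one thread per detected core:

```python
from dask.distributed import Client, LocalCluster

# Hypothetical alternative (untested here): an explicit local cluster with a
# fixed number of workers and a per-worker memory budget.
cluster = LocalCluster(n_workers=2, threads_per_worker=1, memory_limit="4GB")
client = Client(cluster)
```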
Fixed in #97 by returning to `map_blocks` and controlling Dask workers. (Maybe limiting Dask workers is the only thing that matters...?)
I still get surprising memory usage. `--ntimes 100` uses double the peak memory of `--ntimes 50`, though my understanding is that it shouldn't really take much more. Same scaling factor for both.
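To pin this down it would help to measure memory over the run rather than eyeballing it; a sketch using Dask's built-in resource profiler (works with the local schedulers used here):

```python
from dask.diagnostics import ResourceProfiler

# Sample CPU and memory usage while the forcings are computed, so the
# --ntimes 50 vs --ntimes 100 peaks can be compared directly.
with ResourceProfiler(dt=1) as rprof:
    forcings.compute()

peak_mb = max(r.mem for r in rprof.results)
print(f"peak memory: {peak_mb:.0f} MB")
```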
Closing as fixed, though not at the source. Tracking unexpected memory usage/scaling in #113 .
Using the #97 branch, I can't get through even 50 time points of forcing generation without OOMing.
On my machine with ~20 GB RAM available:

- `--ntimes 50` gets about 15% through and OOMs
- `--ntimes 25` executes, with peak memory usage ~15 GB