Forcing generation (see lib.data.compute_forcings_and_coarsen_cm2_6()) is done per time point, independently of any other time point. We operate on "lazy" Dask arrays, which only download their backing data when scheduled and can stream outputs to file.
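To illustrate the lazy model, here is a generic Dask sketch (not this repository's code): building the array and the operations on it only constructs a task graph, and data is fetched chunk by chunk when the result is actually computed or written out.

```python
import dask.array as da

# A lazy Dask array: constructing it allocates no data.
x = da.ones((1000, 1000), chunks=(100, 100))

# Still lazy: this just extends the task graph.
y = (x * 2.0).mean(axis=0)

# Only computing (or writing out, e.g. via da.to_zarr) schedules the
# chunk-by-chunk work.
result = y.compute()
```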
Since we don't need to hold forcings in memory after calculation (we can just write them to file), increasing the number of time points we compute forcings for (--ntimes) should not substantially increase memory usage. But that doesn't appear to be the case: when testing with a single Dask worker, peak memory usage roughly doubled between --ntimes 50 and --ntimes 100.
Note that this "should" relies on Dask scheduling operations efficiently, which is not guaranteed. A user can guide it in a few ways. See #107, where this cropped up.
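One way to guide the scheduler is to submit the work in fixed-size batches of time points, so only one batch's chunks are resident at a time. This is an illustrative sketch under assumptions, not the project's actual code: compute_forcings below is a hypothetical stand-in for lib.data.compute_forcings_and_coarsen_cm2_6().

```python
import dask.array as da

# Hypothetical stand-in for the per-time-point forcing computation.
def compute_forcings(chunk):
    return chunk * 2.0

# Lazy input: one chunk per time point, nothing loaded yet.
data = da.ones((20, 4, 4), chunks=(1, 4, 4))

# Submitting all time points as one graph lets the scheduler keep many
# intermediate chunks alive at once, so peak memory can grow with --ntimes.
# Computing in fixed-size batches bounds what is resident at any moment.
batch = 5
totals = []
for start in range(0, data.shape[0], batch):
    forcings = compute_forcings(data[start:start + batch])
    # In the real pipeline each batch would be streamed to file instead
    # (e.g. with da.to_zarr); here we just materialize and discard it.
    totals.append(float(forcings.sum().compute()))
```

The trade-off is some lost parallelism across batches, in exchange for a memory ceiling that no longer grows with --ntimes.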