hdrake closed this pull request 4 years ago.
I've now implemented the feature requested in https://github.com/hdrake/cmip6hack-multigen/issues/22, i.e. refactoring the model preprocessing and loading into general functions.
The full dataset is lazily loaded as a dictionary of datasets, where each item in the dictionary is an xarray.Dataset object representing an entire activity_id (which here corresponds to the ensemble used in each individual IPCC report):
ens_dict = pp.load_ensembles(varnames, timeslice=timeslice)
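To illustrate the shape of what pp.load_ensembles returns, here is a hedged sketch (a hypothetical toy_load_ensembles, not the real function) that builds the same kind of dict-of-datasets structure from synthetic data, keyed by activity_id:

```python
# Hypothetical stand-in for pp.load_ensembles: the activity_id keys and
# synthetic data below are assumptions for illustration only.
import numpy as np
import xarray as xr

def toy_load_ensembles(varnames, timeslice=slice(None)):
    activity_ids = ["CMIP5", "CMIP6"]  # stand-ins for the per-report ensembles
    time = np.arange(12)
    ens_dict = {}
    for act in activity_ids:
        data_vars = {var: ("time", np.random.rand(time.size)) for var in varnames}
        # each dictionary value is a full xr.Dataset for one activity_id
        ens_dict[act] = xr.Dataset(data_vars, coords={"time": time}).isel(time=timeslice)
    return ens_dict

ens_dict = toy_load_ensembles(["tas"], timeslice=slice(0, 6))
```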
Methods are applied lazily using the dict_func function as follows:
ens_dict = util.dict_func(ens_dict, xr.Dataset.mean, on_self=True, dim=['time'], keep_attrs=True)
and quantities are eagerly computed by applying the compute method in the same manner:
ens_dict = util.dict_func(ens_dict, xr.Dataset.compute, on_self=True)
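A minimal sketch of how dict_func might work, assuming it simply maps an unbound Dataset method over the dictionary values (the real util.dict_func may differ):

```python
# Hedged sketch of dict_func, not the actual util implementation.
import numpy as np
import xarray as xr

def dict_func(d, func, on_self=False, **kwargs):
    # When on_self=True, `func` is an unbound xr.Dataset method
    # (e.g. xr.Dataset.mean) and each dataset is passed as `self`.
    return {key: func(ds, **kwargs) for key, ds in d.items()}

# demo with synthetic data
ds = xr.Dataset({"tas": ("time", np.arange(4.0))}, coords={"time": np.arange(4)})
ens_dict = {"CMIP6": ds}
ens_dict = dict_func(ens_dict, xr.Dataset.mean, on_self=True, dim=["time"], keep_attrs=True)
ens_dict = dict_func(ens_dict, xr.Dataset.compute, on_self=True)
```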
This API is kind of clunky, and I would eventually like to replace it with an ensemble object exposing the following xarray-like API (note that the backend would still be a dictionary of xarray.Dataset objects – this is important because each IPCC report includes a different number of models, which also have different names):
ens = ens.mean(dim='time', keep_attrs=True)
ens = ens.compute()
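One way such an object could be sketched (a hypothetical Ensemble class, not existing code) is to delegate attribute access to every member Dataset, so any Dataset method works on the whole ensemble at once:

```python
# Hypothetical Ensemble wrapper: names and design are assumptions.
import numpy as np
import xarray as xr

class Ensemble:
    """Minimal xarray-like wrapper around a dict of Datasets."""

    def __init__(self, datasets):
        self.datasets = dict(datasets)

    def __getattr__(self, name):
        # Delegate any Dataset method (mean, compute, ...) to each member,
        # returning a new Ensemble of the results.
        def method(*args, **kwargs):
            return Ensemble(
                {key: getattr(ds, name)(*args, **kwargs)
                 for key, ds in self.datasets.items()}
            )
        return method

ds = xr.Dataset({"tas": ("time", np.arange(4.0))}, coords={"time": np.arange(4)})
ens = Ensemble({"CMIP6": ds})
ens = ens.mean(dim="time", keep_attrs=True)
ens = ens.compute()
```

This mirrors the desired `ens.mean(...)` / `ens.compute()` calls while keeping the per-report dictionary in the backend.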
The functions could probably be cleaned up a bit and made more general with a few more optional keyword arguments.
I'm refactoring the code with the interim goal of reading in all the data, computing the time mean, and saving it locally in just a few lines.
Work in progress.