hdrake / cmip6hack-multigen

Multi-generational inter-comparison of climate model performance metrics
MIT License

Starting re-factoring code into functions #30

Closed: hdrake closed this 4 years ago

hdrake commented 4 years ago

Trying to refactor the code with the interim goal of reading in all the data, computing the time mean, and saving the result locally in just a few lines.

Work in progress.

hdrake commented 4 years ago

I've now basically implemented the feature requested in https://github.com/hdrake/cmip6hack-multigen/issues/22, i.e., refactoring the model preprocessing and loading into general functions.

The full dataset is lazily loaded as a dictionary of datasets, where each value is an xarray.Dataset object representing an entire activity_id (here, the model ensemble used in one IPCC report):

ens_dict = pp.load_ensembles(varnames, timeslice=timeslice)

Methods are applied lazily using the dict_func function as follows:

ens_dict = util.dict_func(ens_dict, xr.Dataset.mean, on_self=True, dim=['time'], keep_attrs=True)
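For readers unfamiliar with the pattern, here is a minimal sketch of what a dict_func-style helper might look like when called with on_self=True. It is not the implementation in util: the name apply_to_values is hypothetical, and it assumes on_self=True means each value is passed as the first positional argument (self) of an unbound method such as xr.Dataset.mean. The on_self=False branch is not reproduced, since its behavior isn't described here.

```python
from typing import Any, Callable, Dict


def apply_to_values(d: Dict[str, Any], func: Callable[..., Any],
                    *args: Any, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for util.dict_func(..., on_self=True).

    Calls func(value, *args, **kwargs) on every value, so an unbound
    method such as xr.Dataset.mean receives each dataset as `self`.
    Returns a new dict with the same keys; the input dict is not modified.
    """
    return {key: func(value, *args, **kwargs) for key, value in d.items()}


if __name__ == "__main__":
    # Stand-in data: plain strings instead of xarray.Dataset objects, with
    # the unbound method str.replace playing the role of xr.Dataset.mean.
    # The report keys are hypothetical.
    reports = {"AR4": "tas_time", "AR5": "pr_time"}
    print(apply_to_values(reports, str.replace, "_time", "_mean"))
    # {'AR4': 'tas_mean', 'AR5': 'pr_mean'}
```

With real data, the equivalent call would be apply_to_values(ens_dict, xr.Dataset.mean, dim=['time'], keep_attrs=True), which evaluates ens_dict[key].mean(dim=['time'], keep_attrs=True) for every key.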

and quantities are eagerly computed by applying the compute method in the same manner:

ens_dict = util.dict_func(ens_dict, xr.Dataset.compute, on_self=True)

This API is kind of clunky, and I would eventually like to replace it with an ensemble object with the following xarray-like API (note that in the backend this would still be a dictionary of xarray.Dataset objects; this matters because each IPCC report has a different number of models, and the models also have different names):

ens = ens.mean(dim='time', keep_attrs=True)
ens = ens.compute()
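One way the proposed ensemble object could be sketched, assuming it simply forwards any method call to every dataset in the backing dictionary and wraps the results in a new ensemble. The class name Ensemble, the members attribute, and the keys in the demo are hypothetical; strings stand in for xarray.Dataset objects so the example runs without any data.

```python
from typing import Any, Callable, Dict


class Ensemble:
    """Hypothetical dict-backed ensemble with an xarray-like method API.

    Holds one object per key (e.g. one xarray.Dataset per IPCC report).
    Any public method not defined here is looked up on the stored objects
    and applied to each one, returning a new Ensemble with the same keys.
    """

    def __init__(self, members: Dict[str, Any]) -> None:
        self.members = dict(members)

    def __getattr__(self, name: str) -> Callable[..., "Ensemble"]:
        # Only called when normal attribute lookup fails. Private/dunder
        # names and `members` are refused so that copying or pickling an
        # instance does not get routed through the forwarding logic.
        if name.startswith("_") or name == "members":
            raise AttributeError(name)

        def method(*args: Any, **kwargs: Any) -> "Ensemble":
            # Apply the named method to every member, e.g. ds.mean(...).
            return Ensemble({key: getattr(obj, name)(*args, **kwargs)
                             for key, obj in self.members.items()})

        return method

    def __getitem__(self, key: str) -> Any:
        return self.members[key]

    def __repr__(self) -> str:
        return f"Ensemble({self.members!r})"


if __name__ == "__main__":
    # With real datasets this would read ens.mean(dim='time', keep_attrs=True)
    # followed by ens.compute().
    ens = Ensemble({"AR5": "cmip5", "AR6": "cmip6"})
    print(ens.upper())  # Ensemble({'AR5': 'CMIP5', 'AR6': 'CMIP6'})
```

Keeping a dictionary rather than concatenating everything into one Dataset avoids having to align ensembles that differ in model count and model names. The trade-off of forwarding through __getattr__ is that every method call returns an Ensemble, so operations that should return something else (for example, a comparison across reports) would need to be written as explicit methods.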

The functions could probably be cleaned up a bit and made more general with a few more optional keyword arguments.