jbusecke / xMIP

Analysis ready CMIP6 data in python the easy way with pangeo tools.
https://cmip6-preprocessing.readthedocs.io/en/latest/?badge=latest
Apache License 2.0
196 stars 44 forks source link

Concatenate experiments with different calendars #228

Open znicholls opened 2 years ago

znicholls commented 2 years ago

This might be out of scope for this package (feel free to close if it is), but sometimes you get experiments with different calendars (e.g. historical uses cftime.DatetimeNoLeap and future uses cftime.DatetimeJulian). If you concatenate these using concat_experiments then you get a complete dataset, but it then explodes if you try to plot. Minimal example to reproduce is below

>>> url = "https://cmip6-pds.s3.amazonaws.com/pangeo-cmip6.json"
>>> col = intake.open_esm_datastore(url)
>>> subset = col.search(
...     variable_id="co2mass",
...     experiment_id=["historical", "ssp370SST-lowNTCF"],
...     source_id="GFDL-ESM4"
... )
>>> datasets = subset.to_dataset_dict(
...     aggregate=False,
...     zarr_kwargs={"consolidated": True, "decode_times": True, "use_cftime": True},
...     progressbar=True,
... )
>>> out = concat_experiments(datasets)
>>> out["GFDL-ESM4.gn.Amon.r1i1p1f1.co2mass"]["co2mass"].plot()
....
ValueError: Calendar units are not all equal.

I'm not sure what the rules with calendars are, but in some cases (e.g. working with monthly data) a simple fix is to just re-write everything in the same calendar by force e.g. ds.assign_coords(time=[cftime.DatetimeGregorian(*v.timetuple()) for v in ds.time.values]). Not sure if such a simple fix belongs here or not thought.

jbusecke commented 2 years ago

No I think this is indeed very appropriate. In fact I have written code to replace the time with a uniform calendar in the drift_removal module. We might want to expose that on a higher level, or maybe even consider unifying calendars (at least for monthly data) at the combined_preprocessing level? I think this was proposed by @aaronspring at some point?

I believe there is now also a sleeker way to do that with either xarray or nctime. So maybe a refactor would be appropriate? I think at the minimum, we could build some checking logic into concat_experiments?

aaronspring commented 2 years ago

In the meantime xarray introduced convert_calendar which might help

znicholls commented 2 years ago

In the meantime xarray introduced convert_calendar which might help

Unfortunately also not happy

>>> out["GFDL-ESM4.gn.Amon.r1i1p1f1.co2mass"].convert_calendar("standard")
...
File src/cftime/_cftime.pyx:1334, in cftime._cftime.datetime.__richcmp__()

TypeError: cannot compare cftime.DatetimeNoLeap(1850, 1, 16, 12, 0, 0, 0, has_year_zero=True) and cftime.DatetimeJulian(2015, 1, 16, 12, 0, 0, 0, has_year_zero=False)
znicholls commented 2 years ago

In fact I have written code to replace the time with a uniform calendar in the drift_removal module

Nice, does this also move the dates? I wasn't completely sure looking at it

jbusecke commented 2 years ago

what do you mean by 'move'? The logic I wrote is basically limited to monthly data, so it is not at all general enough.

It seems like working on the upstream xarray stuff would be the best way forward?

znicholls commented 2 years ago

what do you mean by 'move'?

As in, it renames e.g. 1st Jan 1020 to 1st Jan 1850 and shifts all other time points accordingly?

It seems like working on the upstream xarray stuff would be the best way forward?

👍