roocs / clisops

Climate Simulation Operations
https://clisops.readthedocs.io/en/latest/

Error in `average_over_dims`: NotImplementedError: Computing the mean of an array containing cftime.datetime objects is not yet implemented on dask arrays. #185

Open agstephens opened 3 years ago

agstephens commented 3 years ago

Description

The test suite demonstrates an error: calling `ds.mean()` on an array containing `cftime.datetime` objects is not yet implemented for dask arrays.

Googling suggests that one solution is to force the data to be loaded, so it is no longer a delayed dask array, e.g.:

ds = ds.load()

What I Did

These tests demonstrate the error:

============================================================== short test summary info ==============================================================
FAILED tests/ops/test_average.py::test_average_lat_xarray - NotImplementedError: Computing the mean of an array containing cftime.datetime objects...
FAILED tests/ops/test_average.py::test_average_lon_xarray - NotImplementedError: Computing the mean of an array containing cftime.datetime objects...
FAILED tests/ops/test_average.py::test_average_lat_nc - NotImplementedError: Computing the mean of an array containing cftime.datetime objects is ...
FAILED tests/ops/test_average.py::test_average_lon_nc - NotImplementedError: Computing the mean of an array containing cftime.datetime objects is ...
FAILED tests/ops/test_xarray_mean.py::test_xarray_da_mean_keep_attrs_true - NotImplementedError: Computing the mean of an array containing cftime....
======================================== 5 failed, 231 passed, 20 skipped, 330 warnings in 287.45s (0:04:47) ========================================
agstephens commented 3 years ago

Since the `average_over_dims` operation is not currently used in our production systems (e.g. the rook WPS), the quick fix is simply to `load()` the `xr.Dataset` before processing. This does not change the functionality. The only risk is that it will attempt to load a dataset of any size from disk, which could cause memory issues. That is not a problem at present because we don't use this functionality.
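The quick fix can be sketched as follows. The `chunk` call stands in for opening a dask-backed dataset, and the variable names are illustrative, not the actual clisops code:

```python
import numpy as np
import xarray as xr

# Illustrative dataset; in clisops this would come from open_dataset/open_mfdataset.
ds = xr.Dataset({"tas": ("time", np.arange(6.0))}).chunk({"time": 3})

# Force everything into memory so no variable is a delayed dask array.
# Caveat from the thread: this reads the whole dataset, whatever its size,
# so very large inputs could exhaust memory.
ds = ds.load()

result = ds.mean(dim="time", keep_attrs=True)
print(float(result["tas"]))  # 2.5
```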

agstephens commented 3 years ago

We can undo this in future when the xarray/pandas versions are brought into line.

aulemahal commented 3 years ago

I opened an issue for this here: pydata/xarray#5897. I believe the current bug has nothing to do with pandas or a version mismatch; rather, it was a PR on the xarray side that introduced the bug.

All in all, this only happens in the test suite because of the `time_bnds` variable present on some datasets. I would suggest either removing the variable, waiting for a fix in xarray, or introducing a workaround directly in `average_over_dims` that skips the faulty variable (I can send a PR for that). Anything but loading data unexpectedly.
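A workaround along these lines could look like this sketch. `mean_skip_non_numeric` is a hypothetical helper, not the actual clisops code; it simply drops object-dtype variables (such as cftime bounds) before averaging:

```python
import numpy as np
import xarray as xr


def mean_skip_non_numeric(ds: xr.Dataset, dims) -> xr.Dataset:
    """Average only numeric variables; drop object-dtype ones (e.g. cftime bounds)."""
    faulty = [name for name, var in ds.data_vars.items()
              if not np.issubdtype(var.dtype, np.number)]
    return ds.drop_vars(faulty).mean(dim=dims, keep_attrs=True)


ds = xr.Dataset({
    "tas": ("time", np.array([1.0, 2.0, 3.0])),
    # Stand-in for a cftime time_bnds variable: object dtype.
    "time_bnds": ("time", np.array(["a", "b", "c"], dtype=object)),
})
out = mean_skip_non_numeric(ds, "time")
print(float(out["tas"]))    # 2.0
print(list(out.data_vars))  # ['tas']
```

Dropping the variable (rather than carrying it through) keeps the averaged result free of dims that were meant to be collapsed.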

ellesmith88 commented 3 years ago

Thanks @aulemahal, I've just read this through. We aren't using `average_over_dims` and we expect it will be removed or refactored in the future, so a workaround can be introduced or the tests can be skipped.