spencerahill / aospy

Python package for automated analysis and management of gridded climate data
Apache License 2.0

Failing test_calc_basic tests after updating xarray from 10.2 to 10.3 #268

Closed spencerahill closed 6 years ago

spencerahill commented 6 years ago

Originally noted here

All of the failures raise the same error. Note that this one doesn't involve regional averaging, so it's unrelated to what I'm implementing in #266. Here's the traceback of one:

$ py.test test_calc_basic.py::TestCalc3D::test_monthly_ts
================================================================== test session starts ==================================================================
platform darwin -- Python 3.6.3, pytest-3.2.3, py-1.5.1, pluggy-0.4.0
rootdir: /Users/shill/Dropbox/py/aospy, inifile: setup.cfg
plugins: catchlog-1.2.2, hypothesis-3.50.2
collected 1 item

test_calc_basic.py F

======================================================================= FAILURES ========================================================================
______________________________________________________________ TestCalc3D.test_monthly_ts _______________________________________________________________

self = <aospy.test.test_calc_basic.TestCalc3D testMethod=test_monthly_ts>

    def test_monthly_ts(self):
        calc = Calc(intvl_out=1, dtype_out_time='ts', **self.test_params)
>       calc.compute()

calc       = <aospy.Calc instance: sphum, example_proj, example_model, example_run>
self       = <aospy.test.test_calc_basic.TestCalc3D testMethod=test_monthly_ts>

test_calc_basic.py:88:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../calc.py:569: in compute
    self.end_date),
../calc.py:415: in _get_all_data
    for n, var in enumerate(self.variables)]
../calc.py:415: in <listcomp>
    for n, var in enumerate(self.variables)]
../calc.py:367: in _get_input_data
    **self.data_loader_attrs)
../data_loader.py:278: in load_variable
    ds, min_year, max_year = _prep_time_data(ds)
../data_loader.py:180: in _prep_time_data
    ds = times.ensure_time_avg_has_cf_metadata(ds)
../utils/times.py:417: in ensure_time_avg_has_cf_metadata
    raw_start_date = ds[TIME_BOUNDS_STR].isel(**{TIME_STR: 0, BOUNDS_STR: 0})
../../../../miniconda3/envs/py36/lib/python3.6/site-packages/xarray/core/dataarray.py:754: in isel
    ds = self._to_temp_dataset().isel(drop=drop, **indexers)
../../../../miniconda3/envs/py36/lib/python3.6/site-packages/xarray/core/dataset.py:1391: in isel
    indexers_list = self._validate_indexers(indexers)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <xarray.Dataset>
Dimensions:       (bounds: 2)
Coordinates:
    time_bounds   (bounds) float64 dask.array<shape=(2,), ...
    time_weights  float64 ...
Data variables:
    <this-array>  (bounds) float64 dask.array<shape=(2,), chunksize=(2,)>
indexers = {'bounds': 0, 'time': 0}

    def _validate_indexers(self, indexers):
        """ Here we make sure
            + indexer has a valid keys
            + indexer is in a valid data type
            """
        from .dataarray import DataArray

        invalid = [k for k in indexers if k not in self.dims]
        if invalid:
>           raise ValueError("dimensions %r do not exist" % invalid)
E           ValueError: dimensions ['time'] do not exist

DataArray  = <class 'xarray.core.dataarray.DataArray'>
indexers   = {'bounds': 0, 'time': 0}
invalid    = ['time']
self       = <xarray.Dataset>
Dimensions:       (bounds: 2)
Coordinates:
    time_bounds   (bounds) float64 dask.array<shape=(2,), ...
    time_weights  float64 ...
Data variables:
    <this-array>  (bounds) float64 dask.array<shape=(2,), chunksize=(2,)>
spencerahill commented 6 years ago

Basically, in the failing tests the loaded dataset's time_bounds array is not indexed by the time array:

pp ds
<xarray.Dataset>
Dimensions:       (bounds: 2, lat: 64, lat_bounds: 65, lon: 128, lon_bounds: 129, pfull: 30, phalf: 31, time: 1)
Coordinates:
  * lon_bounds    (lon_bounds) float64 -1.406 1.406 4.219 7.031 9.844 12.66 ...
  * lon           (lon) float64 0.0 2.812 5.625 8.438 11.25 14.06 16.88 ...
  * lat_bounds    (lat_bounds) float64 -90.0 -86.58 -83.76 -80.96 -78.16 ...
  * lat           (lat) float64 -87.86 -85.1 -82.31 -79.53 -76.74 -73.95 ...
  * phalf         (phalf) float64 0.0 9.202 12.44 16.66 22.07 28.97 37.63 ...
    bk            (phalf) float32 dask.array<shape=(31,), chunksize=(31,)>
    pk            (phalf) float32 dask.array<shape=(31,), chunksize=(31,)>
  * pfull         (pfull) float64 3.385 10.78 14.5 19.3 25.44 33.2 42.9 ...
    time_bounds   (bounds) float64 dask.array<shape=(2,), chunksize=(2,)>
  * bounds        (bounds) float64 1.0 2.0
    time_weights  float64 ...
  * time          (time) float64 1.841e+03
Data variables:
    ps            (lat, lon) float32 dask.array<shape=(64, 128), chunksize=(64, 128)>
    sphum         (pfull, lat, lon) float32 dask.array<shape=(30, 64, 128), chunksize=(30, 64, 128)>
Attributes:
    coordinates:  time

This is probably related to the fact that the time array is length-1: all of the failures are for the TestCalc3D class, which has only one year of data.
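The situation can be mimicked in a minimal sketch (a hypothetical repro, not aospy code): once the length-1 time dimension has been squeezed away from time_bounds, indexing it along time raises exactly this ValueError.

```python
import numpy as np
import xarray as xr

# Hypothetical minimal repro (not aospy code): a time_bounds array whose
# length-1 'time' dimension has been dropped, leaving only 'bounds',
# as in the dataset printed above.
tb = xr.DataArray(np.array([0.0, 31.0]), dims=['bounds'],
                  coords={'bounds': [1.0, 2.0]}, name='time_bounds')
ds = xr.Dataset({'time_bounds': tb})
try:
    ds['time_bounds'].isel(time=0, bounds=0)
except ValueError as err:
    # 'time' is not a dimension of this array, so indexing it fails
    print(err)
```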

Looking through the xarray 10.3 what's-new entries, here's a potential culprit: https://github.com/pydata/xarray/pull/2048

spencerahill commented 6 years ago

I have to run for now, but it's something in our preprocess func that's causing the problem:

(Pdb) xr.open_mfdataset(file_set, concat_dim='time')['time_bounds']
<xarray.DataArray 'time_bounds' (time: 1, nv: 2)>
dask.array<shape=(1, 2), dtype=timedelta64[ns], chunksize=(1, 2)>
Coordinates:
  * nv       (nv) float64 1.0 2.0
  * time     (time) object    6-01-17 00:00:00
Attributes:
    long_name:  time axis boundaries
(Pdb) xr.open_mfdataset(file_set, preprocess=func, concat_dim='time')['time_bounds']
<xarray.DataArray 'time_bounds' (bounds: 2)>
array([157680000000000000, 160358400000000000], dtype='timedelta64[ns]')
Coordinates:
    time_bounds   (bounds) timedelta64[ns] 1825 days 1856 days
  * bounds        (bounds) float64 1.0 2.0
    time_weights  timedelta64[ns] 31 days
Attributes:
    long_name:  time axis boundaries
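One way the before/after above could arise (a hypothetical illustration, not the actual aospy preprocess func): if some step squeezes the length-1 time dimension and renames nv to bounds, time_bounds goes from (time: 1, nv: 2) to (bounds: 2), matching the two printouts.

```python
import numpy as np
import xarray as xr

# Hypothetical illustration (not the actual aospy preprocess func):
# squeezing the length-1 'time' dim and renaming 'nv' -> 'bounds'
# reproduces the shape change shown above.
tb = xr.DataArray(np.zeros((1, 2)), dims=['time', 'nv'],
                  coords={'nv': [1.0, 2.0]}, name='time_bounds')
print(tb.dims)        # ('time', 'nv')
squeezed = tb.squeeze('time').rename({'nv': 'bounds'})
print(squeezed.dims)  # ('bounds',)
```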
spencerkclark commented 6 years ago

Upon looking at things more closely, this is failing in _prep_time_data at the times.ensure_time_avg_has_cf_metadata(ds) step. The existing line above this step is a workaround meant to address this very issue: https://github.com/spencerahill/aospy/blob/f240c72e88e70f771305f44a14907929c6ddacb2/aospy/data_loader.py#L177-L178

I think times.ensure_time_as_dim was written before expand_dims existed in xarray, so the logic there might be overdue for a cleanup. Could that be where things are going wrong? Maybe focus your attention there in #269?
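For reference, a sketch of the expand_dims route mentioned above, assuming xarray's documented behavior that calling expand_dims on an existing scalar coordinate promotes it to a length-1 dimension (this is a toy dataset, not the aospy fix itself):

```python
import xarray as xr

# Toy dataset whose 'time' is a scalar coordinate rather than a
# dimension, like the failing case in this issue.
ds = xr.Dataset({'time_bounds': ('bounds', [0.0, 31.0])},
                coords={'bounds': [1.0, 2.0], 'time': 1841.0})
assert 'time' not in ds.dims

# expand_dims promotes the scalar 'time' coord to a length-1 dimension,
# which time_bounds then picks up.
ds2 = ds.expand_dims('time')
assert ds2.sizes['time'] == 1
```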