pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Selecting time with different variable and dimensions names #1474

Closed snowman2 closed 7 years ago

snowman2 commented 7 years ago

I am having trouble selecting time from this DataArray (notice that the dimension is 'Time' and the variable/coordinate is 'Times'):

<xarray.DataArray 'RAINC' (Time: 16, south_north: 5, west_east: 5)>
dask.array<getitem, shape=(16, 5, 5), dtype=float32, chunksize=(1, 5, 5)>
Coordinates:
    XLAT     (south_north, west_east) float32 40.3474 40.3502 40.3529 ...
    XLONG    (south_north, west_east) float32 -111.749 -111.679 -111.608 ...
    Times    (Time) datetime64[ns] 2016-08-23T22:00:00 2016-08-23T23:00:00 ...
Dimensions without coordinates: Time, south_north, west_east
Attributes:
    FieldType:    104
    MemoryOrder:  XY 
    description:  ACCUMULATED TOTAL CUMULUS PRECIPITATION
    units:        mm
    stagger:      
    coordinates:  XLONG XLAT XTIME

I have tried several different methods:

data = data[{self.lsm_time_dim: [pd.to_datetime(time_step)]}]
data = data[{self.lsm_time_dim: pd.to_datetime(time_step)}]
data = data[{self.lsm_time_dim: str(time_step)}]

And they all end with a similar error:

../gsshapy/grid/grid_to_gssha.py:634: in _load_lsm_data
    data = data[{self.lsm_time_dim: [pd.to_datetime(time_step)]}]
../../../tethys/miniconda/envs/gssha/lib/python3.6/site-packages/xarray/core/dataarray.py:472: in __getitem__
    return self.isel(**self._item_key_to_dict(key))
../../../tethys/miniconda/envs/gssha/lib/python3.6/site-packages/xarray/core/dataarray.py:679: in isel
    ds = self._to_temp_dataset().isel(drop=drop, **indexers)
../../../tethys/miniconda/envs/gssha/lib/python3.6/site-packages/xarray/core/dataset.py:1143: in isel
    new_var = var.isel(**var_indexers)
../../../tethys/miniconda/envs/gssha/lib/python3.6/site-packages/xarray/core/variable.py:570: in isel
    return self[tuple(key)]
../../../tethys/miniconda/envs/gssha/lib/python3.6/site-packages/xarray/core/variable.py:400: in __getitem__
    values = self._indexable_data[key]
../../../tethys/miniconda/envs/gssha/lib/python3.6/site-packages/xarray/core/indexing.py:545: in __getitem__
    result = self.array[key]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = DatetimeIndex(['2016-08-23 22:00:00', '2016-08-23 23:00:00',
               '2016-08-24 00:00:00', '2016-08-24 01:00:0...:00:00',
               '2016-08-24 12:00:00', '2016-08-24 13:00:00'],
              dtype='datetime64[ns]', freq=None)
key = array([Timestamp('2016-08-23 22:00:00')], dtype=object)

    def __getitem__(self, key):
        """
            This getitem defers to the underlying array, which by-definition can
            only handle list-likes, slices, and integer scalars
            """

        is_int = is_integer(key)
        if is_scalar(key) and not is_int:
            raise ValueError

        getitem = self._data.__getitem__
        if is_int:
            val = getitem(key)
            return self._box_func(val)
        else:
            if com.is_bool_indexer(key):
                key = np.asarray(key)
                if key.all():
                    key = slice(0, None, None)
                else:
                    key = lib.maybe_booleans_to_slice(key.view(np.uint8))

            attribs = self._get_attributes_dict()

            is_period = isinstance(self, ABCPeriodIndex)
            if is_period:
                freq = self.freq
            else:
                freq = None
                if isinstance(key, slice):
                    if self.freq is not None and key.step is not None:
                        freq = key.step * self.freq
                    else:
                        freq = self.freq

            attribs['freq'] = freq

>           result = getitem(key)
E           IndexError: arrays used as indices must be of integer (or boolean) type

fmaussion commented 7 years ago

This is not a direct answer to your question, but I see you are reading WRF files. Have you considered using Salem's open_wrf_dataset on your files? It solves some of the time-related issues that these files have.

snowman2 commented 7 years ago

I am having the same issue with a completely different grid:

<xarray.DataArray 'sp' (time: 72, latitude: 5, longitude: 4)>
dask.array<getitem, shape=(72, 5, 4), dtype=float64, chunksize=(24, 5, 4)>
Coordinates:
  * longitude  (longitude) float32 248.0 248.25 248.5 248.75
  * latitude   (latitude) float32 40.75 40.5 40.25 40.0 39.75
  * time       (time) datetime64[ns] 2016-01-02 2016-01-02T01:00:00 ...
Attributes:
    units:          Pa
    long_name:      Surface pressure
    standard_name:  surface_air_pressure

Error:

self = DatetimeIndex(['2016-01-02 00:00:00', '2016-01-02 01:00:00',
               '2016-01-02 02:00:00', '2016-01-02 03:00:0...:00:00',
               '2016-01-04 22:00:00', '2016-01-04 23:00:00'],
              dtype='datetime64[ns]', freq=None)
key = array(Timestamp('2016-01-02 00:00:00'), dtype=object)

    def __getitem__(self, key):
        """
            This getitem defers to the underlying array, which by-definition can
            only handle list-likes, slices, and integer scalars
            """

        is_int = is_integer(key)
        if is_scalar(key) and not is_int:
            raise ValueError

        getitem = self._data.__getitem__
        if is_int:
            val = getitem(key)
            return self._box_func(val)
        else:
            if com.is_bool_indexer(key):
                key = np.asarray(key)
                if key.all():
                    key = slice(0, None, None)
                else:
                    key = lib.maybe_booleans_to_slice(key.view(np.uint8))

            attribs = self._get_attributes_dict()

            is_period = isinstance(self, ABCPeriodIndex)
            if is_period:
                freq = self.freq
            else:
                freq = None
                if isinstance(key, slice):
                    if self.freq is not None and key.step is not None:
                        freq = key.step * self.freq
                    else:
                        freq = self.freq

            attribs['freq'] = freq

>           result = getitem(key)
E           IndexError: arrays used as indices must be of integer (or boolean) type

../../../tethys/miniconda/envs/gssha/lib/python3.6/site-packages/pandas/core/indexes/datetimelike.py:296: IndexError

fmaussion commented 7 years ago

Have you tried the way it is done in the docs?

http://xarray.pydata.org/en/latest/indexing.html#indexing-with-labeled-dimensions

snowman2 commented 7 years ago

OK. I feel silly now. I thought I should have been using loc[]. Thanks @fmaussion for that.
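
A minimal sketch of the difference, assuming a toy array shaped like the 'sp' example above (the values are made up, and `time_dim` is only a stand-in for `self.lsm_time_dim`). Plain `data[{dim: ...}]` is positional and goes through `isel`, as the traceback shows, while `data.loc[{dim: ...}]` and `data.sel(...)` look labels up in the index:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy stand-in for the 'sp' array above (values are made up for illustration).
times = pd.date_range("2016-01-02", periods=72, freq=pd.Timedelta(hours=1))
data = xr.DataArray(
    np.arange(72 * 5 * 4, dtype="float64").reshape(72, 5, 4),
    dims=("time", "latitude", "longitude"),
    coords={"time": times},
    name="sp",
)

time_dim = "time"  # hypothetical stand-in for self.lsm_time_dim
time_step = pd.to_datetime("2016-01-02T01:00:00")

# data[{dim: ...}] is positional (it calls isel), so it expects integers.
by_position = data[{time_dim: 1}]

# data.loc[{dim: ...}] and data.sel(...) look the label up in the index.
by_loc = data.loc[{time_dim: time_step}]
by_sel = data.sel({time_dim: time_step})
```

Passing a Timestamp to the positional form is what produced the IndexError about integer (or boolean) indices in the tracebacks above.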

fmaussion commented 7 years ago

I personally very much prefer the readability of .sel and .isel, which is what I teach to my students.

I don't know, however, what is most used among other xarray users.

snowman2 commented 7 years ago

OK. That is good to know. In this case, I don't know the name of the dimension beforehand, so I thought it might be preferable to do:

data = data.loc[{self.lsm_time_dim: [pd.to_datetime(time_step)]}]

instead of:

data = data.sel(**{self.lsm_time_dim: [pd.to_datetime(time_step)]})

What are your thoughts?

fmaussion commented 7 years ago

In this case, I don't know the name of the dimension beforehand

Yes, this is definitely a good use case for a dict and .loc.
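
As a rough sketch of that pattern, assuming the dimension name only arrives at runtime: `select_time_step` and the toy array below are hypothetical, not part of gsshapy or xarray. The dict form works with `.loc`, with `.sel(**{...})`, and with a dict passed directly to `.sel`:

```python
import numpy as np
import pandas as pd
import xarray as xr


def select_time_step(data, time_dim, time_step):
    """Hypothetical helper: select one time step by label."""
    # Wrapping the label in a list keeps a length-1 time dimension;
    # a bare scalar label would drop the dimension instead.
    return data.loc[{time_dim: [pd.to_datetime(time_step)]}]


# Toy data; the values are made up for illustration.
times = pd.date_range("2016-01-02", periods=72, freq=pd.Timedelta(hours=1))
data = xr.DataArray(
    np.zeros((72, 5, 4)),
    dims=("time", "latitude", "longitude"),
    coords={"time": times},
)

stamp = pd.to_datetime("2016-01-02T03:00:00")
kept = select_time_step(data, "time", "2016-01-02T03:00:00")
same = data.sel(**{"time": [stamp]})  # keyword-unpacking spelling
dropped = data.sel({"time": stamp})   # scalar label: time dimension is dropped
```

Which spelling to use is mostly a matter of taste; the dict passed to `.sel` avoids the `**` unpacking while staying as readable as `.loc`.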

snowman2 commented 7 years ago

Oh, and to make the time slice work for WRF, I had to make the time variables consistent:

    xds.rename(
        {
            time_dim: 'time',
            time_var: 'time',
        },
        inplace=True
    )
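
For newer xarray releases, which as far as I know no longer accept `inplace=True` in `rename`, a similar result can be sketched with `swap_dims` followed by `rename`. The array below is a made-up stand-in for the WRF 'RAINC' variable, with 'Times' attached as a non-index coordinate on the 'Time' dimension:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up stand-in for the WRF variable: dimension 'Time', coordinate 'Times'.
times = pd.date_range("2016-08-23 22:00", periods=16, freq=pd.Timedelta(hours=1))
rainc = xr.DataArray(
    np.zeros((16, 5, 5), dtype="float32"),
    dims=("Time", "south_north", "west_east"),
    coords={"Times": ("Time", times)},
    name="RAINC",
)

# Make 'Times' the dimension coordinate, then give it a single name, 'time'.
# rename returns a new object, so the result has to be assigned.
fixed = rainc.swap_dims({"Time": "Times"}).rename({"Times": "time"})

# Label-based selection now works along 'time'.
picked = fixed.sel(time=pd.Timestamp("2016-08-24 00:00:00"))
```

After this step the dimension and its coordinate share one name, which is what label-based selection with `.sel` or `.loc` relies on.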

snowman2 commented 7 years ago

Which is what I think you do in salem.

snowman2 commented 7 years ago

@fmaussion, thanks again for your time!