roocs / clisops

Climate Simulation Operations
https://clisops.readthedocs.io/en/latest/
Other
21 stars 9 forks source link

Error when performingsubset_bbox on dataset that contains multiple timesteps and rotated coordinates #338

Closed MorningGlory747 closed 2 months ago

MorningGlory747 commented 2 months ago

Description

I'm trying to create a spatial subset of an xarray dataset containing rotated coordinates with multiple timesteps. The dataset is from netcdf CaLDAS ECCC data.

What happened: I'm getting the following error when using subset.subset_bboxon the xarray that contains multiple timesteps tmp_bbox = subset.subset_bbox(tmp, lon_bnds=[-76,-72], lat_bnds=[45,50])

>> numpy.core._exceptions._UFuncInputCastingError: Cannot cast ufunc 'less' input 0 from dtype('<m8[ns]') to dtype('<m8') with casting rule 'same_kind'

However, when selecting only one timestep of the dataset tmp_bbox = subset.subset_bbox(tmp.isel(time=0), lon_bnds=[-76,-72], lat_bnds=[45,50])

The subset works just fine.

What I Did

from clisops.core import subset
import xarray as xr

# Reading the dataset
tmp = xr.open_mfdataset(list_of_netcdf_files,concat_dim='time',combine='nested')
print(tmp)
tmp_bbox = subset.subset_bbox(tmp, lon_bnds=[-76,-72], lat_bnds=[45,50])
>>
<xarray.Dataset>
Dimensions:          (time: 5, rlat: 460, rlon: 522)
Coordinates:
    lat              (time, rlat, rlon) float32 46.1 46.09 46.08 ... 48.27 48.26
    lon              (time, rlat, rlon) float32 -82.88 -82.85 ... -60.08 -60.06
  * time             (time) datetime64[ns] 2021-02-02 ... 2021-12-02
  * rlat             (rlat) float32 -2.965 -2.943 -2.92 ... 7.318 7.34 7.363
  * rlon             (rlon) float32 21.47 21.49 21.52 ... 33.15 33.17 33.19
Data variables:
    CaLDAS_P_DN_SFC  (time, rlat, rlon) float32 dask.array<chunksize=(1, 460, 522), meta=np.ndarray>
    CaLDAS_A_SD_Veg  (time, rlat, rlon) float32 dask.array<chunksize=(1, 460, 522), meta=np.ndarray>
    CaLDAS_A_SD_Avg  (time, rlat, rlon) float32 dask.array<chunksize=(1, 460, 522), meta=np.ndarray>
    rotated_pole     (time) float32 9.969e+36 9.969e+36 ... 9.969e+36 9.969e+36
Attributes:
    product:      CaLDAS
    Conventions:  CF-1.6
    Remarks:      Variable names are following the convention <Product>_<Type...
    License:      These data are provided by the Canadian Surface Prediction ...

Please let me know if there's anything else I need to add to the post!

Full error code here

Traceback (most recent call last):
  File "C:\Users\...\AppData\Local\miniconda3\envs\DataBIO\lib\site-packages\IPython\core\interactiveshell.py", line 3550, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-8-8237f19c254d>", line 1, in <module>
    tmp_bbox = subset.subset_bbox(tmp, lon_bnds=[-76,-72], lat_bnds=[45,50])
  File "C:\Users\...\AppData\Local\miniconda3\envs\DataBIO\lib\site-packages\clisops\core\subset.py", line 266, in func_checker
    return func(*args, **kwargs)
  File "C:\Users\...\AppData\Local\miniconda3\envs\DataBIO\lib\site-packages\clisops\core\subset.py", line 1282, in subset_bbox
    bnds = _check_desc_coords(
  File "C:\Users\...\AppData\Local\miniconda3\envs\DataBIO\lib\site-packages\clisops\core\subset.py", line 1399, in _check_desc_coords
    if np.all(coord.diff(dim=dim) < 0) and len(coord) > 1 and bounds[1] > bounds[0]:
  File "C:\Users\...\AppData\Local\miniconda3\envs\DataBIO\lib\site-packages\xarray\core\_typed_ops.py", line 281, in __lt__
    return self._binary_op(other, operator.lt)
  File "C:\Users\...\AppData\Local\miniconda3\envs\DataBIO\lib\site-packages\xarray\core\dataarray.py", line 4657, in _binary_op
    f(self.variable, other_variable_or_arraylike)
  File "C:\Users\...\AppData\Local\miniconda3\envs\DataBIO\lib\site-packages\xarray\core\_typed_ops.py", line 597, in __lt__
    return self._binary_op(other, operator.lt)
  File "C:\Users\...\AppData\Local\miniconda3\envs\DataBIO\lib\site-packages\xarray\core\variable.py", line 2414, in _binary_op
    f(self_data, other_data) if not reflexive else f(other_data, self_data)
numpy.core._exceptions._UFuncInputCastingError: Cannot cast ufunc 'less' input 0 from dtype('<m8[ns]') to dtype('<m8') with casting rule 'same_kind
aulemahal commented 2 months ago

Hi @MorningGlory747 !

I think the issue comes from the time dimension on the lon and lat coordinates. clisops does not expect seing a temporal dimension on the coordinates and does not support it. But I am guessing this is just an issue of the way the datasets were combined by xarray. Look into your input data to confirm the lat and lon coordinates are supposed to be the same across all datasets you are combining.

You could do:

tmp = tmp.assign_coords(
    lat=tmp.lat.isel(time=0, drop=True),
    lon=tmp.lat.isel(time=0, drop=True)
)

to get rid of this unwanted time dimension.

MorningGlory747 commented 2 months ago

Hi @aulemahal,

That makes a lot of sense to me and your solution did fix the problem. Thank you very much for this! :)

Curious because I'm still new to manipulating xarray datasets, what would happen if I were to drop other dimensions (say the rlon=0 or the rlat=0) dimensions from lon or lat? Would that essentially be like taking a dataset slice on only that rlon=0 row?

aulemahal commented 2 months ago

Yes. In fact,

tmp = tmp.assign_coords(
    lat=tmp.lat.isel(rlon=0, drop=True),
    lon=tmp.lat.isel(rlon=0, drop=True)
)

would probably break your dataset because you would now be missing information : the latitude axis is sliced along the rlon dimension, but the data is not. Xarray would assume that all lat/lon coordinates are the same for each rlon index, which is not true. Reprojection with cartopy or xesmf would either fail or yield strange results.