Open huard opened 3 years ago
Second issue relating to unstructured data:
In this example, the 1D coord shared by lat
and lon
is location
and is a list of city names. This makes subset_bbox
fail:
from xclim.testing import open_dataset
from clisops.core.subset import subset_bbox
ds = open_dataset("ERA5/daily_surface_cancities_1990-1993.nc")
subset_bbox(ds, lat_bnds=[44, 46], lon_bnds=[-75, -60])
fails with
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-4-a0ff1095b7a5> in <module>
1 ds2 = ds.copy()
2 ds2['location'] = [1, 2, 3, 4, 5]
----> 3 subset_bbox(ds, lat_bnds=[44, 46], lon_bnds=[-75, -60])
~/.conda/envs/xclim/lib/python3.8/site-packages/clisops/core/subset.py in func_checker(*args, **kwargs)
257 kwargs[lon][kwargs[lon] < 0] -= 360
258
--> 259 return func(*args, **kwargs)
260
261 return func_checker
~/.conda/envs/xclim/lib/python3.8/site-packages/clisops/core/subset.py in subset_bbox(da, lon_bnds, lat_bnds, start_date, end_date, first_level, last_level)
861 try:
862 coords = da[d][ind[i]]
--> 863 bnds = _check_desc_coords(
864 coord=da[d],
865 bounds=[coords.min().values, coords.max().values],
~/.conda/envs/xclim/lib/python3.8/site-packages/clisops/core/subset.py in _check_desc_coords(coord, bounds, dim)
955 def _check_desc_coords(coord, bounds, dim):
956 """If Dataset coordinates are descending reverse bounds."""
--> 957 if np.all(coord.diff(dim=dim) < 0) and len(coord) > 1:
958 bounds = np.flip(bounds)
959 return bounds
~/.conda/envs/xclim/lib/python3.8/site-packages/xarray/core/dataarray.py in diff(self, dim, n, label)
3105 DataArray.differentiate
3106 """
-> 3107 ds = self._to_temp_dataset().diff(n=n, dim=dim, label=label)
3108 return self._from_temp_dataset(ds)
3109
~/.conda/envs/xclim/lib/python3.8/site-packages/xarray/core/dataset.py in diff(self, dim, n, label)
5487 if dim in var.dims:
5488 if name in self.data_vars:
-> 5489 variables[name] = var.isel(**kwargs_end) - var.isel(**kwargs_start)
5490 else:
5491 variables[name] = var.isel(**kwargs_new)
~/.conda/envs/xclim/lib/python3.8/site-packages/xarray/core/variable.py in func(self, other)
2299 with np.errstate(all="ignore"):
2300 new_data = (
-> 2301 f(self_data, other_data)
2302 if not reflexive
2303 else f(other_data, self_data)
TypeError: unsupported operand type(s) for -: 'str' and 'str'
I edited my comment above because the first part was no more relevant.
Here are details on the top issue:
MWE, same dataset as my comment above:
from xclim.testing import open_dataset
from clisops.core.subset import subset_bbox
ds = open_dataset("ERA5/daily_surface_cancities_1990-1993.nc")
# Trick to avoid error mentioned above.
ds2 = ds.copy()
ds2['location'] = [1, 2, 3, 4, 5]
subset_bbox(ds2, lat_bnds=[45, 50], lon_bnds=[-125, -70])
Those lat and lon bounds are supposed to select locations 2 and 5. As expected, location 1 is dropped. However elements 3 and 4 are still in the output dataset, but with NaNs everywhere, masked instead of dropped.
I think this comes from the fact that the lat/lon bounds are translated to bounds on the shared dim (location
), as would make sense with a curvilinear grid. May be a special case for unstructured grids would be nice (the case when lat and lon are 1D and share the same dimension).
Description
Point observations are usually formatted with a variable defined along
region
andtime
dimensions. Geographical coordinates (lat, lon) for stations are themselves defined alongregion
. With clisops 0.6.3, this data model is wrongly interpreted as a curvilinear grid, and masks stations outside of the bounding box instead of removing them entirely from theregion
dimension.