hainegroup / oceanspy

A Python package to facilitate ocean model data analysis and visualization.
https://oceanspy.readthedocs.io
MIT License
101 stars 32 forks source link

`cutout` fails with Arctic face #425

Open ThomasHaine opened 6 months ago

ThomasHaine commented 6 months ago

The new cutout code (see #408) fails when an Arctic face is included, but works OK without an Arctic face.

Run the following code on SciServer with/without the Arctic face (Gulf Stream cutout/Labrador Sea cutout). OceanSpy must be updated to the most recent version.

import oceanspy as ospy
print(ospy.__version__)
from oceanspy.utils import viewer2range
od = ospy.open_oceandataset.from_catalog('ECCO')

# This cutout is the Labrador Sea.  THIS FAILS
# p1 = {"type":"FeatureCollection","features":[{"type":"Feature","properties":{"timeFrom":"2012-04-25T00:00:00.000Z","timeTo":"2012-05-04T23:00:00.000Z"},"geometry":{"type":"Polygon","coordinates":[[[-64.44133757068391,61.173252690828264],[-49.05704496814485,61.37698858583724],[-40.9002113194632,59.15040982649168],[-41.38412474109407,52.75702179091084],[-55.70085524036165,52.750580205796524],[-64.02471540881868,56.42738291739511],[-64.44133757068391,61.173252690828264]]]}}]}

# This cutout is the Gulf Stream. THIS WORKS
p1 = {"type":"FeatureCollection","features":[{"type":"Feature","properties":{"timeFrom":"2012-04-25T00:00:00.000Z","timeTo":"2012-05-04T23:00:00.000Z"},"geometry":{"type":"Polygon","coordinates":[[[-78.88224245158527,33.55059287100124],[-76.14819170085286,31.306244597538296],[-71.76971078959798,33.7382292505979],[-74.98733163920734,36.61005622473054],[-78.88224245158527,33.55059287100124]]]}}]}

this_time, lats, lons = viewer2range(p1)
cutout_kwargs = {
    'varList': ['SSH', 'THETA'],
    'timeRange': this_time,
    'XRange': lons, 
    'YRange': lats, 
    'add_Hbdr': True,
}
cut_od = od.subsample.cutout(**cutout_kwargs)

The error traceback for the Labrador Sea cutout is:

0.3.6.dev15+g9e566fc
Opening ECCO.
ECCO v4r4 3D dataset, ocean simulations on LLC90 grid (monthly mean output)
extracting Polygon
Cutting out the oceandataset.
Warning: This is an experimental feature
faces in the cutout [2, 6, 10]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[3], line 20
     12 this_time, lats, lons = viewer2range(p1)
     13 cutout_kwargs = {
     14     'varList': ['SSH', 'THETA'],
     15     'timeRange': this_time,
   (...)
     18     'add_Hbdr': True,
     19 }
---> 20 cut_od = od.subsample.cutout(**cutout_kwargs)

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/oceanspy/subsample.py:1561, in _subsampleMethods.cutout(self, **kwargs)
   1559 @_functools.wraps(cutout)
   1560 def cutout(self, **kwargs):
-> 1561     return cutout(self._od, **kwargs)

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/oceanspy/subsample.py:384, in cutout(od, varList, YRange, XRange, add_Hbdr, mask_outside, ZRange, add_Vbdr, timeRange, timeFreq, sampMethod, dropAxes, centered, persist)
    374 if "face" in ds.dims:
    375     arg = {
    376         "ds": ds,
    377         "varList": varList,  # vars and grid coords to transform
   (...)
    382         "persist": persist,
    383     }
--> 384     dsnew = _llc_trans.arctic_crown(**arg)
    385     dsnew = dsnew.set_coords(co_list)
    387     grid_coords = {
    388         "Y": {"Y": None, "Yp1": 0.5},
    389         "X": {"X": None, "Xp1": 0.5},
    390         "Z": {"Z": None, "Zp1": 0.5, "Zu": 0.5, "Zl": -0.5},
    391         "time": {"time": -0.5},
    392     }

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/oceanspy/llc_rearrange.py:217, in LLCtransformation.arctic_crown(self, ds, varList, YRange, XRange, add_Hbdr, faces, centered, persist)
    213     DSa10 = 0
    215 DSa7 = shift_dataset(DSa7, dims_c.X, dims_g.X)
--> 217 DSa10 = shift_dataset(DSa10, dims_c.Y, dims_g.Y)
    218 DSa10 = rotate_dataset(DSa10, dims_c, dims_g, rev_x=False, rev_y=True)
    219 DSa10 = rotate_vars(DSa10)

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/oceanspy/llc_rearrange.py:690, in shift_dataset(_ds, dims_c, dims_g)
    688 _ds = _copy.deepcopy(_ds)
    689 for _dim in [dims_c, dims_g]:
--> 690     if int(_ds[_dim][0].data) < int(_ds[_dim][1].data):
    691         _ds["n" + _dim] = _ds[_dim] - int(_ds[_dim][0].data)
    692         _ds = (
    693             _ds.swap_dims({_dim: "n" + _dim})
    694             .drop_vars([_dim])
    695             .rename({"n" + _dim: _dim})
    696         )

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/xarray/core/dataarray.py:823, in DataArray.__getitem__(self, key)
    820     return self._getitem_coord(key)
    821 else:
    822     # xarray-style array indexing
--> 823     return self.isel(indexers=self._item_key_to_dict(key))

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/xarray/core/dataarray.py:1421, in DataArray.isel(self, indexers, drop, missing_dims, **indexers_kwargs)
   1416     return self._from_temp_dataset(ds)
   1418 # Much faster algorithm for when all indexers are ints, slices, one-dimensional
   1419 # lists, or zero or one-dimensional np.ndarray's
-> 1421 variable = self._variable.isel(indexers, missing_dims=missing_dims)
   1422 indexes, index_variables = isel_indexes(self.xindexes, indexers)
   1424 coords = {}

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/xarray/core/variable.py:1378, in Variable.isel(self, indexers, missing_dims, **indexers_kwargs)
   1375 indexers = drop_dims_from_indexers(indexers, self.dims, missing_dims)
   1377 key = tuple(indexers.get(dim, slice(None)) for dim in self.dims)
-> 1378 return self[key]

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/xarray/core/variable.py:900, in Variable.__getitem__(self, key)
    887 """Return a new Variable object whose contents are consistent with
    888 getting the provided key from the underlying data.
    889 
   (...)
    897 array `x.values` directly.
    898 """
    899 dims, indexer, new_order = self._broadcast_indexes(key)
--> 900 data = as_indexable(self._data)[indexer]
    901 if new_order:
    902     data = np.moveaxis(data, range(len(new_order)), new_order)

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/xarray/core/indexing.py:1544, in PandasIndexingAdapter.__getitem__(self, indexer)
   1541 if getattr(key, "ndim", 0) > 1:  # Return np-array if multidimensional
   1542     return NumpyIndexingAdapter(np.asarray(self))[indexer]
-> 1544 result = self.array[key]
   1546 if isinstance(result, pd.Index):
   1547     return type(self)(result, dtype=self.dtype)

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/pandas/core/indexes/base.py:5175, in Index.__getitem__(self, key)
   5172 if is_integer(key) or is_float(key):
   5173     # GH#44051 exclude bool, which would return a 2d ndarray
   5174     key = com.cast_scalar_indexer(key)
-> 5175     return getitem(key)
   5177 if isinstance(key, slice):
   5178     # This case is separated from the conditional above to avoid
   5179     # pessimization com.is_bool_indexer and ndim checks.
   5180     result = getitem(key)

IndexError: index 1 is out of bounds for axis 0 with size 1
Mikejmnez commented 6 months ago

Hey @ThomasHaine thanks for the detailed error. I gave it a very quick glance. It is possible that the error is because there is so little data (perhaps 1 grid point) that is being retained from the arctic face, or face 2, and that sometime causes errors in cutout... One thing that you can play with is remove the add_hbdr option from your cutout_kwargs. add_hbdr adds a buffer layer around the limits of your cutout, effectively making if larger. I believe for ECCO its like 1 deg or 2 in each direction, which equals about one grid point or two...

Another is to manually increase the your domain of interest by more that 1 or 2 degrees. I tried it and it worked fine. This is what I did:

## case 1 ) same domain w/o buffer layer
cutout_kwargs1 = {
    'varList': ['U','V', 'T'],
    'timeRange': ["1992-01-16T12:00:00.000000000"],
    'XRange': [np.min(lons), np.max(lons)], 
    'YRange': [np.min(lats), np.max(lats)], 
}

## case 2) Manually increase domain in north direction
cutout_kwargs2 = {
    'varList': ['U','V', 'T'],
    'timeRange': ["1992-01-16T12:00:00.000000000"],
    'XRange': [np.min(lons), np.max(lons)], 
    'YRange': [np.min(lats), np.max(lats)+6], 
}

cut_od1 = ECCOod.subsample.cutout(**cutout_kwargs1)

cut_od2 = ECCOod.subsample.cutout(**cutout_kwargs2)

# on a separate cell:
fig, axes = plt.subplots(1, 2, figsize=(14, 4))
cut_od1._ds['T'].isel(time=0, Z=0).plot(ax=axes[0])
cut_od2._ds['T'].isel(time=0, Z=0).plot(ax=axes[1])

Runs fine... the plots are:

Screenshot 2024-03-27 at 9 30 44 AM

Note

Incut_od1, which contains your domain of interest unexpanded, the data all lies within face=10. Which points to the original error likely being about 1 gridpoint or so retain from face 6, and potentially another point retained from face 2, when the buffer layer is added. The faces involved in your original cutout were [2, 6, 10].

ThomasHaine commented 6 months ago

Thanks @Mikejmnez ! I agree with your analysis. It's interesting.

Can we trap the condition in cutout and give an informative error message?

Mikejmnez commented 6 months ago

Sure! llc_rearrange.py is long due for a refactor. In particular cutout could use some of the functions I wrote for computing mooring arrays "serially" that determine the index location at the edges between faces. That would make cutout less error prune and more exact.