hainegroup / oceanspy

A Python package to facilitate ocean model data analysis and visualization.
https://oceanspy.readthedocs.io
MIT License
98 stars 32 forks source link

Indexing with a boolean dask array is not allowed... #332

Closed Mikejmnez closed 1 year ago

Mikejmnez commented 1 year ago

Description

xarray version 2023.3.

Simplest example to reproduce the error: Live demo notebook in binder:

import oceanspy as ospy
import xarray as xr
od = ospy.open_oceandataset.from_zarr("OSM2020_EGshelfIIseas2km_ERAI_1D")  # data stored in binder

# define mooring
lats_Kogur = [68.68, 67.52, 66.49]
lons_Kogur = [-26.28, -23.77, -22.99]

# Extract mooring array
od_moor = od.subsample.mooring_array(Xmoor=lons_Kogur, Ymoor=lats_Kogur)

The last line fails to run. The (relevant) traceback:

File /srv/conda/envs/notebook/lib/python3.10/site-packages/oceanspy/subsample.py:399, in cutout(od, varList, YRange, XRange, add_Hbdr, mask_outside, ZRange, add_Vbdr, timeRange, timeFreq, sampMethod, dropAxes, centered, chunks, persist, geo_true)
    397 if XRange is not None or YRange is not None:
    398     XRange, ref_lon = _reset_range(XRange)
--> 399     maskH, dmaskH, XRange, YRange = get_maskH(
    400         ds, add_Hbdr, XRange, YRange, ref_lon=ref_lon
    401     )
    403     dYp1 = dmaskH["Yp1"].values
    404     dXp1 = dmaskH["Xp1"].values

File /srv/conda/envs/notebook/lib/python3.10/site-packages/oceanspy/utils.py:786, in get_maskH(ds, add_Hbdr, XRange, YRange, ref_lon)
    782 # Find horizontal indexes
    783 maskH = maskH.assign_coords(
    784     Yp1=_np.arange(len(maskH["Yp1"])), Xp1=_np.arange(len(maskH["Xp1"]))
    785 )
--> 786 dmaskH = maskH.where(maskH, drop=True)
    787 return maskH, dmaskH, XRange, YRange

the error being:

KeyError: 'Indexing with a boolean dask array is not allowed. This will result in a dask array of unknown shape. Such arrays are unsupported by Xarray.Please compute the indexer first using .compute()'

Quick fix for now,

pin xarray to version <= 2023.2.

This might be a good opportunity to see if we can start using xoak to find indexes instead .

Mikejmnez commented 1 year ago

FYI the current Oceanography image on Sciserver has xarray = 2023.2, for which there is no trouble.

Mikejmnez commented 1 year ago

More context on the erro (see utils.py lines 767 + ):

maskH = _xr.ones_like(ds["YG"])

maskH = maskH.where( _np.logical_and(ds["YG"] >= YRange[0],  ds["YG"] <= YRange[-1]), 0)

maskH = maskH.assign_coords(Yp1=_np.arange(len(maskH["Yp1"])), Xp1=_np.arange(len(maskH["Xp1"])))

dmaskH = maskH.where(maskH, drop=True)

The last line is being flagged as error in xarray 2023.3. In order to have tests passing, we have to:

dmaskH = maskH.where(maskH.compute(), drop=True)
Mikejmnez commented 1 year ago

PR #334 introduces a workaround to this issue.

I am keeping this issue open, as we need to investigate whether mask.compute() is a good idea or not in terms of performance (perhaps there isn't much loss). Or, perhaps we need to rethink how to mask the coordinates.

malmans2 commented 1 year ago

I think compute is OK here. drop=True silently does compute (or at least used to), as it needs to evaluate which data can be dropped.

Mikejmnez commented 1 year ago

That is reassuring. I was a little bit worried that .compute() would be costly in terms of performance for grids like LLC4320 (in the case a cutout region is close to the entire domain). I am looking into this. It may lead to O(1-3) minutes at worst (although that may depend on the current Sciserver usage). More on this to come