hainegroup / oceanspy

A Python package to facilitate ocean model data analysis and visualization.
https://oceanspy.readthedocs.io
MIT License
96 stars 32 forks source link

mooring (`xoak`) sampling fails when nan data is present in the coordinates #378

Closed Mikejmnez closed 8 months ago

Mikejmnez commented 1 year ago

problem:

With the new (minor) release of scipy (from 1.10 to 1.11), the following line in subsample.mooring_array fails:

ds.xoak.set_index(["XC", "YC"], 'scipy_kdtree')

whenever the coordinates XC and YC have NANs. This happens in the ECCO (and LLC4320) datasets after transforming a subdomain via cutout, whenever the latitudes (YRange) has values above 70N. oceanspy always calls subsample.cutout before extracting a mooring_array, and in the case ECCO and LLC4320 cutout (via llc_rearrange) it is also a necessary (intermediate) step for removing 'face' as a dimension.

scipy_kdtree is just one of many options within the xoak.index_Registry, but other options also have trouble handling nan-ed coordinates.

For versions of scipy < 1.11, (scipy version 1.11 was released June 25 2023) there is no issue with the problem described above.

Minimal example:

The following is part of oceanspy's testing.

import oceanspy as ospy

# open ECCO locally
ECCO_url = "catalog_ECCO.yaml"  # local yaml file that points to ECCO dataset
od = ospy.open_oceandataset.from_catalog("LLC", ECCO_url)

# end points of mooring array
Xmoor = [-80, 0]
Ymoor = [35, 35]

# cutout arguments
args = {
    "XRange": None,
    "YRange": None,
    "add_Hbdr": True,
}
cut_od = od.subsample.cutout(**args)

# visualize cutout
cut_od.plot.horizontal_section(varName='Depth', use_coords=False);

Note that the way XRange and YRange are defined, the entire dataset is transformed before computing the mooring array). The last line plots the variable Depth in index space, seen below

DepthECCO

Lastly, the following line errors with scipy version > 1.10.x

new_od = od.subsample.mooring_array(Xmoor=Xmoor, Ymoor=Ymoor, **args)

The traceback is:

Cutting out the oceandataset.
Warning: This is an experimental feature
faces in the cutout [ 0  1  2  3  4  5  6  7  8  9 10 11 12]
Extracting mooring array.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 1
----> 1 new_od = od.subsample.mooring_array(Xmoor=Xmoor, Ymoor=Ymoor, **args)

File /opt/anaconda3/envs/Oceanography/lib/python3.10/site-packages/oceanspy/subsample.py:1324, in _subsampleMethods.mooring_array(self, **kwargs)
   1322 @_functools.wraps(mooring_array)
   1323 def mooring_array(self, **kwargs):
-> 1324     return mooring_array(self._od, **kwargs)

File /opt/anaconda3/envs/Oceanography/lib/python3.10/site-packages/oceanspy/subsample.py:718, in mooring_array(od, Ymoor, Xmoor, xoak_index, **kwargs)
    711     if xoak_index not in _xoak.IndexRegistry():
    712         raise ValueError(
    713             "`xoak_index` [{}] is not supported."
    714             "\nAvailable options: {}"
    715             "".format(xoak_index, _xoak.IndexRegistry())
    716         )
--> 718     ds_grid.xoak.set_index(["XC", "YC"], xoak_index)
    720 cdata = {"XC": ("mooring", Xmoor), "YC": ("mooring", Ymoor)}
    721 ds_data = _xr.Dataset(cdata)  # mooring data

File /opt/anaconda3/envs/Oceanography/lib/python3.10/site-packages/xoak/accessor.py:116, in XoakAccessor.set_index(self, coords, index_type, persist, **kwargs)
    114     self._index = XoakIndexWrapper(self._index_type, X, 0, **kwargs)
    115 else:
--> 116     self._index = self._build_index_forest_delayed(X, persist=persist, **kwargs)

File /opt/anaconda3/envs/Oceanography/lib/python3.10/site-packages/xoak/accessor.py:71, in XoakAccessor._build_index_forest_delayed(self, X, persist, **kwargs)
     68     offset += X.chunks[0][i]
     70 if persist:
---> 71     return dask.persist(*indexes)
     72 else:
     73     return tuple(indexes)

File /opt/anaconda3/envs/Oceanography/lib/python3.10/site-packages/dask/threaded.py:89, in get(dsk, keys, cache, num_workers, pool, **kwargs)
     86     elif isinstance(pool, multiprocessing.pool.Pool):
     87         pool = MultiprocessingPoolExecutor(pool)
---> 89 results = get_async(
     90     pool.submit,
     91     pool._max_workers,
     92     dsk,
     93     keys,
     94     cache=cache,
     95     get_id=_thread_get_id,
     96     pack_exception=pack_exception,
     97     **kwargs,
     98 )
    100 # Cleanup pools associated to dead threads
    101 with pools_lock:

File /opt/anaconda3/envs/Oceanography/lib/python3.10/site-packages/dask/local.py:511, in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
    509         _execute_task(task, data)  # Re-execute locally
    510     else:
--> 511         raise_exception(exc, tb)
    512 res, worker_id = loads(res_info)
    513 state["cache"][key] = res

File /opt/anaconda3/envs/Oceanography/lib/python3.10/site-packages/dask/local.py:319, in reraise(exc, tb)
    317 if exc.__traceback__ is not tb:
    318     raise exc.with_traceback(tb)
--> 319 raise exc

File /opt/anaconda3/envs/Oceanography/lib/python3.10/site-packages/dask/local.py:224, in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
    222 try:
    223     task, data = loads(task_info)
--> 224     result = _execute_task(task, data)
    225     id = get_id()
    226     result = dumps((result, id))

File /opt/anaconda3/envs/Oceanography/lib/python3.10/site-packages/xoak/index/base.py:227, in XoakIndexWrapper.__init__(self, index_adapter, points, offset, **kwargs)
    224 index_adapter_cls = normalize_index(index_adapter)
    226 self._index_adapter = index_adapter_cls(**kwargs)
--> 227 self._index = self._index_adapter.build(points)
    228 self._offset = offset

File /opt/anaconda3/envs/Oceanography/lib/python3.10/site-packages/xoak/index/scipy_adapters.py:14, in ScipyKDTreeAdapter.build(self, points)
     13 def build(self, points):
---> 14     return cKDTree(points, **self.index_options)

File _ckdtree.pyx:561, in scipy.spatial._ckdtree.cKDTree.__init__()

ValueError: data must be finite, check for nan or inf values

With scipy version < 1.11 installed (e.g. version 1.10.1), there is no error and indeed you get the following mooring array

XC = new_od._ds['XC'].squeeze()
YC = new_od._ds['YC'].squeeze()
plt.plot(XC, YC);

mooring

Quick Fix:

pin scipy version to < 1.11. That will allow the PR #377 to go through.

General Fix:

Re-write subsample.mooring_array so that it is optional to call cutout before extracting the mooring, whether or not face is a dimension of the dataset. This approach should be the default when face is a dimension of the dataset. This approach will prevent nan-ed values in the coordinates that result from cutout that incorporates near Arctic data.

Luckily I have already been working on this approach (on a separate branch), and such development/enhancement is in its final stage. I will open a new issue describing this new approach that will be the default when face is a dimension of the dataset.