hainegroup / oceanspy

A Python package to facilitate ocean model data analysis and visualization.
https://oceanspy.readthedocs.io
MIT License
96 stars 32 forks source link

oceanspy.utils.great_circle_path fails to find intermediate coordinate points with LLC grids? #346

Closed Mikejmnez closed 1 year ago

Mikejmnez commented 1 year ago

Description

@ThomasHaine tried to compute a mooring section on LLC4320 using Kogur lats and lons from the documentation, and ran into the following error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[100], line 1
----> 1 od_moor = cut_od.subsample.mooring_array(
      2     Xmoor=lons_Kogur,
      3     Ymoor=lats_Kogur,
      4     varList=["Temp", "U", "V", "dyG", "dxG", "drF", "HFacS", "HFacW"],
      5 )

File ~/mambaforge/envs/Oceanography/lib/python3.9/site-packages/oceanspy/subsample.py:1379, in _subsampleMethods.mooring_array(self, **kwargs)
   1377 @_functools.wraps(mooring_array)
   1378 def mooring_array(self, **kwargs):
-> 1379     return mooring_array(self._od, **kwargs)

File ~/mambaforge/envs/Oceanography/lib/python3.9/site-packages/oceanspy/subsample.py:762, in mooring_array(od, Ymoor, Xmoor, **kwargs)
    757     this_Y, this_X, this_dists = _utils.great_circle_path(
    758         lat0, lon0, lat1, lon1, dist
    759     )
    761     # Cartesian coordinate of point in the middle
--> 762     x, y, z = _utils.spherical2cartesian(this_Y[1], this_X[1], R)
    763 else:
    764     # CARTESIAN: take the average
    765     x = (lon0 + lon1) / 2

IndexError: index 1 is out of bounds for axis 0 with size 0

where cut_od is the cutout dataset using the same lats and lons. The data lives entirely on face=2 so there is actually no transformation happening prior to the mooring calculation.

I looked into this and found where the error takes place. subsample.mooring computes and finds iteratively the locations of the mooring section, creating a continuous line passing through Center points. At each iteration, given two lats and lons on the grid, oceanspy generates an intermediate value with oceanspy.utils.great_circle_path (when using spherical coordinates) and proceeds to find its nearest neighbor. Then repeats this procedure until there is no gaps and thus the section is continuous.

The issue happens in the specific example below, in which oceanspy needs to fill the gaps inbetween the (lat0, lon0) shown as a blue dot, and (lat1, lon1) shown as red dot. These two were computed iteratively by oceanspy using the Kogur section. But the next iteration fails.

lat0, lon0 =  68.67726, -26.283813  # blue dot
lat1, lon1 = 68.66621, -26.249926  # red dot

example_fail

From the figure, there are clearly two additional intermediate grid points to be found between the two given coordinates points. Oceanspy finds intermediate points on a sphere with the following code:

from geopy.distance import great_circle as _great_circle

dist = _great_circle((lat0, lon0), (lat1, lon1), radius=R).km
dist = dist / 2.1
this_Y, this_X, this_dists = ospy.utils.great_circle_path(lat0, lon0, lat1, lon1, dist)

The issue is that ospy.utils.great_circle_path fails to find any intermediate value. This is, both this_Y and this_X are empty. Hence the error above.

The relevant code is in subsample.py lines 745-757.

Note that dist=1.8410331034092235 is arbitrarily divided by a factor of 2.1. In the source code subsample.py lines 756 there is a comment that states that Divide dist by 2.1 to make sure that returns 3 points.

I checked, and dxC and dyC are about half of the value of dist. I tried dividing dist by factors of 10 and larger, but that didn't solve the problem.

I haven't tested this example with ECCO. I am not sure if this bug can happen in ECCO and therefore in all LLC-grids, or if its a matter of resolution + grid.

If the issue is related to proximity between coordinate (end) points + complex geometry, then we can probably get the inbetween grid point by moving from (lat1, lon1) towards (lat0, lon0) one grid point at a time. Since we know their location in index space.

Something like

if len(this_Y)=0:
    "move one grid point in the direction of lat0, lon0"
malmans2 commented 1 year ago

While at NOC I've rewritten and optimised the code to find sections (I'm pretty sure I eventually wanted to implement it in OceanSpy).

If it's of use, it's available here: https://github.com/JMMP-Group/nordic-seas-validation/blob/main/nsv/section_finder.py

It has been used on NEMO data only, so you might have to change a few things to make it work with MITgcm output (e.g., naming convention for Arakawa grids).

malmans2 commented 1 year ago

Actually, I realised you don't have access to that repo. I'll share the code next week!

Mikejmnez commented 1 year ago

Thanks @malmans2 ! I have also been writing optimized code following your suggestion some time ago to look at xoak, and coincidentally the part that I needed to work on to improve it was this circle path option.

Looking forward to work on implementing your changes on oceanspy.

malmans2 commented 1 year ago

Here is the code:

from dataclasses import dataclass

import cf_xarray  # noqa: F401
import numpy as np
import xoak  # noqa: F401
from xarray import DataArray, Dataset

@dataclass
class SectionFinder:
    """
    Args:
        ds_domain (Dataset): domain_cfg dataset
    """

    ds_domain: Dataset

    def __post_init__(self):
        self._grids = {}

    @property
    def grids(self) -> dict:
        """Dictionary mapping each grid to a dataset with its coordinates"""

        if not self._grids:
            for grid in ("u", "v", "t", "f"):
                ds = Dataset(
                    coords={
                        "lon": self.ds_domain[f"glam{grid}"],
                        "lat": self.ds_domain[f"gphi{grid}"],
                    }
                )
                ds = ds.squeeze(drop=True)
                for key, value in ds.sizes.items():
                    ds[f"{key}_index"] = DataArray(range(value), dims=key)
                self._grids[grid] = ds.cf.guess_coord_axis()

        return self._grids

    def nearest_neighbor(self, lons, lats, grid: str) -> Dataset:
        """
        Given the coordinates defining a section, find the nearest points
        on a model grid.

        Args:
            lons (1D array-like): Longitudes defining a section
            lats (1D array-like): Latitudes defining a section
            grid (string): Model grid `{"u", "v", "t", "f"}`

        Returns:
            Dataset: Dataset with model coordinates and indexes
        """

        if not self.grids[grid].xoak.index:
            self._grids[grid].xoak.set_index(("lat", "lon"), "sklearn_geo_balltree")

        return self.grids[grid].xoak.sel(lat=DataArray(lats), lon=DataArray(lons))

    nearest_neighbour = nearest_neighbor

    def zigzag_section(self, lons, lats, grid: str) -> Dataset:
        """
        Given the coordinates defining a section, find the correspoinding zigzag section
        on a model grid.

        Args:
            lons (1D array-like): Longitudes defining a section
            lats (1D array-like): Latitudes defining a section
            grid (string): Model grid `{"u", "v", "t", "f"}`

        Returns:
            Dataset: Dataset with model coordinates and indexes
        """

        def diff_and_inds_where_insert(ix, iy):

            dx, dy = (np.diff(ii) for ii in (ix, iy))
            inds = np.argwhere(np.abs(dx) + np.abs(dy) > 1).squeeze()

            return dx, dy, inds

        # Extract indexes
        ds = self.nearest_neighbor(lons, lats, grid)
        ix, iy = (ds[f"{i}_index"].data for i in ("x", "y"))

        # Remove duplicates
        mask = np.abs(np.diff(ix)) + np.abs(np.diff(iy)) == 0
        ix, iy = (np.delete(ii, np.argwhere(mask)) for ii in (ix, iy))

        # Initialize variables
        dx, dy, inds = diff_and_inds_where_insert(ix, iy)

        # Stop when inds is empty
        while inds.size:

            # Extract diffs
            dx, dy = (di[inds] for di in (dx, dy))

            # Insert new values
            # Single diagonal step: move along y direction first
            mask = np.abs(dx * dy) == 1
            ix = np.insert(ix, inds + 1, ix[inds] + (dx / 2).astype(int))
            iy = np.insert(
                iy, inds + 1, iy[inds] + np.where(mask, dy, (dy / 2).astype(int))
            )

            # Prepare for next iteration
            dx, dy, inds = diff_and_inds_where_insert(ix, iy)

        # Find new lat/lon
        ds = self.grids[grid].isel(
            x=DataArray(ix, dims=ds.dims), y=DataArray(iy, dims=ds.dims)
        )

        return self.nearest_neighbor(ds["lon"], ds["lat"], grid)

    def velocity_points_along_zigzag_section(self, lons, lats) -> dict:
        """
        Given the coordinates defining a section, find the corrisponding velocity points
        along a zigzag section (f-grid). Useful to compute accurate volume fluxes.

        Args:
            lons (1D array-like): Longitudes defining a section
            lats (1D array-like): Latitudes defining a section

        Returns:
            dict: Dictionary mapping u/v grids to their coordinates and indexes.
        """

        # ZigZag path along f
        ds = self.zigzag_section(lons, lats, "f")

        # Find dimension name
        dim = list(ds.dims)[0]
        ds[dim] = ds[dim]

        # Compute diff and find max index of each pair
        ds_diff = ds.diff(dim)
        ds_roll = ds.rolling({dim: 2}).max().dropna(dim)

        # Fill dictionary
        ds_dict = {}
        for grid in ("u", "v"):

            # Apply mask
            # Meridional: u - Zonal: v
            ds = ds_roll.where(
                ds_diff[f"{'y' if grid == 'u' else 'x'}_index"], drop=True
            )
            da_dim = ds[dim]

            if not ds.sizes[dim]:
                # Empty: either zonal or meridional
                continue

            # Find new lat/lon
            ds = self.grids[grid].isel(
                x=DataArray(ds["x_index"].astype(int), dims=dim),
                y=DataArray(ds["y_index"].astype(int), dims=dim),
            )
            ds = self.nearest_neighbor(ds["lon"], ds["lat"], grid)

            # Assign coordinate - useful to concatenate u and v after extraction
            ds_dict[grid] = ds.assign_coords({dim: da_dim - 1})

        return ds_dict
malmans2 commented 1 year ago

Example usage:

ds_domain = xr.open_mfdataset("domain_cfg.nc")
ds_tgrid = xr.open_mfdataset("output_grid_T.nc")

finder = SectionFinder(ds_domain)
stations = finder.nearest_neighbor(
    lons=(lon0, lon1, lon2, ...),
    lats=(lat0, lat1, lat2, ...),
    grid="t"
)
section = ds_tgrid.isel({dim: stations[f"{dim}_index"] for dim in ("x", "y")})

A few comments:

  1. I haven't used it in more than one year, so I don't know whether it works correctly. Someone reported that it is not compatible with xarray>2022.03.0, I believe because of xarray's indexing refactor. It should be an easy fix though.
  2. It only uses NEMO's domain_cfg.nc to initialise the finder, which is NEMO's file that describes the Arakawa grids. Then you have a few functions to get the indexes of the section. zig_zag functions are similar to OceanSpy's concept of mooring.
  3. NEMO's conventions for Arakawa grids is: T, U, V, F (glam/gphi are lon/lat)
  4. Both NEMO and MITgcm use Arakawa-C, but some implementation details are slightly different. I don't remember the details but I think NEMO only uses 2 dimensions for all grids (x, y), so grids are just shifted. MITgcm uses 4 dimensions (x, y, xp1, yp1).
Mikejmnez commented 1 year ago

Great, I'll start working on this later in the week.

Mikejmnez commented 1 year ago

closed by #354