hainegroup / oceanspy

A Python package to facilitate ocean model data analysis and visualization.
https://oceanspy.readthedocs.io
MIT License
100 stars 32 forks source link

consolidate XRange and YRange correct evaluation #266

Closed Mikejmnez closed 1 year ago

Mikejmnez commented 1 year ago

Description

The introduction of LLC geometry and idea of periodic domain (as opposed to full domain defined defined only in subpolar North Atlantic) has led to some issues regarding cutouts that, over time, we have patched here and there. These are documented in the closed issues #256 and #215. I will briefly described how we patched those issues and how these lead to inconsistencies.

Some basic definitions. The python arrays XRange and YRange have the following form:

XRange = [X_0, ..., X_f]
YRange = [Y_0, ..., Y_f]

where the subscripts 0 and f meant to represent the zeroth (e.g. first) station and the final (last) sampling station, respectively.

On issue #256. Renske showed a nice example of what the problem is there, when extracting mooring and survey stations from an array of lats and lons. This is, the issue at hand was in od.subsample.survey_stations and od.subsample.mooring_array and how the ranges where passed on to cutout.

Here I provide another one, describing also the patchwork. Consider WOCE section A03 extracted from the (official website). The ship track begins in Europe and ends in NorthAmerica. See plot below

Screen Shot 2022-10-18 at 10 02 19 AM

As visible implies that X_0 < X_f and furthermore Y_f > Y_0 > np.min(YRange). Thus simply extracting the pairs XRange[0], XRange[-1] and YRange[0], YRange[-1] and passing them into cutout produces no cutout.

The bug, as Resnke pointed out, indeed lies in the following snippet within get_maskH

if XRange is not None:
        # Use arrays
        XRange = _np.asarray([XRange[0] - add_Hbdr, XRange[-1] + add_Hbdr])
        XRange = XRange.astype(ds["XG"].dtype)

Note that for YRange, the max and min are taken instead. The original reason to not take min and max of XRange was associated with the discontinuity of longitude in the LLC geometry. However, as described above, simply taking the first and last can yield to errors. More on the discontinuity below.

The solution as described in #256, was to only pass the max and mins of a given array of lats and lons from survey_stations or mooring_arrays into .cutout. This is the actual code snippet (lines 693-699 in subsample.py) below:

    if "YRange" not in kwargs:
        kwargs["YRange"] = [_np.min(Ymoor), _np.max(Ymoor)]
    if "XRange" not in kwargs:
        kwargs["XRange"] = [_np.min(Xmoor), _np.max(Xmoor)]
    if "add_Hbdr" not in kwargs:
        kwargs["add_Hbdr"] = True
    od = od.subsample.cutout(**kwargs)

This solved the issue when the cutout is performed from either .survey_station or .mooring_array, but my guess is that it remains in cutout (i.e. pass an array directly into od.subsample.cutout).

So far the fix in #256 has worked but because no one is looking at cutouts (moorings, surveys) in the Pacific Ocean across the discontinuity in Longitude. Under such scenario simply taking the min and max of XRange would yield the wrong region of the world.

This discontinuity in Longitude due to the periodic domain in LLC grids was solved by Wenrui a while back (solved issue #215). He wrote a function called rel_lon. This worked really nicely but it has come to my attention while making regional cutout across the Pacific ocean rel_lon is only applied within subsample.cutout. This is, taking the min and max of XRange within survey_stations or mooring_array for an array across the discontinuity will yields either the wrong domain, or no domain.

Solution

I think the solution should be simple. Revert how mooring_array and survey_stations pass XRange and YRange into cutout, so pass the whole array. This is how things were done before. It should look like

    if "YRange" not in kwargs:
        kwargs["YRange"] = Ymoor
    if "XRange" not in kwargs:
        kwargs["XRange"] = Xmoor
    if "add_Hbdr" not in kwargs:
        kwargs["add_Hbdr"] = True
    od = od.subsample.cutout(**kwargs)

and also within get_maskH revert to

if XRange is not None:
        # Use arrays
        XRange = _np.asarray([_np.min(XRange) - add_Hbdr, _np.max(XRange[-1]) + add_Hbdr])
        XRange = XRange.astype(ds["XG"].dtype)

That way rel_lon`` only needs to and is evaluated withinsubsample.cutout```, and the edges of XRange and YRange are consistently defined throughout.

Mikejmnez commented 1 year ago

I just noticed that the following snippet of code within subsample.cutout (lines 351-358) needs to be fixed as well. It makes the assumption that the first element of XRange is west of all the rest. This is not the case when, for example, ship track began in US and ended, say in Japan or so.

    # Address the discontinuity in longitude
    ref_lon = 180
    if XRange is not None:
        if _rel_lon(XRange[0], ref_lon) > _rel_lon( XRange[-1], ref_lon):  # if the range crosses the discontinuity line.
            # Redefining longitude is necessary.
            ref_lon = XRange[0] - (XRange[0] - XRange[-1]) / 3

It seems straightforward to replace XRange[0], XRange[-1] with np.max(XRange) and np.min(XRange) respectively, except the last line. Working / testing it...