hainegroup / oceanspy

A Python package to facilitate ocean model data analysis and visualization.
https://oceanspy.readthedocs.io
MIT License

stations + mooring expected behavior: similarities and discrepancies #398

Closed Mikejmnez closed 11 months ago

Mikejmnez commented 11 months ago

some expected behaviors

The objective of this long issue is to document some expected behaviors and to reference them in the documentation (Read the Docs).

od.subsample.stations extracts data at isolated locations using nearest-neighbor lookup. The requested locations may be geographically isolated or contiguous.

# Default behavior:
lons = [-150, 20, 30]
lats = [-10,-1, 5]
args = {'Xcoords': lons, 'Ycoords': lats}
od_stns = od.subsample.stations(**args)

The result is an OceanDataset with no complex topology, and with station as a new dimension of length 3.
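To make the nearest-neighbor idea concrete, here is a minimal, library-free sketch. It is not OceanSpy's implementation (which builds a tree, as mentioned later in this issue); the helper names haversine_km and nearest_indices and the toy grid values are hypothetical, and a brute-force search stands in for the tree.

```python
import math

def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance (km) between two lon/lat points given in degrees."""
    lam1, phi1, lam2, phi2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2)
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))

def nearest_indices(grid_lons, grid_lats, lons, lats):
    """For each requested (lon, lat), return the index of the closest grid point."""
    indices = []
    for lon, lat in zip(lons, lats):
        dists = [haversine_km(lon, lat, g_lon, g_lat)
                 for g_lon, g_lat in zip(grid_lons, grid_lats)]
        indices.append(min(range(len(dists)), key=dists.__getitem__))
    return indices

# Toy flattened "grid" of five points (hypothetical values, not model output).
grid_lons = [-151.0, -149.5, 19.0, 21.0, 30.2]
grid_lats = [-10.0, -9.8, -1.0, -3.0, 5.1]

# Same three station coordinates as in the example above.
station_index = nearest_indices(grid_lons, grid_lats, [-150, 20, 30], [-10, -1, 5])
```

Each requested coordinate maps to exactly one grid point, which is why the result has one entry per station and carries no information about what lies between stations.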

Importantly, od.subsample.stations can be called from within od.subsample.mooring_array. This means that data meant for od.subsample.mooring_array can be accessed from od.subsample.stations. For example:

[Image: mooring_array]

Non-default behavior of stations:

ds, diffX, diffY = od.subsample.stations(**args, dim_name='mooring')

Notice the extra argument and the different returns. ds is an xarray.Dataset with the sampled/extracted data along a new dimension mooring, and diffX and diffY are numpy arrays that provide information about the orientation of the path/circuit that the mooring_array follows.

Instead, the correct extraction of a mooring_array that makes use of the code within od.subsample.stations is:

lons = [-150, 20, 30]
lats = [-10, -1, 5]
args = {'Xmoor': lons, 'Ymoor': lats}
od_moor = od.subsample.mooring_array(**args, serial=True)

The default behavior for mooring_array is serial=False.

Calling stations from within mooring_array with serial=True helps to:
1) Avoid repeated code.
2) Keep legacy code intact. The original implementation of mooring_array is used by default. In this original implementation, od.subsample.cutout gets called to reduce the size of the dataset before building the tree.
3) Decouple mooring_array and cutout. This should be the desired approach when the dataset has face as a dimension. cutout works but can be computationally expensive.

Expected ValueError behavior

The following

lons = [-150, 20, 30]
lats = [-10,-1, 5]
args = {'Xcoords': lons, 'Ycoords': lats}
ds, diffX, diffY = od.subsample.stations(**args, dim_name='mooring')

produces the following ValueError (traceback shortened):

ValueError: faces 8 and 1 are not contiguous.

Nothing is wrong with the code; stations is simply being used in a way that was not intended.

The error occurs because one of the coordinate pairs lives in face=8 and the others in face=1, with no data in between. These two faces are NOT contiguous. od.subsample.mooring_array makes sure that this does not happen by creating an array that connects the given coordinate values via great circles (in a spherical domain) or simple straight lines (in Cartesian coordinates).

There is no error when subsampling via mooring_array with serial=True.
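As a rough illustration of connecting two coordinates along a great circle, the sketch below interpolates on the unit sphere. It is only an assumption-laden stand-in for whatever mooring_array does internally: the function name great_circle_points is hypothetical, the Earth is treated as a perfect sphere, at least two points are requested, and antipodal endpoints (where the great circle is not unique) are not handled.

```python
import math

def great_circle_points(lon0, lat0, lon1, lat1, n):
    """Return n (lon, lat) points in degrees, evenly spaced along the
    great circle from point 0 to point 1, endpoints included (n >= 2)."""
    def to_xyz(lon, lat):
        lam, phi = math.radians(lon), math.radians(lat)
        return (math.cos(phi) * math.cos(lam),
                math.cos(phi) * math.sin(lam),
                math.sin(phi))

    a, b = to_xyz(lon0, lat0), to_xyz(lon1, lat1)
    dot = max(-1.0, min(1.0, sum(p * q for p, q in zip(a, b))))
    omega = math.acos(dot)  # angular separation between the endpoints
    if omega == 0.0:
        return [(lon0, lat0)] * n  # identical endpoints: nothing to interpolate

    points = []
    for k in range(n):
        t = k / (n - 1)
        # Spherical linear interpolation weights.
        wa = math.sin((1 - t) * omega) / math.sin(omega)
        wb = math.sin(t * omega) / math.sin(omega)
        x, y, z = (wa * p + wb * q for p, q in zip(a, b))
        lat = math.degrees(math.asin(max(-1.0, min(1.0, z))))
        lon = math.degrees(math.atan2(y, x))
        points.append((lon, lat))
    return points

# First segment of the example above: from (-150, -10) to (20, -1).
path = great_circle_points(-150, -10, 20, -1, 5)
```

The intermediate points are what fill the gap between otherwise isolated stations; whether the resulting path stays connected on a given model grid is a separate question.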

You can visualize the output from stations and mooring array.

[Image: Screen Shot 2023-11-02 at 11 38 32 AM]

NOTE

I just noticed that if you only take the endpoint lons/lats above to extract a mooring_array, you still get the same error, even when using od.subsample.mooring_array as intended. This has to do with the singularity at the South Pole in the cube-sphere grid: great-circle paths assume a spherical Earth and can generate an array that is not connected in cube-sphere geometry near the South Pole.

So, when dealing with well-separated coordinates, it is good practice to pre-generate as many intermediate points as possible and not to rely too much on great-circle paths when using LLC data.
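One possible way to pre-generate intermediate points is sketched below. The helper densify is hypothetical (it is not part of OceanSpy), it interpolates linearly in lon/lat, and it does not handle paths that should wrap across the antimeridian, so the route it traces should be checked before use.

```python
def densify(lons, lats, n_between):
    """Insert n_between evenly spaced points between each pair of consecutive
    (lon, lat) coordinates, interpolating linearly in lon/lat."""
    if len(lons) != len(lats) or len(lons) == 0:
        raise ValueError("lons and lats must be non-empty and of equal length")
    out_lons, out_lats = [], []
    for i in range(len(lons) - 1):
        for k in range(n_between + 1):
            t = k / (n_between + 1)
            out_lons.append(lons[i] + t * (lons[i + 1] - lons[i]))
            out_lats.append(lats[i] + t * (lats[i + 1] - lats[i]))
    # Close the path with the final requested coordinate.
    out_lons.append(lons[-1])
    out_lats.append(lats[-1])
    return out_lons, out_lats

# Same coordinates as in the examples above, with 9 extra points per segment.
dense_lons, dense_lats = densify([-150, 20, 30], [-10, -1, 5], n_between=9)

# The denser path would then be passed on as in the earlier example, e.g.:
# od_moor = od.subsample.mooring_array(Xmoor=dense_lons, Ymoor=dense_lats, serial=True)
```

With shorter segments between consecutive points, any great-circle path built between them has less room to stray from the intended route.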

ThomasHaine commented 11 months ago

Good work @Mikejmnez !