Unresolved issues from previous llcrearrange

Mikejmnez commented 1 year ago

Description

After optimizing llc_rearrange, a couple of issues/bugs were left unresolved that I didn't notice until just before merging the huge PR #272 .

The issues are:

When making a cutout in which XRangeand YRange define a region solely within any or all of the rotated facets (e.g. P01 WOCE section) the transformed dataset has reversed indexing order along dimension Y, even though the plots showed data with the right orientation. Turns out, this is because somehow tick labels have the correct ordering despite reversed indexing. See example figure below regarding P10 data:

The figure above is the correct temperature map in X and Y (label) space, which is why I didn't pick up the issue right away. In the following cell in the snapshot, you can see that the indexing of the data is decreasing. As a result, ds['Theta'].isel(y=0) would return the northern-most temperature, but ds['Theta'].sel(y=40) would be correct.

NOTE: This behavior does not happen if XRange and YRange define a region across facets, or data within facets with the correct data layout. For example, a cutout from the subpolar North Atlantic has the correct layout (even when involving data from multiple faces with different orientation).

Inconsistent od.grid_coord when updating the grid for cutout output after a transformation. This happened after PR #261 . The way I originally intended for cutout and llc_rearrange to work together, was to update the od.grid_coords from the intake catalog after performing a transformation. Before PR #261 the grid_coords in the intake-catalog were consistent with the final shape of the transformed dataset. After PR #261, grid_coords in the intake-catalog are consistent with the original (untransformed) data instead. You can see the original code in subsample.py lines 359-370.

What I Did

It was a bit involved, but I figured how to transpose and reverse the y-dimensions with minimal changes. This will slow down the cutout for LLC4320 and perhaps a bit for ECCO but not by much.
This is an easy fix. Need to define a new grid_coord within subsample.cutout, after transforming the dataset, instead of using the grid_coords from the intake catalog. This is,

grid_coords = {
            'Y': {'Y': None, 'Yp1': 0.5},
            'X': {'X': None, 'Xp1': 0.5}, 
            'Z': {'Z': None, 'Zp1': 0.5, 'Zu': 0.5, 'Zl': -0.5},
            'time': {'time': -0.5}}

Expect a PR in the next hour or less tests are passing :)

Mikejmnez commented 1 year ago

Another issue to resolve

To have the same variable names in ECCO and LLC4320. For example, (potential) temperature is named THETA in ECCO and Temp in LLC4320.

The fix is to set ECCO variable names as the standard.

asiddi24 commented 1 year ago

Great catch ! Also, I didn't realize variable names were different in ECCO and LLC4320. Also, I notice that you re-define grid as

grid_coords = {
            'Y': {'Y': None, 'Yp1': 0.5},
            'X': {'X': None, 'Xp1': 0.5}, 
            'Z': {'Z': None, 'Zp1': 0.5, 'Zu': 0.5, 'Zl': -0.5},
            'time': {'time': -0.5}}

which is different from how the intake catalog defines it, which is

grid_coords = {
            'Y': {'Y': None, 'Yp1': -0.5},
            'X': {'X': None, 'Xp1': -0.5}, 
            'Z': {'Z': None, 'Zp1': 0.5, 'Zu': 0.5, 'Zl': -0.5},
            'time': {'time': -0.5}}

Is this what causes a problem in the cutout ?

Mikejmnez commented 1 year ago

The indexing of the data doesn't have to do with grid_coord but rather the layout of the data itself. For data defined on a rotated facet, the dimension Y (and Yp1) increases towards the geographic North, which means the Y=0 index is the northernmost point, geographically speaking and within the facet.

In the previous PR, I thought I had figured out a way to un-rotate the data without having to manipulate it - by manipulating the indexes. I was so sure I was right by looking at plotted data, but I was mistaken.

This is, in the previous PR I was trying to avoid doing (for example for 'XG') something like

da = od._ds['XG'].isel(face=2)[::-1, :]

The above line is only part the solution. It is not costly for small datasets but can be for large ones.

What I did

Anyways, in PR #282, I implemented a function that corrects the data layout and is not too costly even in the LLC4320. :)

Mikejmnez commented 1 year ago

Also, the way I am now defining grid_coord in subsample.cutout after the data is transformed, should be consistent with the transformed dataset itself.

The transformed dataset, which is the output of arctic_crown(**args), has the property

DS = LLC.arctic_crown(**arg)
len(DS['X']) = len(DS['Xp1']) - 1

This is, by construction, data variables with dimensions Xp1, Yp1 are outer points.

Mikejmnez commented 1 year ago

For example. When performing a cutout with XRange and YRange defined by the P14 (WOCE) coordinates (lats and lons are defined in test_llc_rearrange.py), with PR #282 you now get:

and

and can easily interpolate and compute curl consistently

zeta_P14_LLC4320

Mikejmnez commented 1 year ago

closing - Resolved by #284

hainegroup / oceanspy