Modify grid information to include xgcm compatible metadata to identify grid positions

jbusecke commented 2 years ago

There is currently a very brittle attempt in cmip6_preprocessing to store information about the grid position/staggering in a yaml file. This was created by me in an ad hoc way that seems to have produced many wrong configurations (e.g. https://github.com/jbusecke/cmip6_preprocessing/issues/240).

Since this information really is an intrinsic part of the native grid, I would ultimately like to migrate as much as possible to this repo and remove it from cmip6_pp.

After we have settled on a few datasets here and figured out how to implement renaming, I suggest we try to add xgcm-compatible metadata to the resulting 'grid' datasets, so that one can simply do:

from xgcm import Grid
grid_ds = some_magic_to_get_the_grid(source_id)
grid = Grid(grid_ds)

And this would automatically recognize axis positions and dimension names.
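For illustration, here is a minimal sketch of what such metadata could look like, assuming the COMODO-style attributes (`axis` and `c_grid_axis_shift`) that xgcm reads from dimension coordinates. The helper `comodo_attrs`, the `STAGGERING` table, and the dimension names `x`, `x_r`, `y`, `y_r` are hypothetical and only stand in for whatever a given model's grid file actually uses.

```python
# Sketch only: `comodo_attrs` and `STAGGERING` are hypothetical names, not part
# of xgcm, xmip, or this feedstock.

def comodo_attrs(axis, position):
    """Return COMODO-style attributes for one dimension coordinate.

    axis     : "X", "Y" or "Z"
    position : "center", "left" or "right" (relative to the cell center)
    """
    shifts = {"center": None, "left": -0.5, "right": 0.5}
    if position not in shifts:
        raise ValueError(f"unknown position: {position!r}")
    attrs = {"axis": axis}
    # Cell centers carry no shift attribute; staggered points do.
    if shifts[position] is not None:
        attrs["c_grid_axis_shift"] = shifts[position]
    return attrs


# Assumed layout for an illustrative model: velocity points sit to the right
# of tracer points along each horizontal axis.
STAGGERING = {
    "x": ("X", "center"),
    "x_r": ("X", "right"),
    "y": ("Y", "center"),
    "y_r": ("Y", "right"),
}

dim_attrs = {dim: comodo_attrs(axis, pos) for dim, (axis, pos) in STAGGERING.items()}
```

With xarray, each entry would then be attached to the matching dimension coordinate (for example via `grid_ds[dim].attrs.update(dim_attrs[dim])`) before the dataset is passed to `Grid`.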

I will have to think about how to actually implement this. The big question here is whether we should rename/modify the files before they are uploaded to the cloud or provide logic to do that afterwards (similar to cmip6_pp). I currently prefer the latter, but we will have to see if this is feasible with all grid files.

cc @emaroon

jbusecke commented 2 years ago

Since the COMODO specs currently used by xgcm appear to be outdated at this point, we should probably try to parse the information using the SGRID spec and implement the capability to parse that in xgcm at the same time (https://github.com/xgcm/xgcm/issues/109).

jdldeauna commented 1 year ago

Hey @jbusecke ,

I worked on a very preliminary version of a method that takes static ocean grid data and applies it to a CMIP6 model grid to make it compatible with xgcm. Just to double-check, would it be part of xmip or xgcm? I think either way would work.

Here is a more detailed notebook, but the basic function is as follows:

def preprocess_static_grid_model(ds):
    """
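    Rename variables in a static ocean grid dataset to match xmip conventions
    and collect them as coordinates and metrics for xgcm.

    ds : xarray Dataset
        Static ocean grid downloaded from Pangeo Forge

    Returns the renamed grid variables (coords) and a dict mapping xgcm axes
    to metric variable names (metrics).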
    """

    ds = ds.rename_vars({'parea':'area_t', 'uarea':'area_u', 'varea':'area_v',
                          'pdx':'dx_t', 'udx':'dx_u', 'vdx':'dx_v',
                          'pdy':'dy_t', 'udy':'dy_u', 'vdy':'dy_v',
                          'pdepth':'dz_t', 'udepth':'dz_u', 'vdepth':'dz_v'
                          })

    # area variables
    area_t = ds['area_t']
    area_u = ds['area_u'].rename({'x':'x_r'})
    area_v = ds['area_v'].rename({'y':'y_r'})

    dx_t = ds['dx_t']
    dx_u = ds['dx_u'].rename({'x':'x_r'})
    dx_v = ds['dx_v'].rename({'y':'y_r'})

    dy_t = ds['dy_t']
    dy_u = ds['dy_u'].rename({'x':'x_r'})
    dy_v = ds['dy_v'].rename({'y':'y_r'})

    dz_t = ds['dz_t']
    dz_u = ds['dz_u'].rename({'x':'x_r'})
    dz_v = ds['dz_v'].rename({'y':'y_r'})

    coords = {'area_t': area_t, 'area_u': area_u, 'area_v': area_v, 
              'dx_t': dx_t, 'dx_u': dx_u, 'dx_v': dx_v, 
              'dy_t': dy_t, 'dy_u': dy_u, 'dy_v': dy_v,
              'dz_t': dz_t, 'dz_u': dz_u, 'dz_v': dz_v,
             }
    # Keys are tuples of axis names; single-axis keys need a trailing comma.
    metrics = {('X', 'Y'): ['area_t', 'area_u', 'area_v'],
               ('X',): ['dx_t', 'dx_u', 'dx_v'],
               ('Y',): ['dy_t', 'dy_u', 'dy_v'],
               ('Z',): ['dz_t', 'dz_u', 'dz_v'],
               }

    return coords, metrics

There are two issues that I found:

1. The MPI static ocean grid datasets don't match the dimensions of their corresponding CMIP6 models.
2. The depth of grid cells in the static ocean grid is two-dimensional rather than 3D, and it doesn't specify the height of each vertical layer. This might be especially important for native-grid datasets where vertical levels might change at each time step.
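On the second issue, one possible workaround (a sketch under strong assumptions, not something the static grid files provide) is to derive layer heights from the 2D depth together with the vertical level bounds of the model output. This assumes fixed z-levels with partial bottom cells, so it would not cover models whose vertical levels change in time. The function name `layer_thickness` and the example bounds are hypothetical.

```python
def layer_thickness(depth, lev_bnds):
    """Height of each vertical layer in a water column of total depth `depth`.

    depth    : bottom depth of the column (m, positive downward)
    lev_bnds : list of (top, bottom) bounds for each nominal level (m)
    """
    thickness = []
    for top, bottom in lev_bnds:
        # Clip the layer at the sea floor; layers entirely below it get 0.
        thickness.append(max(min(bottom, depth) - top, 0.0))
    return thickness


# Illustrative values only: three nominal levels and a 25 m deep column.
lev_bnds = [(0.0, 10.0), (10.0, 30.0), (30.0, 60.0)]
dz = layer_thickness(25.0, lev_bnds)
```

Applied over every horizontal point, this would turn the 2D depth field into a 3D `dz` that could serve as the 'Z' metric.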

pangeo-forge / CMIP6_static_grids-feedstock

Modify grid information to include xgcm compatible metadata to identify grid positions #8