xoceanmodel / xroms

Work with ROMS ocean model output with xarray
https://xroms.readthedocs.io/
MIT License
60 stars 37 forks source link

helper funciton to chunk datasets #16

Open dcherian opened 3 years ago

dcherian commented 3 years ago

I use this little snippet to construct appropriate chunking dictionary for open_mfdataset

chunk = {"xi": 500, "eta": 120, "ocean_time": 20} 

chunks = {}
for sub in ["rho", "u", "v", "psi"]:
    for k, v in chunk.items():
        chunks[f"{k}_{sub}"] = v
chunks["ocean_time"] = chunk["ocean_time"]
chunks
{'xi_rho': 500,
 'eta_rho': 120,
 'xi_u': 500,
 'eta_u': 120,
 'xi_v': 500,
 'eta_v': 120,
 'xi_psi': 500,
 'eta_psi': 120,
 'ocean_time': 20}

I was thinking it would be nice to have a top-level xroms function

def construct_chunks(xi=None, eta=None, s=None, ocean_time=None):
    pass

chunks= xroms.construct_chunks(xi=500, eta=120, ocean_time=20)
ds = xr.open_dataset(..., chunks=chunks)
kthyng commented 1 year ago

@dcherian this looks great — I didn't have notifications on but now I do and am going to make bits of time to get xroms back up to speed. I'll get to this issue soon hopefully!

dylanschlichting commented 1 month ago

Bumping this because I ran into a problem with chunks but fixed it partly with the code discussed above. Got xroms running on a NERSC machine with TXLA ROMS model output yesterday. If I do not manually prescribe single chunks for the spatial dimensions, xgcm fails for simple operations (e.g. derivatives). Example below with quick fix. Environment is attached.

import xarray as xr
import xroms 
import xgcm
from xgcm import Grid
import warnings
warnings.filterwarnings("ignore") #turns off annoying warnings

path = '/pscratch/sd/d/dylan617/txla_roms/runs/forecast_test/TXLA2.ocn.his.2024_07_15_f.nc'
ds = xroms.open_netcdf(path)
ds,grid = xroms.roms_dataset(ds, include_cell_volume=True)
ds.xroms.set_grid(grid)
grid.derivative(ds.u.isel(s_rho = -1), 'X', boundary = 'extend')

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[7], line 1
----> 1 grid.derivative(ds.u.isel(s_rho = -1), 'X', boundary = 'extend')

File /global/u2/d/dylan617/miconda3/envs/xroms/lib/python3.11/site-packages/xgcm/grid.py:2504, in Grid.derivative(self, da, axis, **kwargs)
   2462 def derivative(self, da, axis, **kwargs):
   2463     """
   2464     Take the centered-difference derivative along specified axis.
   2465 
   (...)
   2502         The differentiated data
   2503     """
-> 2504     diff = self.diff(da, axis, **kwargs)
   2505     dx = self.get_metric(diff, (axis,))
   2506     return diff / dx

File /global/u2/d/dylan617/miconda3/envs/xroms/lib/python3.11/site-packages/xgcm/grid.py:2095, in Grid.diff(self, da, axis, **kwargs)
   2045 def diff(self, da, axis, **kwargs):
   2046     """
   2047     Difference neighboring points to the intermediate grid point.
   2048 
   (...)
   2093     >>> grid.diff(da, ["X", "Y"], fill_value={"X": 0, "Y": 100})
   2094     """
-> 2095     return self._1d_grid_ufunc_dispatch("diff", da, axis, **kwargs)

File /global/u2/d/dylan617/miconda3/envs/xroms/lib/python3.11/site-packages/xgcm/grid.py:1838, in Grid._1d_grid_ufunc_dispatch(self, funcname, data, axis, to, keep_coords, metric_weighted, other_component, **kwargs)
   1835 else:
   1836     map_overlap = False
-> 1838 array = grid_ufunc(
   1839     self,
   1840     array,
   1841     axis=[(ax_name,)],
   1842     keep_coords=keep_coords,
   1843     dask=dask,
   1844     map_overlap=map_overlap,
   1845     other_component=other_component,
   1846     **remaining_kwargs,
   1847 )
   1849 if ax_metric_weighted:
   1850     metric = self.get_metric(array, ax_metric_weighted)

File /global/u2/d/dylan617/miconda3/envs/xroms/lib/python3.11/site-packages/xgcm/grid_ufunc.py:460, in GridUFunc.__call__(self, grid, axis, *args, **kwargs)
    458 map_overlap = kwargs.pop("map_overlap", self.map_overlap)
    459 pad_before_func = kwargs.pop("pad_before_func", self.pad_before_func)
--> 460 return apply_as_grid_ufunc(
    461     self.ufunc,
    462     *args,
    463     axis=axis,
    464     grid=grid,
    465     signature=self.signature,
    466     boundary_width=self.boundary_width,
    467     boundary=boundary,
    468     dask=dask,
    469     map_overlap=map_overlap,
    470     pad_before_func=pad_before_func,
    471     **kwargs,
    472 )

File /global/u2/d/dylan617/miconda3/envs/xroms/lib/python3.11/site-packages/xgcm/grid_ufunc.py:747, in apply_as_grid_ufunc(func, axis, grid, signature, boundary_width, boundary, fill_value, keep_coords, dask, map_overlap, pad_before_func, other_component, *args, **kwargs)
    744 # Maybe map function over chunked core dims using dask.array.map_overlap
    745 if map_overlap:
    746     # Disallow situations where shifting axis position would cause chunk size to change
--> 747     _check_if_length_would_change(sig)
    749     mapped_func = _map_func_over_core_dims(
    750         func,
    751         args,
   (...)
    755         out_dtypes,
    756     )
    757 else:

File /global/u2/d/dylan617/miconda3/envs/xroms/lib/python3.11/site-packages/xgcm/grid_ufunc.py:1004, in _check_if_length_would_change(signature)
    998 all_ax_positions = set(
    999     p
   1000     for arg_ps in signature.in_ax_positions + signature.out_ax_positions
   1001     for p in arg_ps
   1002 )
   1003 if any(pos in DISALLOWED_OVERLAP_POSITIONS for pos in all_ax_positions):
-> 1004     raise NotImplementedError(
   1005         "Cannot chunk along a core dimension for a grid ufunc which has a signature which "
   1006         f"includes one of the axis positions {DISALLOWED_OVERLAP_POSITIONS}."
   1007         "Consider rechunking to a single chunk along this dimension if possible."
   1008     )

NotImplementedError: Cannot chunk along a core dimension for a grid ufunc which has a signature which includes one of the axis positions ['inner', 'outer'].Consider rechunking to a single chunk along this dimension if possible.

To fix this, I did the following before opening the dataset, which then works as expected.

chunk = {"xi": -1, "eta": -1, "ocean_time": 1} 

chunks = {}
for sub in ["rho", "u", "v", "psi"]:
    for k, v in chunk.items():
        chunks[f"{k}_{sub}"] = v
chunks["ocean_time"] = chunk["ocean_time"]

chunks
{'xi_rho': -1,
 'eta_rho': -1,
 'ocean_time_rho': 1,
 'xi_u': -1,
 'eta_u': -1,
 'ocean_time_u': 1,
 'xi_v': -1,
 'eta_v': -1,
 'ocean_time_v': 1,
 'xi_psi': -1,
 'eta_psi': -1,
 'ocean_time_psi': 1,
 'ocean_time': 1}

Then reopen, which works as expected

ds = xroms.open_netcdf(path, chunks = chunks)
ds,grid = xroms.roms_dataset(ds, include_cell_volume=True)
ds.xroms.set_grid(grid)
grid.derivative(ds.u.isel(s_rho = -1), 'X', boundary = 'extend')

xarray.DataArray

    ocean_time: 25, eta_rho: 191, xi_rho: 671

    dask.array<chunksize=(1, 191, 671), meta=np.ndarray>
    Coordinates:
        xi_rho
        (xi_rho)
        int64
        0 1 2 3 4 5 ... 666 667 668 669 670
        eta_rho
        (eta_rho)
        int64
        0 1 2 3 4 5 ... 186 187 188 189 190
        ocean_time
        (ocean_time)
        datetime64[ns]
        2024-07-15 ... 2024-07-16
    Indexes: (3)
    Attributes:

    standard_name :
        nulvar
    long_name :
        u-momentum component
    units :
        meter second-1
    time :
        ocean_time
    cell_methods :
        ocean_time: point
    grid :
        grid
    location :
        edge1
    field :
        u-velocity, scalar, series

Does it make sense to implement something like above into xroms to prevent the error? req.txt

kthyng commented 4 weeks ago

This function (https://github.com/xoceanmodel/xroms/blob/main/xroms/utilities.py#L21) is a kludge to workaround a similar problem in xgcm (though I don't know why I named it that which implies it might also have another job too? just replying quickly at the moment) — though that is a different error message actually, if I recall correctly. Both error messages required rechunking to a single chunk though. Any chance this is fixed on xgcm's side at this point?