xCDAT / xcdat

An extension of xarray for climate data analysis on structured grids.
https://xcdat.readthedocs.io/en/latest/
Apache License 2.0
113 stars 12 forks source link

[Bug]: regrid2 with open_mfdataset #615

Closed lee1043 closed 6 months ago

lee1043 commented 6 months ago

What happened?

My actual use case is opening multiple netcdf files using open_mfdataset and do regrid using regrid2, and there were about half of CMIP6 models that regrid2 was failing. I tracked down to where the issue was coming from.

I found that for those models, if I open a single netcdf file via open_dataset regrid2 works well, but if I open the same file via open_mfdataset regrid2 fails.

What did you expect to happen? Are there are possible answers you came across?

No response

Minimal Complete Verifiable Example (MVCE)

import os
import xcdat as xc

target_grid = xc.create_uniform_grid(-90, 89, 1.0, 0, 359, 1.0)

input_data = os.path.join(
    "/p/css03/esgf_publish/",
    "CMIP6/CMIP/AWI/AWI-CM-1-1-MR/historical/r1i1p1f1/Amon/psl/gn/v20181218/",
    "psl_Amon_AWI-CM-1-1-MR_historical_r1i1p1f1_gn_201301-201312.nc")

ds = xc.open_dataset(input_data)
ds_regrid = ds.regridder.horizontal("psl", target_grid, tool="regrid2")  # WORKS FINE

ds = xc.open_mfdataset(input_data)
ds_regrid = ds.regridder.horizontal("psl", target_grid, tool="regrid2")  # ERROR

Relevant log output

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[54], line 1
----> 1 ds_regrid = ds.regridder.horizontal("psl", target_grid, tool="regrid2")

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/xcdat/regridder/accessor.py:324, in RegridderAccessor.horizontal(self, data_var, output_grid, tool, **options)
    322 input_grid = _get_input_grid(self._ds, data_var, ["X", "Y"])
    323 regridder = regrid_tool(input_grid, output_grid, **options)
--> 324 output_ds = regridder.horizontal(data_var, self._ds)
    326 return output_ds

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/xcdat/regridder/regrid2.py:81, in Regrid2Regridder.horizontal(self, data_var, ds)
     78     # Xarray defaults to masking with np.nan, CDAT masked with _FillValue or missing_value which defaults to 1e20
     79     input_data_var = input_data_var.where(src_mask != 0.0, masked_value)
---> 81 output_data = _regrid(
     82     input_data_var, src_lat_bnds, src_lon_bnds, dst_lat_bnds, dst_lon_bnds
     83 )
     85 output_ds = _build_dataset(
     86     ds,
     87     data_var,
   (...)
     92     self._output_grid,
     93 )
     95 return output_ds

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/xcdat/regridder/regrid2.py:105, in _regrid(input_data_var, src_lat_bnds, src_lon_bnds, dst_lat_bnds, dst_lon_bnds)
     98 def _regrid(
     99     input_data_var: xr.DataArray,
    100     src_lat_bnds: np.ndarray,
   (...)
    103     dst_lon_bnds: np.ndarray,
    104 ) -> np.ndarray:
--> 105     lat_mapping, lat_weights = _map_latitude(src_lat_bnds, dst_lat_bnds)
    106     lon_mapping, lon_weights = _map_longitude(src_lon_bnds, dst_lon_bnds)
    108     # convert to pure numpy

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/xcdat/regridder/regrid2.py:258, in _map_latitude(src, dst)
    252 bounds = [
    253     (np.minimum(dst_north[x], src_north[y]), np.maximum(dst_south[x], src_south[y]))
    254     for x, y in enumerate(mapping)
    255 ]
    257 # convert latitude to cell weight (difference of height above/below equator)
--> 258 weights = [
    259     (np.sin(np.deg2rad(x)) - np.sin(np.deg2rad(y))).reshape((-1, 1))
    260     for x, y in bounds
    261 ]
    263 return mapping, weights

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/xcdat/regridder/regrid2.py:259, in <listcomp>(.0)
    252 bounds = [
    253     (np.minimum(dst_north[x], src_north[y]), np.maximum(dst_south[x], src_south[y]))
    254     for x, y in enumerate(mapping)
    255 ]
    257 # convert latitude to cell weight (difference of height above/below equator)
    258 weights = [
--> 259     (np.sin(np.deg2rad(x)) - np.sin(np.deg2rad(y))).reshape((-1, 1))
    260     for x, y in bounds
    261 ]
    263 return mapping, weights

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/dask/array/core.py:2219, in Array.reshape(self, merge_chunks, limit, *shape)
   2217 if len(shape) == 1 and not isinstance(shape[0], Number):
   2218     shape = shape[0]
-> 2219 return reshape(self, shape, merge_chunks=merge_chunks, limit=limit)

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/dask/array/reshape.py:218, in reshape(x, shape, merge_chunks, limit)
    216     if len(shape) == 1 and x.ndim == 1:
    217         return x
--> 218     missing_size = sanitize_index(x.size / reduce(mul, known_sizes, 1))
    219     shape = tuple(missing_size if s == -1 else s for s in shape)
    221 if np.isnan(sum(x.shape)):

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/dask/array/slicing.py:72, in sanitize_index(ind)
     66     return slice(
     67         _sanitize_index_element(ind.start),
     68         _sanitize_index_element(ind.stop),
     69         _sanitize_index_element(ind.step),
     70     )
     71 elif isinstance(ind, Number):
---> 72     return _sanitize_index_element(ind)
     73 elif is_dask_collection(ind):
     74     return ind

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/dask/array/slicing.py:26, in _sanitize_index_element(ind)
     24 """Sanitize a one-element index."""
     25 if isinstance(ind, Number):
---> 26     ind2 = int(ind)
     27     if ind2 != ind:
     28         raise IndexError("Bad index.  Must be integer-like: %s" % ind)

ValueError: cannot convert float NaN to integer

Anything else we need to know?

I am using the latest xCDAT main branch version.

Environment

INSTALLED VERSIONS

commit: None python: 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:08:06) [GCC 11.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.108.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.14.3 libnetcdf: 4.9.2

xarray: 2024.2.0 pandas: 2.2.0 numpy: 1.23.5 scipy: 1.12.0 netCDF4: 1.6.5 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.6.3 nc_time_axis: 1.4.1 iris: None bottleneck: None dask: 2024.2.0 distributed: 2024.2.0 matplotlib: 3.7.1 cartopy: 0.22.0 seaborn: 0.12.2 numbagg: None fsspec: 2024.2.0 cupy: None pint: None sparse: 0.15.1 flox: None numpy_groupies: None setuptools: 67.7.2 pip: 23.1.2 conda: None pytest: 7.3.1 mypy: None IPython: 8.11.0 sphinx: 5.3.0

tomvothecoder commented 6 months ago

@jasonb5 Maybe this can be addressed in #613?

jasonb5 commented 6 months ago

I'll followup this week and check with #613.

tomvothecoder commented 6 months ago

@jasonb5 Sounds good, thanks.

tomvothecoder commented 6 months ago

I reproduced this issue on the current latest version of xcdat (v0.6.1) and am trying to debug it.

tomvothecoder commented 6 months ago

This issue will be fixed by PR #613. Here is the PR review comment with more info on the root cause and solution.

lee1043 commented 6 months ago

@tomvothecoder @jasonb5 I confirmed that the above minimal code works well with #613.