pangeo-data / xESMF

Universal Regridder for Geospatial Data
http://xesmf.readthedocs.io/
MIT License
195 stars 35 forks source link

"TypeError: buffer is too small for requested array" when dealing with very high resolution datasets #380

Open JSAnandEOS opened 3 months ago

JSAnandEOS commented 3 months ago

I'm trying to regrid rectilinear data from Sentinel-3 (~300 m resolution) to the EURO-CORDEX grid (~12 km resolution) using conservative regridding. Because CORDEX is curvilinear, I don't know how to define the grid in ESMF as I can't find any examples online. When I try using xESMF I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[8], line 59
     56 out_grid = {'lat': eur11_lat, 'lon': eur11_lon, 'lat_b': eur11_lat_bounds, 'lon_b': eur11_lon_bounds}
     57 inp_grid = {'lat': lai_lat, 'lon': lai_lon, 'lat_b': lai_lat_bounds, 'lon_b': lai_lon_bounds}
---> 59 regridder = xe.Regridder(inp_grid, out_grid, 'conservative')

File ~/miniconda3/envs/xesmf_test/lib/python3.12/site-packages/xesmf/frontend.py:919, in Regridder.__init__(self, ds_in, ds_out, method, locstream_in, locstream_out, periodic, parallel, **kwargs)
    917     grid_in, shape_in, input_dims = ds_to_ESMFlocstream(ds_in)
    918 else:
--> 919     grid_in, shape_in, input_dims = ds_to_ESMFgrid(
    920         ds_in, need_bounds=need_bounds, periodic=periodic
    921     )
    922 if locstream_out:
    923     grid_out, shape_out, output_dims = ds_to_ESMFlocstream(ds_out)

File ~/miniconda3/envs/xesmf_test/lib/python3.12/site-packages/xesmf/frontend.py:164, in ds_to_ESMFgrid(ds, need_bounds, periodic, append)
    162     grid = Grid.from_xarray(lon.T, lat.T, periodic=periodic, mask=mask.T)
    163 else:
--> 164     grid = Grid.from_xarray(lon.T, lat.T, periodic=periodic, mask=None)
    166 if need_bounds:
    167     lon_b, lat_b = _get_lon_lat_bounds(ds)

File ~/miniconda3/envs/xesmf_test/lib/python3.12/site-packages/xesmf/backend.py:114, in Grid.from_xarray(cls, lon, lat, periodic, mask)
    108     num_peri_dims = None
    110 # ESMPy documentation claims that if staggerloc and coord_sys are None,
    111 # they will be set to default values (CENTER and SPH_DEG).
    112 # However, they actually need to be set explicitly,
    113 # otherwise grid._coord_sys and grid._staggerloc will still be None.
--> 114 grid = cls(
    115     np.array(lon.shape),
    116     staggerloc=staggerloc,
    117     coord_sys=ESMF.CoordSys.SPH_DEG,
    118     num_peri_dims=num_peri_dims,
    119 )
    121 # The grid object points to the underlying Fortran arrays in ESMF.
    122 # To modify lat/lon coordinates, need to get pointers to them
    123 lon_pointer = grid.get_coords(coord_dim=0, staggerloc=staggerloc)

File ~/miniconda3/envs/xesmf_test/lib/python3.12/site-packages/esmpy/util/decorators.py:59, in initialize.<locals>.new_func(*args, **kwargs)
     56 from esmpy.api import esmpymanager
     58 esmp = esmpymanager.Manager(debug = False)
---> 59 return func(*args, **kwargs)

File ~/miniconda3/envs/xesmf_test/lib/python3.12/site-packages/esmpy/api/grid.py:479, in Grid.__init__(self, max_index, num_peri_dims, periodic_dim, pole_dim, coord_sys, coord_typekind, staggerloc, pole_kind, filename, filetype, reg_decomp, decompflag, is_sphere, add_corner_stagger, add_user_area, add_mask, varname, coord_names, tilesize, regDecompPTile, name)
    454 # if self.decount == 1:
    455 # elif self.decount > 1:
    456 #     # lower_bounds[de][staggerLoc]
   (...)
    476 
    477 # Add coordinates if a staggerloc is specified
    478 if not isinstance(staggerloc, type(None)):
--> 479     self.add_coords(staggerloc=staggerloc, from_file=from_file)
    481 # Add items if they are specified, this is done after the
    482 # mask and area are initialized
    483 if add_user_area:

File ~/miniconda3/envs/xesmf_test/lib/python3.12/site-packages/esmpy/api/grid.py:837, in Grid.add_coords(self, staggerloc, coord_dim, from_file)
    834     ESMP_GridAddCoord(self, staggerloc=stagger)
    836 # and now for Python
--> 837 self._allocate_coords_(stagger, from_file=from_file)
    839 # set the staggerlocs to be done
    840 self.staggerloc[stagger] = True

File ~/miniconda3/envs/xesmf_test/lib/python3.12/site-packages/esmpy/api/grid.py:1062, in Grid._allocate_coords_(self, stagger, localde, from_file)
   1060 if (self.ndims == self.rank) or (self.ndims == 0):
   1061     for xyz in range(self.rank):
-> 1062         self._link_coord_buffer_(xyz, stagger, localde)
   1063 # and this way if we have 1d coordinates
   1064 elif self.ndims < self.rank:

File ~/miniconda3/envs/xesmf_test/lib/python3.12/site-packages/esmpy/api/grid.py:1112, in Grid._link_coord_buffer_(self, coord_dim, stagger, localde)
   1109 data = ESMP_GridGetCoordPtr(self, coord_dim, staggerloc=stagger, localde=localde)
   1110 lb, ub = ESMP_GridGetCoordBounds(self, staggerloc=stagger, localde=localde)
-> 1112 gridCoordP = ndarray_from_esmf(data, self.type, ub-lb)
   1114 # alias the coordinates to a grid property
   1115 self._coords[stagger][coord_dim] = gridCoordP

File ~/miniconda3/envs/xesmf_test/lib/python3.12/site-packages/esmpy/util/esmpyarray.py:38, in ndarray_from_esmf(data, dtype, shape)
     33 else:
     34     buffer = np.core.multiarray.int_asbuffer(
     35         ct.addressof(data.contents), size)
---> 38 esmfarray = np.ndarray(tuple(shape[:]), constants._ESMF2PythonType[dtype],
     39                        buffer, order="F")
     41 return esmfarray

TypeError: buffer is too small for requested array

My problem is that The rectilinear dataset is too large: 17875 latitude x 39212 longitude points. It seems like any attempt to make a big array in ESMPy fails in the same way e.g.

grid = ESMF.Grid(np.array([20000, 15000]), 
                 staggerloc=ESMF.StaggerLoc.CENTER, 
                 coord_sys=ESMF.CoordSys.SPH_DEG)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[7], line 1
----> 1 grid = esmpy.Grid(np.array([20000, 15000]), 
      2                  staggerloc=esmpy.StaggerLoc.CENTER, 
      3                  coord_sys=esmpy.CoordSys.SPH_DEG)

File ~/miniconda3/envs/optforeu_cdo/lib/python3.12/site-packages/esmpy/util/decorators.py:59, in initialize.<locals>.new_func(*args, **kwargs)
     56 from esmpy.api import esmpymanager
     58 esmp = esmpymanager.Manager(debug = False)
---> 59 return func(*args, **kwargs)

File ~/miniconda3/envs/optforeu_cdo/lib/python3.12/site-packages/esmpy/api/grid.py:479, in Grid.__init__(self, max_index, num_peri_dims, periodic_dim, pole_dim, coord_sys, coord_typekind, staggerloc, pole_kind, filename, filetype, reg_decomp, decompflag, is_sphere, add_corner_stagger, add_user_area, add_mask, varname, coord_names, tilesize, regDecompPTile, name)
    454 # if self.decount == 1:
    455 # elif self.decount > 1:
    456 #     # lower_bounds[de][staggerLoc]
   (...)
    476 
    477 # Add coordinates if a staggerloc is specified
    478 if not isinstance(staggerloc, type(None)):
--> 479     self.add_coords(staggerloc=staggerloc, from_file=from_file)
    481 # Add items if they are specified, this is done after the
    482 # mask and area are initialized
    483 if add_user_area:

File ~/miniconda3/envs/optforeu_cdo/lib/python3.12/site-packages/esmpy/api/grid.py:837, in Grid.add_coords(self, staggerloc, coord_dim, from_file)
    834     ESMP_GridAddCoord(self, staggerloc=stagger)
    836 # and now for Python
--> 837 self._allocate_coords_(stagger, from_file=from_file)
    839 # set the staggerlocs to be done
    840 self.staggerloc[stagger] = True

File ~/miniconda3/envs/optforeu_cdo/lib/python3.12/site-packages/esmpy/api/grid.py:1062, in Grid._allocate_coords_(self, stagger, localde, from_file)
   1060 if (self.ndims == self.rank) or (self.ndims == 0):
   1061     for xyz in range(self.rank):
-> 1062         self._link_coord_buffer_(xyz, stagger, localde)
   1063 # and this way if we have 1d coordinates
   1064 elif self.ndims < self.rank:

File ~/miniconda3/envs/optforeu_cdo/lib/python3.12/site-packages/esmpy/api/grid.py:1112, in Grid._link_coord_buffer_(self, coord_dim, stagger, localde)
   1109 data = ESMP_GridGetCoordPtr(self, coord_dim, staggerloc=stagger, localde=localde)
   1110 lb, ub = ESMP_GridGetCoordBounds(self, staggerloc=stagger, localde=localde)
-> 1112 gridCoordP = ndarray_from_esmf(data, self.type, ub-lb)
   1114 # alias the coordinates to a grid property
   1115 self._coords[stagger][coord_dim] = gridCoordP

File ~/miniconda3/envs/optforeu_cdo/lib/python3.12/site-packages/esmpy/util/esmpyarray.py:38, in ndarray_from_esmf(data, dtype, shape)
     33 else:
     34     buffer = np.core.multiarray.int_asbuffer(
     35         ct.addressof(data.contents), size)
---> 38 esmfarray = np.ndarray(tuple(shape[:]), constants._ESMF2PythonType[dtype],
     39                        buffer, order="F")
     41 return esmfarray

TypeError: buffer is too small for requested array

What can I do here? I can't reduce my input dataset any further, as it already covers roughly the same region as the CORDEX target grid. Is it possible to run this Regridder somehow in MPI/etc.? Failing that, how can one create a curvilinear grid netCDF file that ESMF can recognise?

aulemahal commented 3 months ago

Sadly, I don't think there's an easy solution to your problem. I suggest you open an issue at the ESMF repo as the error seems to come from ESMpy and not xESMF.

As for the question about creating a curvilinear grid. The best would be to get the grid from an existing file. Are you planning to include some CORDEX output in your analysis, you could use thsat grid ? Otherwise, the  py-cordex package should be able to help.

JSAnandEOS commented 3 months ago

Thank you. I left a post on their discussion forum, so hopefully I get a response from them soon.

I tried producing a netCDF file in py-cordex including the grid cell boundaries, but ESMF failed to read it in, claiming that information about the bounds was missing. If it's not too much trouble, would you mind testing for yourself whether a file produced from py-cordex is compatible with ESMF_Regrid and get back to me?

billsacks commented 3 months ago

For visibility here, I want to note that we have a fix for this issue that we plan to get into the upcoming ESMF 8.7 release: See https://github.com/orgs/esmf-org/discussions/263#discussioncomment-10131509 and https://github.com/esmf-org/esmf/pull/267. The issue was that we were casting a size to a 32-bit integer when we should have been using a 64-bit integer on 64-bit architectures; this change will support the creation of large Arrays.

Thanks, @JSAnandEOS for reporting this!

maxrjones commented 1 month ago

@billsacks thanks for implementing a fix for this issue! Is there an estimated timeline for the ESMF 8.7 release?

billsacks commented 1 month ago

Is there an estimated timeline for the ESMF 8.7 release?

It should be out soon - within the next couple of weeks. We're just doing final testing and documentation of the release candidate.