pangeo-data / xESMF

Universal Regridder for Geospatial Data
http://xesmf.readthedocs.io/
MIT License

Can't use ESMF generated weights netcdf in xesmf due to filesize #350

Closed: jvmcgovern closed this issue 3 months ago

jvmcgovern commented 3 months ago

Hi there,

Following on from my previous post, I compiled ESMF on HPC and generated the conservative remapping weights as a netCDF file. However, when I went to use that netCDF file in xesmf to construct the regridder, it failed, I imagine due to the size of the file (20 GB).

Is this something to add to future developments or am I missing a setting somewhere?

Thanks for reading, Joe

aulemahal commented 3 months ago

Hi! Could you share the output of ncdump -sh on your netCDF file? And maybe the traceback of the error when you try to open it?

xESMF has good dask support for the regridding process, but it still needs the weights to be loaded in memory. Weights are supposed to be stored in a sparse array, which compresses them a lot, but if even that is too large for your RAM then xESMF won't be able to help. If this is the problem, we could indeed try to implement something that only loads the weights on demand, but I don't think that will be easy.
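
For reference, a rough way to check whether the sparse weights alone should fit in RAM is to read the n_s dimension and do the arithmetic by hand. A minimal sketch, assuming the usual ESMF weight-file layout (integer row/col indices plus a double S value per non-zero entry) and an illustrative file path; xESMF's in-memory representation may be somewhat larger:

    import xarray as xr

    # Open the (illustrative) weight file lazily; we only need its dimensions.
    ds_w = xr.open_dataset("weights.nc")
    n_s = ds_w.sizes["n_s"]  # number of non-zero weights

    # One int32 row index + one int32 col index + one float64 weight per entry.
    approx_gb = n_s * (4 + 4 + 8) / 1e9
    print(f"{n_s} non-zeros -> roughly {approx_gb:.1f} GB as COO triplets")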

aulemahal commented 3 months ago

Another solution that you can try right now is to divide your destination grid into smaller blocks, regrid the source grid on each of those blocks and stitch them afterwards. If the mapping from source to destination is very straightforward you could even subset the source grid accordingly for each block, in order to reduce the size of the problem. That's not a cool solution, but it may allow you to go forward in your project without waiting for the dev team to come up with a better solution...
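
For illustration, the block-wise workaround could look something like the sketch below, assuming a rectilinear destination grid with 1-D lat/lon centres and lat_b/lon_b bounds (conservative regridding needs bounds on both grids); ds_in, ds_out and the number of blocks are placeholders, and the same slicing idea could be applied to ds_in to shrink each sub-problem:

    import numpy as np
    import xarray as xr
    import xesmf as xe

    n_blocks = 8
    edges = np.linspace(0, ds_out.sizes["lat"], n_blocks + 1, dtype=int)

    pieces = []
    for i0, i1 in zip(edges[:-1], edges[1:]):
        # Subset the destination centres and the matching bounds
        # (the bounds dimension has one extra edge).
        ds_out_block = ds_out.isel(lat=slice(i0, i1), lat_b=slice(i0, i1 + 1))
        regridder = xe.Regridder(ds_in, ds_out_block, "conservative")
        pieces.append(regridder(ds_in))

    # Stitch the regridded blocks back together along latitude.
    result = xr.concat(pieces, dim="lat")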

jvmcgovern commented 3 months ago

Thanks for your response. Here is the output from ncdump -sh of the weights file:

netcdf xe_input2output_grid_weights {
dimensions:
    n_a = 190057960 ;
    n_b = 1361100 ;
    n_s = 113839666 ;
    nv_a = 4 ;
    nv_b = 4 ;
    num_wgts = 1 ;
    src_grid_rank = 2 ;
    dst_grid_rank = 2 ;
variables:
    int src_grid_dims(src_grid_rank) ;
        src_grid_dims:_Storage = "contiguous" ;
        src_grid_dims:_Endianness = "little" ;
    int dst_grid_dims(dst_grid_rank) ;
        dst_grid_dims:_Storage = "contiguous" ;
        dst_grid_dims:_Endianness = "little" ;
    double yc_a(n_a) ;
        yc_a:units = "degrees" ;
        yc_a:_Storage = "contiguous" ;
        yc_a:_Endianness = "little" ;
    double yc_b(n_b) ;
        yc_b:units = "degrees" ;
        yc_b:_Storage = "contiguous" ;
        yc_b:_Endianness = "little" ;
    double xc_a(n_a) ;
        xc_a:units = "degrees" ;
        xc_a:_Storage = "contiguous" ;
        xc_a:_Endianness = "little" ;
    double xc_b(n_b) ;
        xc_b:units = "degrees" ;
        xc_b:_Storage = "contiguous" ;
        xc_b:_Endianness = "little" ;
    double yv_a(n_a, nv_a) ;
        yv_a:units = "degrees" ;
        yv_a:_Storage = "contiguous" ;
        yv_a:_Endianness = "little" ;
    double xv_a(n_a, nv_a) ;
        xv_a:units = "degrees" ;
        xv_a:_Storage = "contiguous" ;
        xv_a:_Endianness = "little" ;
    double yv_b(n_b, nv_b) ;
        yv_b:units = "degrees" ;
        yv_b:_Storage = "contiguous" ;
        yv_b:_Endianness = "little" ;
    double xv_b(n_b, nv_b) ;
        xv_b:units = "degrees" ;
        xv_b:_Storage = "contiguous" ;
        xv_b:_Endianness = "little" ;
    int mask_a(n_a) ;
        mask_a:units = "unitless" ;
        mask_a:_Storage = "contiguous" ;
        mask_a:_Endianness = "little" ;
    int mask_b(n_b) ;
        mask_b:units = "unitless" ;
        mask_b:_Storage = "contiguous" ;
        mask_b:_Endianness = "little" ;
    double area_a(n_a) ;
        area_a:units = "square radians" ;
        area_a:_Storage = "contiguous" ;
        area_a:_Endianness = "little" ;
    double area_b(n_b) ;
        area_b:units = "square radians" ;
        area_b:_Storage = "contiguous" ;
        area_b:_Endianness = "little" ;
    double frac_a(n_a) ;
        frac_a:units = "unitless" ;
        frac_a:_Storage = "contiguous" ;
        frac_a:_Endianness = "little" ;
    double frac_b(n_b) ;
        frac_b:units = "unitless" ;
        frac_b:_Storage = "contiguous" ;
        frac_b:_Endianness = "little" ;
    int col(n_s) ;
        col:_Storage = "contiguous" ;
        col:_Endianness = "little" ;
    int row(n_s) ;
        row:_Storage = "contiguous" ;
        row:_Endianness = "little" ;
    double S(n_s) ;
        S:_Storage = "contiguous" ;
        S:_Endianness = "little" ;

// global attributes:
        :title = "ESMF Offline Regridding Weight Generator" ;
        :normalization = "destarea" ;
        :map_method = "Conservative remapping" ;
        :ESMF_regrid_method = "First-order Conservative" ;
        :conventions = "NCAR-CSM" ;
        :domain_a = "xe_input_grid.nc" ;
        :domain_b = "xe_output_grid.nc" ;
        :grid_file_src = "xe_input_grid.nc" ;
        :grid_file_dst = "xe_output_grid.nc" ;
        :ESMF_version = "v8.6.0" ;
        :_NCProperties = "version=2,netcdf=4.7.4,hdf5=1.12.0," ;
        :_SuperblockVersion = 0 ;
        :_IsNetcdf4 = 1 ;
        :_Format = "netCDF-4" ;
}
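
For scale, some illustrative arithmetic from the dimensions above suggests the sparse triplets themselves are modest (a couple of GB), whereas anything dense over n_a x n_b would be petabyte-sized:

    # Back-of-the-envelope estimate from the dimensions in the ncdump output
    # above (illustrative arithmetic only).
    n_a = 190_057_960   # source cells
    n_b = 1_361_100     # destination cells
    n_s = 113_839_666   # non-zero weights

    sparse_gb = n_s * (4 + 4 + 8) / 1e9   # int32 row + int32 col + float64 S
    dense_pb = n_a * n_b * 8 / 1e15       # hypothetical dense float64 matrix
    print(f"sparse triplets: ~{sparse_gb:.1f} GB, dense matrix: ~{dense_pb:.1f} PB")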

The traceback of the error in PyCharm when I'm using the weights in xesmf is pretty basic:

/home/matlab/miniconda3/envs/pythonProject/lib/python3.10/site-packages/xesmf/frontend.py:875: UserWarning: Cannot use parallel=True when reuse_weights=True or when weights is not None. Building Regridder normally.
  warnings.warn(
starting regridder setup

Process finished with exit code 137 (interrupted by signal 9:SIGKILL)

When I try without parallel=True, all I get is “Process finished with exit code 137 (interrupted by signal 9:SIGKILL)”

aulemahal commented 3 months ago

parallel=True is indeed not needed here as the weights are already computed.

Maybe you can try extracting only the variables needed by xESMF? With ncks, part of NCO, this would look like the following (where INPUT is the name of your large file):

ncks -C -v row,col,S INPUT weights.nc
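
Once the trimmed weights.nc exists, one possible next step is sketched below, with ds_in and ds_out as placeholders for the source and destination grid datasets; recent xESMF versions document that the weights argument accepts a path to a precomputed weight file:

    import xesmf as xe

    # Build the regridder from the precomputed weights instead of
    # recomputing them with ESMF.
    regridder = xe.Regridder(ds_in, ds_out, method="conservative", weights="weights.nc")
    data_out = regridder(ds_in)
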
jvmcgovern commented 3 months ago

I tried using the xesmf.smm backend option, but it ran out of memory (it would have needed 1.8 PB of memory!). I'm going to try the command line tool now to go direct, but I will probably use xesmf in future for comparing model output with satellite data etc.