pangeo-data / xESMF

Universal Regridder for Geospatial Data
http://xesmf.readthedocs.io/
MIT License

Regridding bathymetry from high resolution to output grid #349

Open jvmcgovern opened 3 months ago

jvmcgovern commented 3 months ago

I'm trying to use xESMF (the latest version on conda-forge) to conservatively remap bathymetry for a ROMS/CROCO model. I created lat_b and lon_b fields for the input (regular) and output (rotated) grids. Because of the size of the input data (~16000x12000) and output data (~1300x1050), I thought parallel=True would work. I'm getting the following error:

RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase. This probably means that you are not using fork to start your child processes and you have forgotten to use the proper idiom in the main module:

I've tried this with and without chunking the input data. I'm chunking the output field at approximately 550x700, but to no avail. Any pointers would be appreciated. Do I have to go straight to ESMF?
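The error message refers to the `if __name__ == "__main__":` idiom. A minimal stdlib sketch of that idiom follows, assuming (as is dask's usual default for its process-based scheduler) that worker processes are started with the "spawn" method. `run_in_pool` is a hypothetical helper and `math.factorial` only stands in for the real work; in a regridding script, the xESMF calls that end up starting processes would sit under the same guard.

```python
import math
import multiprocessing as mp


def run_in_pool(values):
    """Map a function over values in worker processes started with "spawn"."""
    # "spawn" (the default start method on macOS and Windows) starts each worker
    # by re-importing the main module, so any unguarded top-level code that
    # starts processes would run again there and trigger the bootstrapping error.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        # math.factorial is a placeholder for the real per-block work
        return pool.map(math.factorial, values)


if __name__ == "__main__":
    # Everything that starts processes lives under this guard, so re-importing
    # this module inside a worker does not try to start new workers recursively.
    print(run_in_pool(range(6)))
```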

raphaeldussin commented 3 months ago

You probably need to use the ESMF parallel remapping tool; see this page for how to do it on HPC: https://xesmf.readthedocs.io/en/stable/large_problems_on_HPC.html

If that does not work, take a look at https://github.com/ESMG/gridtools. I also have a package that's still very much under development here: https://github.com/raphaeldussin/sloppy
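For reference, here is a hedged sketch of how the ESMF_RegridWeightGen command for a conservative remap might be assembled from Python. The flags `-s`, `-d`, `-w`, `-m conserve` and `--norm_type dstarea` are ESMF_RegridWeightGen options. The MPI launcher (`mpirun -np`), the process count and the file names are placeholders that depend on your HPC system, and `regridweightgen_cmd` is a hypothetical helper.

```python
import shlex


def regridweightgen_cmd(src, dst, weights, nprocs, launcher="mpirun"):
    """Build an MPI command line for a conservative ESMF_RegridWeightGen run."""
    return [
        launcher, "-np", str(nprocs),  # launcher name/flags are system-specific (assumption)
        "ESMF_RegridWeightGen",
        "-s", src,                     # source grid file
        "-d", dst,                     # destination grid file
        "-w", weights,                 # output weight file
        "-m", "conserve",              # first-order conservative remapping
        "--norm_type", "dstarea",      # normalize by destination cell area
    ]


# Hypothetical file names and process count
cmd = regridweightgen_cmd("src_grid.nc", "dst_grid.nc", "weights.nc", nprocs=64)
print(shlex.join(cmd))
```

On a node where ESMF and MPI are available, the list could be passed to `subprocess.run(cmd, check=True)`; building it separately makes it easy to print and paste into a batch script.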

jvmcgovern commented 3 months ago

Thanks @raphaeldussin. I'm now working on installing ESMF (I can't find the command-line tools from the esmpy install).

For future reference, is there a convenient way to query, at the system level from a Python environment, what upper limits there are on computation?
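A stdlib-only sketch of querying a few of these limits: the CPU count, the CPUs this process may actually use (Linux only), physical memory via POSIX `sysconf`, and the per-process address-space limit. `compute_limits` is a hypothetical helper. On a batch system, the job's own memory allocation (for example a cgroup limit set by the scheduler) may be lower than what these calls report.

```python
import os
import sys


def compute_limits():
    """Collect a few resource limits visible to this Python process."""
    info = {"cpu_count": os.cpu_count()}  # may be None if undeterminable
    # CPUs this process is allowed to run on (Linux only)
    if hasattr(os, "sched_getaffinity"):
        info["usable_cpus"] = len(os.sched_getaffinity(0))
    # Physical memory in bytes via POSIX sysconf, where the names exist
    try:
        info["phys_mem_bytes"] = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        info["phys_mem_bytes"] = None
    # Per-process address-space limit (ulimit -v); None means unlimited
    if sys.platform != "win32":
        import resource
        soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
        info["address_space_limit"] = None if soft == resource.RLIM_INFINITY else soft
    return info


if __name__ == "__main__":
    for key, value in compute_limits().items():
        print(f"{key}: {value}")
```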

aulemahal commented 3 months ago

On one hand, I think the error is unrelated to xESMF and is rather due to dask.

Maybe this issue can help. Activating parallel=True in xESMF forces dask to use processes, which might be why the error appeared at that moment.

HOWEVER, as noted in the docstring and in the notebook, parallel=True generates the weights for multiple blocks of the output grid in parallel, and the input grid is loaded completely into memory for each block of the output grid. In your case, the output grid is much smaller than the input grid, so you won't gain any RAM with this option; it might even be worse.
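A back-of-envelope sketch of why this matters here, counting only float64 coordinate arrays (cell centers plus `lat_b`/`lon_b` corners). It is a lower bound, since ESMF's internal grid structures and the weights add more. It assumes the 550x700 output chunking pairs with the 1300x1050 dimensions in that order and that all blocks are in memory at once; the helper names are hypothetical.

```python
import math


def grid_coord_bytes(ny, nx, itemsize=8):
    """Bytes for lat/lon cell centers plus lat_b/lon_b cell corners (float64 by default)."""
    centers = 2 * ny * nx * itemsize
    corners = 2 * (ny + 1) * (nx + 1) * itemsize
    return centers + corners


def n_output_blocks(ny, nx, chunk_y, chunk_x):
    """Number of output-grid blocks produced by a given chunking."""
    return math.ceil(ny / chunk_y) * math.ceil(nx / chunk_x)


src = grid_coord_bytes(16000, 12000)            # about 6.1 GB
dst = grid_coord_bytes(1300, 1050)              # about 44 MB
blocks = n_output_blocks(1300, 1050, 550, 700)  # 3 x 2 = 6
print(f"input coords: {src / 1e9:.2f} GB, output coords: {dst / 1e6:.1f} MB")
print(f"{blocks} blocks, each loading the full input: {blocks * src / 1e9:.1f} GB")
```

With these numbers the input coordinates alone are roughly 140 times the size of the output coordinates, so splitting the output into blocks multiplies the dominant cost rather than dividing it.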

I would try without it.

EDIT: I'll add a note that if you have an environment with xESMF, then ESMF_RegridWeightGen is already installed in it!

raphaeldussin commented 3 months ago

@jvmcgovern the ESMF command-line tools should already be in the conda env where you installed xesmf; they come as part of the ESMF conda package.
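A small sketch for checking where the tool lives in the current environment. It looks on `PATH` first and then in `sys.prefix/bin`, which assumes the Linux/macOS conda layout (Windows conda environments use a different directory); `find_regridweightgen` is a hypothetical helper.

```python
import os
import shutil
import sys


def find_regridweightgen():
    """Return the path to ESMF_RegridWeightGen, or None if it cannot be found."""
    # PATH lookup works when the conda env is activated
    path = shutil.which("ESMF_RegridWeightGen")
    if path:
        return path
    # Fall back to the bin/ directory of the environment running this interpreter
    candidate = os.path.join(sys.prefix, "bin", "ESMF_RegridWeightGen")
    return candidate if os.path.isfile(candidate) else None


if __name__ == "__main__":
    print(find_regridweightgen() or "ESMF_RegridWeightGen not found")
```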

jvmcgovern commented 3 months ago

I used ESMF on my HPC server (with 396 cores) and got what appears to be an out-of-memory error. However, there's limited information on what went wrong:

Sun Mar 24 00:48:52 GMT 2024 submit MPI job
Starting weight generation with these inputs:
Source File: xe_input_grid.nc
Destination File: xe_output_grid.nc
Weight File: xe_input2output_grid_weights.nc
Source File is in CF Grid format
Source Grid is a global grid
Source Grid is a logically rectangular grid
Use the center coordinates of the source grid to do the regrid
Destination File is in CF Grid format
Destination Grid is a global grid
Destination Grid is a logically rectangular grid
Use the center coordinates of the destination grid to do the regrid
Regrid Method: conserve
Pole option: NONE
Line Type: greatcircle
Norm Type: dstarea
Extrap. Method: none

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 200 PID 38612 RUNNING AT n105
=   KILLED BY SIGNAL: 9 (Killed)