ashjbarnes closed this issue 9 months ago
This is happening because the ESMF being used from xesmf is configured with MPI support. When the regridder is called, MPI is initialised within the context of the Python process. OpenMPI doesn't support recursively running MPI, so it aborts immediately (related: https://github.com/open-mpi/ompi/issues/9729).
I think the `RegridWeightGen` step needs to be performed either before, or external to, the Python script that performs the regridding (see https://xesmf.readthedocs.io/en/latest/large_problems_on_HPC.html for suggestions).
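A minimal sketch of that ordering, assuming the failure only appears once xESMF has initialised MPI in the Python process, as described above. The helper names `build_weightgen_command` and `generate_weights` are hypothetical; the `ESMF_RegridWeightGen` flags are the ones from the failing command in the original report, and `-np` is the usual `mpirun` process-count option.

```python
import subprocess


def build_weightgen_command(src, dst, weights, method="bilinear", nprocs=None):
    """Build the argument list for an external ESMF_RegridWeightGen run.

    Hypothetical helper: the flags mirror the command quoted in this issue.
    """
    cmd = ["mpirun"]
    if nprocs is not None:
        # Optional process count for mpirun.
        cmd += ["-np", str(nprocs)]
    cmd += [
        "ESMF_RegridWeightGen",
        "-s", str(src),       # source grid file
        "-d", str(dst),       # destination grid file
        "-w", str(weights),   # output weights file
        "-m", method,         # regridding method
        "--src_regional",
        "--dst_regional",
    ]
    return cmd


def generate_weights(src, dst, weights, **kwargs):
    """Run the weight generation as an external process.

    Assumption: this is called before any xesmf.Regridder() call in the same
    Python process (or from a separate script altogether).
    """
    cmd = build_weightgen_command(src, dst, weights, **kwargs)
    # check=True raises CalledProcessError on a non-zero return code;
    # capture_output keeps stderr available for diagnosing failures.
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

In a pipeline, `generate_weights("bathy_original.nc", "topog_raw.nc", "weights/bathyweights.nc")` would then run first, with the small `xesmf.Regridder()` tasks afterwards. Passing the command as a list avoids needing `shell=True`.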
Thanks @angus-g !
Yeah, that's a good fix for now. I wonder if there's a way to get your kernel to purge MPI after a regridder call? Calling `orte-clean` doesn't seem to do anything, and you can't kill `orted` without killing the whole kernel.
Or, is it possible to run xesmf without mpirun at all for the smaller tasks?
Hi,
Unless there's a proposal for a specific change to xESMF, I'd close this. Thoughts?
I think it's probably fine to close this. At most a note about why this occurs could go somewhere, but maybe people will stumble on this thread anyway!
I'm writing a pipeline that needs to do some 'small' regridding tasks, and one 'big' one. I call `xesmf.Regridder()` for the small tasks, then use `subprocess('mpirun ESMF_RegridWeightGen...')` for the big one.
However, when running the xesmf regridder, subsequent `subprocess` calls fail, returning:
`CompletedProcess(args='mpirun ESMF_RegridWeightGen -s bathy_original.nc -d topog_raw.nc -w weights/bathyweights.nc -m bilinear --src_regional --dst_regional', returncode=1)`
To reproduce, run this code, then uncomment the xesmf regridding line (mpirun should work) and run again.