Closed: ashwinvis closed this issue 1 year ago
It hadn't even occurred to me that I didn't have any MPI tutorials. Thanks for pointing that out! I'll fix that this week :-)
Took a little longer than planned, sorry, but an MPI tutorial is now included. It follows almost exactly the 'Low Resolution' spherical case for consistency, but with multiple depth levels to allow multiple MPI ranks. When run on the recommended number of processors, it finishes in a couple of minutes.
40 processors is a bit much, so I would have to get access to a supercomputer to test it. Would it be possible to reduce the requirements so that even with 8 processors you get a result in a few minutes?
I've updated the ABOUT_TUTORIAL.md to include a note / instructions on how to adjust the tutorial (changing a single number in the generate_data_sphere.py script) to allow it to run on fewer processors in a reasonable amount of time.
The default setting is 24 processors running for ~10 minutes.
Does that seem reasonable?
Reducing the MPI requirement
24 processors is a fairly heavy requirement if you are not running on a computing cluster. You can simply run on fewer processors (highest efficiency if the number of processors divides evenly into 48, the number of vertical levels), but at the cost of increased runtime.
To reduce the processor cost without increasing runtime, you can decrease the number of vertical levels proportionately. E.g. you can reduce the vertical levels to 12 in order to run on 6 processors in a similar amount of time.
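The load-balance point above can be sketched in a few lines of Python. The block distribution of levels over ranks is an assumption for illustration, not necessarily the package's exact scheme; the idea is that the runtime scales with the busiest rank, so uneven splits waste processors:

```python
def levels_per_rank(n_levels: int, n_ranks: int) -> list[int]:
    """Distribute n_levels vertical levels over n_ranks MPI ranks
    as evenly as possible (simple block distribution)."""
    base, extra = divmod(n_levels, n_ranks)
    # The first `extra` ranks each take one additional level.
    return [base + (1 if rank < extra else 0) for rank in range(n_ranks)]

# Runtime is set by the most-loaded rank; with 48 levels, rank counts
# that divide 48 evenly (e.g. 24, 16, 12) keep every rank equally busy,
# while e.g. 7 ranks leaves one rank lighter than the rest.
for ranks in (24, 16, 12, 7):
    load = levels_per_rank(48, ranks)
    print(ranks, "ranks ->", load)
```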
To adjust the number of vertical levels, edit line 13 of generate_data_sphere.py, which reads `Nlon, Nlat, Ndepth = int(360//2), int(180//2), 48`. The last number, `48`, specifies the number of vertical levels. When running the code, you can use any number of MPI ranks up to the number of vertical levels, but the most efficient use of processors occurs when the number of MPI ranks divides evenly into the number of vertical levels.
For something that runs in ~5 minutes on 8 processors, setting the number of vertical levels to 8 (the last number on line 13 of generate_data_sphere.py, changed from 48 to 8) should do the trick.
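For reference, here is what the edited line would look like; this just reproduces the quoted line with only the last number changed:

```python
# Line 13 of generate_data_sphere.py, adjusted for an 8-processor run:
Nlon, Nlat, Ndepth = int(360//2), int(180//2), 8  # was 48 vertical levels
```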
Managed to run with 8 vertical levels and 4 processors on a laptop in ~30 minutes.
While in the article you state that
I did not find an example which demonstrates this in the Tutorials. We only see OpenMP being used, and `SLURM_NTASKS` is always set to 1. Would it be possible to construct a simple example which shows MPI parallelism? This is needed to check off an item from https://github.com/openjournals/joss-reviews/issues/4277#issuecomment-1383020819:
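For concreteness, a minimal sketch of the kind of SLURM job script that would exercise MPI parallelism (so that `SLURM_NTASKS` is no longer 1). The `--ntasks` value, the time limit, and the script name `run_analysis.py` are hypothetical, not taken from the repository:

```shell
#!/bin/bash
#SBATCH --ntasks=8          # SLURM exports this count as SLURM_NTASKS
#SBATCH --time=00:10:00

# Launch one MPI rank per SLURM task.
srun -n "${SLURM_NTASKS}" python run_analysis.py
```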