Closed sjpb closed 3 years ago
Ah, I found the npmin option - but map seems a better option here. Any suggestions on how to use it for the above case?
Hi @sjpb
You can find more information about the map option using:
IMB-MPI1 -help map
In your case I assume you need to use the following option:
IMB-MPI1 uniband -map COUNT_RANKS_PER_NODExCOUNT_NODES
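To make the intent of that PxQ mapping concrete for the case above (2 nodes, 32 ranks per node, so -map 32x2): a small sketch, assuming block-wise Slurm placement of ranks 0-31 on the first node and 32-63 on the second. This is only an illustration of the pairing the option is meant to produce, not IMB's internal code:

```python
# Assumed setup (not IMB source): ranks are placed block-wise by Slurm,
# ranks 0..31 on node 0 and ranks 32..63 on node 1 (--ntasks-per-node=32).
RANKS_PER_NODE = 32  # P in "-map PxQ"
NODES = 2            # Q in "-map PxQ"

def node_of(rank):
    """Node index of a rank under block-wise placement."""
    return rank // RANKS_PER_NODE

def uniband_pairs(p):
    """Pairs a "-map Px2" uniband run is intended to measure:
    rank r on the first node with rank r + p on the second."""
    return [(r, r + p) for r in range(p)]

pairs = uniband_pairs(RANKS_PER_NODE)
# Every pair spans both nodes, so each transfer crosses the network.
assert all(node_of(a) != node_of(b) for a, b in pairs)
```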
Just leaving a note on -map usage. To better understand the idea behind -map and how it can be used to measure inter-node communication only, check out the examples in the "-map PxQ Option" section here: https://software.intel.com/content/www/us/en/develop/documentation/imb-user-guide/top/benchmark-methodology/command-line-control.html
Also, -npmin is typically used on every IMB run to eliminate these np=2,4,... executions.
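A quick sketch of what -npmin changes, based on the doubling series of process counts seen in the run described below (np = 2, 4, 8, 16, 32, then the full 64). The function name and exact schedule are an assumption for illustration, not IMB's implementation:

```python
# Sketch (assumption from observed behavior): IMB repeats each benchmark
# for a doubling series of process counts, then the full count; with
# "-npmin <total>" only the full run remains.
def np_schedule(total, npmin=2):
    """Process counts one benchmark would be run with."""
    counts = []
    np_ = npmin
    while np_ < total:
        counts.append(np_)
        np_ *= 2
    counts.append(total)
    return counts

print(np_schedule(64))            # [2, 4, 8, 16, 32, 64]
print(np_schedule(64, npmin=64))  # [64] - only the full-placement run
```

So passing -npmin 64 on a 64-rank job skips exactly the small, intra-node executions the question below is about.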
I must be missing something here.
I'm running e.g. uniband via slurm + openmpi. I have 2x 32-core nodes, so I want to run 64 processes with the 1st half of them on node #1 and the 2nd half on node #2, so the pair-wise transfers go across the network. Setting the sbatch options --ntasks=64 and --ntasks-per-node=32 and then running the benchmark does the right thing for the 64-process case, with ranks 0-31 on node #1 and 32-63 on node #2. However, uniband also generates results for 2, 4, 8, 16 and 32 processes. That seems helpful, except that all the communication there is within node #1, which isn't really measuring what I want.
Is this the intended usage and behavior? If so, is there a way of disabling the runs on fewer than all processes, so I can control placement properly?