open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

FAQ on using single socket #7311

cponder opened this issue 4 years ago

cponder commented 4 years ago

Over and over again, I run into this issue when doing performance testing. I would appreciate it if you would add the answer to your FAQ:

  How do I parameterize "mpirun" so only one socket is used on each node?

I don't want to use a HOSTFILE; I want to do it from the command line so I don't have to hard-code the process topology. I want to use parameterized arguments so the "mpirun" line will scale.

cponder commented 4 years ago

Here's an example of a parameterized invocation line

  mpirun -n $((NODES*4)) -bind-to socket --cpu-list 0,6,12,18 --report-bindings

which, of course, doesn't work correctly:

[dgx2-03:44870] MCW rank 0 is not bound (or bound to all available processors)
[dgx2-03:44870] MCW rank 1 is not bound (or bound to all available processors)
[dgx2-03:44870] MCW rank 2 is not bound (or bound to all available processors)
[dgx2-03:44870] MCW rank 3 is not bound (or bound to all available processors)

In this case it doesn't seem to be binding processes to cores; it looks like it's assigning 4 cores per proc and then oversubscribing 4 procs to each core. I'd already raised this issue with the user group

https://www.mail-archive.com/search?l=users@lists.open-mpi.org&q=subject:%22%5C%5BOMPI+users%5C%5D+bind%5C-to%5C-socket+across+sockets+with+different+core%09counts%22&o=newest

but the answer there is the only one I'd received.

ggouaillardet commented 4 years ago

Assuming there are more than 2 MPI tasks and you have enough nodes, what about

mpirun --rank-by socket ...

cponder commented 4 years ago

I can't tell whether that should solve the problem or not; OpenMPI 4.0.2 fails out with the error below. The error doesn't happen if I omit the "--rank-by socket" flag. As for the logic, is the idea that allocation would be round-robin, filling socket 0 on each node before using socket 1 on each node? That would give me what I want, if it worked.

ggouaillardet commented 4 years ago

Can you try --rank-by node instead?

cponder commented 4 years ago

In this case, with one node,

mpirun -n 4 --rank-by node --report-bindings ./MGTranspose 2048 2048 100

it's still splitting across sockets; here's the mapping for rank 3, for example:

cponder commented 4 years ago

If it were up to me, I ought to be able to specify --npersocket 4, and then the first 4 procs would go to socket 0, the next 4 would spill over to socket 1, and so on. So for multi-node I would use

-n $((NODES*4)) --npersocket 4

and it would run 4 procs on socket 0 of each node. There are other ways to "complete the definition" of this parameter combination, but the other possible interpretations, like spreading the procs evenly across the sockets, already have unambiguous ways to be specified.

rhc54 commented 4 years ago

I'm not entirely sure I understand what you are trying to do, but if you want, say, four procs on each node, all bound to the first socket on that node, then I would use:

$ mpirun --map-by core --hostfile foo ...

where the hostfile indicates there are 4 slots on each node. Note that this would bind each proc to only one core, so only the first four cores would be used. If you wanted the procs to simply be bound to the socket, then add --bind-to socket to the above command line.
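
A minimal sketch of what that could look like on two nodes (the hostnames and ./app are placeholders; foo is the hostfile named above):

# foo -- a hostfile declaring 4 slots on each node
node01 slots=4
node02 slots=4

$ mpirun -n 8 --map-by core --hostfile foo --report-bindings ./app

With -n 8 and 4 slots per node, each node gets 4 procs, each bound to one of its first four cores.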

If you want each of the procs to use 2 cores on the socket, then you would say:

$ mpirun --map-by core:pe=2 --hostfile foo ...

Note this will automatically cause each proc to be bound to two cores, not the entire socket, but each proc will have its own two cores.

rhc54 commented 4 years ago

Another option for you: if you want to have the job mapped such that all procs land on the first socket of every node until those sockets are completely filled, and then start filling the second socket on every node, then the easiest method is just:

$ mpirun --map-by socket:span ...

Note that the procs will be evenly spread across the nodes - it won't fill the first socket of the first node and then move to the first socket of the second node. You can control how the ranks are assigned to the procs via the --rank-by option - the span qualifier works similarly on it.
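
A sketch of how those two options might be combined, reusing the NODES convention from earlier in the thread (./app is a placeholder, and the span qualifier on --rank-by follows the note above):

$ mpirun -n $((NODES*4)) --map-by socket:span --rank-by socket:span --report-bindings ./app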

cponder commented 4 years ago

I'm still seeing the problem; here are 2 procs on 1 node:

+ mpirun -n 2 --map-by socket:span --report-bindings ./MGTranspose 256 32768 100
[dgx2-03:126956] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]], socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]], socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]], socket 0[core 12[hwt 0]], socket 0[core 13[hwt 0]], socket 0[core 14[hwt 0]], socket 0[core 15[hwt 0]], socket 0[core 16[hwt 0]], socket 0[core 17[hwt 0]], socket 0[core 18[hwt 0]], socket 0[core 19[hwt 0]]: [B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B][./././././././././././././././././././.]
[dgx2-03:126956] MCW rank 1 bound to socket 1[core 20[hwt 0]], socket 1[core 21[hwt 0]], socket 1[core 22[hwt 0]], socket 1[core 23[hwt 0]], socket 1[core 24[hwt 0]], socket 1[core 25[hwt 0]], socket 1[core 26[hwt 0]], socket 1[core 27[hwt 0]], socket 1[core 28[hwt 0]], socket 1[core 29[hwt 0]], socket 1[core 30[hwt 0]], socket 1[core 31[hwt 0]], socket 1[core 32[hwt 0]], socket 1[core 33[hwt 0]], socket 1[core 34[hwt 0]], socket 1[core 35[hwt 0]], socket 1[core 36[hwt 0]], socket 1[core 37[hwt 0]], socket 1[core 38[hwt 0]], socket 1[core 39[hwt 0]]: [./././././././././././././././././././.][B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B]

So again it looks like one proc is going onto each socket.

cponder commented 4 years ago

I believe that with MVAPICH2 either of these forms would solve the problem:

 mpirun_rsh -np $((PROCS*NODES)) MV2_CPU_BINDING_POLICY=bunch ...
 mpirun_rsh -np $((4*NODES)) MV2_CPU_MAPPING=0,1,2,3 ...

The second form uses a fixed number of procs per socket, but that would be OK; it's the ability to vary the number of nodes that matters to me.

alex--m commented 4 years ago

@ggouaillardet @rhc54 I probably didn't have the same intention as @cponder, but I stumbled across what seems to be the same issue: mpirun --rank-by core XYZ just fails (OMPI v4.0.3, internal pmix3). I have a workaround, but I was hoping I could help find the root cause.

This may help - notice how some ranks are "INVALID" when I pass the flag, and the printf() I added in dstore_base.c (inside pmix) tells me that the missing information on rank #2 is responsible for the crash:

[hpc@thunder8 amargolin]$ ./ompi/build/bin/mpirun -np 4 --display-map --rank-by core /mnt/central/users/amargolin/osu/build/libexec/osu-micro-benchmarks/mpi/collective/osu_allreduce
 Data for JOB [55081,1] offset 0 Total slots allocated 36

 ========================   JOB MAP   ========================

 Data for node: thunder8        Num slots: 36   Max slots: 0    Num procs: 4
        Process OMPI jobid: [55081,1] App: 0 Process rank: 0 Bound: socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]], socket 0[core 10[hwt 0-1]], socket 0[core 11[hwt 0-1]], socket 0[core 12[hwt 0-1]], socket 0[core 13[hwt 0-1]], socket 0[core 14[hwt 0-1]], socket 0[core 15[hwt 0-1]], socket 0[core 16[hwt 0-1]], socket 0[core 17[hwt 0-1]]:[BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../../../../../../../../../..]
        Process OMPI jobid: [55081,1] App: 0 Process rank: INVALID Bound: socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]], socket 1[core 20[hwt 0-1]], socket 1[core 21[hwt 0-1]], socket 1[core 22[hwt 0-1]], socket 1[core 23[hwt 0-1]], socket 1[core 24[hwt 0-1]], socket 1[core 25[hwt 0-1]], socket 1[core 26[hwt 0-1]], socket 1[core 27[hwt 0-1]], socket 1[core 28[hwt 0-1]], socket 1[core 29[hwt 0-1]], socket 1[core 30[hwt 0-1]], socket 1[core 31[hwt 0-1]], socket 1[core 32[hwt 0-1]], socket 1[core 33[hwt 0-1]], socket 1[core 34[hwt 0-1]], socket 1[core 35[hwt 0-1]]:[../../../../../../../../../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
        Process OMPI jobid: [55081,1] App: 0 Process rank: 1 Bound: socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]], socket 0[core 10[hwt 0-1]], socket 0[core 11[hwt 0-1]], socket 0[core 12[hwt 0-1]], socket 0[core 13[hwt 0-1]], socket 0[core 14[hwt 0-1]], socket 0[core 15[hwt 0-1]], socket 0[core 16[hwt 0-1]], socket 0[core 17[hwt 0-1]]:[BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../../../../../../../../../..]
        Process OMPI jobid: [55081,1] App: 0 Process rank: INVALID Bound: socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]], socket 1[core 20[hwt 0-1]], socket 1[core 21[hwt 0-1]], socket 1[core 22[hwt 0-1]], socket 1[core 23[hwt 0-1]], socket 1[core 24[hwt 0-1]], socket 1[core 25[hwt 0-1]], socket 1[core 26[hwt 0-1]], socket 1[core 27[hwt 0-1]], socket 1[core 28[hwt 0-1]], socket 1[core 29[hwt 0-1]], socket 1[core 30[hwt 0-1]], socket 1[core 31[hwt 0-1]], socket 1[core 32[hwt 0-1]], socket 1[core 33[hwt 0-1]], socket 1[core 34[hwt 0-1]], socket 1[core 35[hwt 0-1]]:[../../../../../../../../../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]

 =============================================================
dstore_base.c:2866 - rank=2 (my printf)
[thunder8.thunder:62184] PMIX ERROR: NOT-FOUND in file dstore_base.c at line 2867
[thunder8.thunder:62184] PMIX ERROR: NOT-FOUND in file server/pmix_server.c at line 3408
[thunder8.thunder:62188] PMIX ERROR: OUT-OF-RESOURCE in file client/pmix_client.c at line 231
[thunder8.thunder:62188] OPAL ERROR: Error in file pmix3x_client.c at line 112
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[thunder8.thunder:62188] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[55081,1],0]
  Exit code:    1
--------------------------------------------------------------------------
^C^C[hpc@thunder8 amargolin]$ ^C
[hpc@thunder8 amargolin]$

[hpc@thunder8 amargolin]$ ./ompi/build/bin/mpirun -np 4 --display-map /mnt/central/users/amargolin/osu/build/libexec/osu-micro-benchmarks/mpi/collective/osu_allreduce
 Data for JOB [55091,1] offset 0 Total slots allocated 36

 ========================   JOB MAP   ========================

 Data for node: thunder8        Num slots: 36   Max slots: 0    Num procs: 4
        Process OMPI jobid: [55091,1] App: 0 Process rank: 0 Bound: socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]], socket 0[core 10[hwt 0-1]], socket 0[core 11[hwt 0-1]], socket 0[core 12[hwt 0-1]], socket 0[core 13[hwt 0-1]], socket 0[core 14[hwt 0-1]], socket 0[core 15[hwt 0-1]], socket 0[core 16[hwt 0-1]], socket 0[core 17[hwt 0-1]]:[BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../../../../../../../../../..]
        Process OMPI jobid: [55091,1] App: 0 Process rank: 1 Bound: socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]], socket 1[core 20[hwt 0-1]], socket 1[core 21[hwt 0-1]], socket 1[core 22[hwt 0-1]], socket 1[core 23[hwt 0-1]], socket 1[core 24[hwt 0-1]], socket 1[core 25[hwt 0-1]], socket 1[core 26[hwt 0-1]], socket 1[core 27[hwt 0-1]], socket 1[core 28[hwt 0-1]], socket 1[core 29[hwt 0-1]], socket 1[core 30[hwt 0-1]], socket 1[core 31[hwt 0-1]], socket 1[core 32[hwt 0-1]], socket 1[core 33[hwt 0-1]], socket 1[core 34[hwt 0-1]], socket 1[core 35[hwt 0-1]]:[../../../../../../../../../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
        Process OMPI jobid: [55091,1] App: 0 Process rank: 2 Bound: socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]], socket 0[core 10[hwt 0-1]], socket 0[core 11[hwt 0-1]], socket 0[core 12[hwt 0-1]], socket 0[core 13[hwt 0-1]], socket 0[core 14[hwt 0-1]], socket 0[core 15[hwt 0-1]], socket 0[core 16[hwt 0-1]], socket 0[core 17[hwt 0-1]]:[BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../../../../../../../../../..]
        Process OMPI jobid: [55091,1] App: 0 Process rank: 3 Bound: socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]], socket 1[core 20[hwt 0-1]], socket 1[core 21[hwt 0-1]], socket 1[core 22[hwt 0-1]], socket 1[core 23[hwt 0-1]], socket 1[core 24[hwt 0-1]], socket 1[core 25[hwt 0-1]], socket 1[core 26[hwt 0-1]], socket 1[core 27[hwt 0-1]], socket 1[core 28[hwt 0-1]], socket 1[core 29[hwt 0-1]], socket 1[core 30[hwt 0-1]], socket 1[core 31[hwt 0-1]], socket 1[core 32[hwt 0-1]], socket 1[core 33[hwt 0-1]], socket 1[core 34[hwt 0-1]], socket 1[core 35[hwt 0-1]]:[../../../../../../../../../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]

 =============================================================

# OSU MPI Allreduce Latency Test v5.6.2
# Size       Avg Latency(us)
4                       2.05
8                       2.17
16                      2.41
32                      2.44
64                      2.71
128                     2.94
256                     3.33
512                     4.55
1024                    5.43
2048                    6.39
4096                   14.82
8192                   19.21
16384                  45.87
32768                  56.93
65536                  77.65
131072                127.67
262144                221.33
524288                435.12
1048576               994.88
[thunder8.thunder:62194] 7 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port
[thunder8.thunder:62194] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[hpc@thunder8 amargolin]$ 
gpaulsen commented 4 years ago

This has been merged into v4.0.x and will be released with v4.0.4. On master, however, the code paths will use PRRTE, so this should be re-verified there.

cponder commented 4 years ago

What are you closing here? There are 2 issues now: (1) my wanting a way to run one socket per node without using a hostfile, and (2) the error message that Alex is reporting.

cponder commented 4 years ago

Using an existing hostfile means that I can't vary the number of nodes, right? Or, at the very least, I can't port my run-script to another cluster without having to re-write the hostfile.

gpaulsen commented 4 years ago

Okay, sorry, it was unclear to what extent the PR addressed the issues. Reopening.

rhc54 commented 4 years ago

We solved the problem from @alex--m.

I confess I'm still having trouble really understanding the other problem here. IIUC, what @cponder wants is to have 4 procs running on socket0 of each node. Yet I am not gathering why the following reportedly doesn't work:

$ mpirun -n $((NODES*4)) -bind-to socket --cpu-list 0,6,12,18 --report-bindings

Yes, it will output that the procs "are not bound", but that is because the procs are being confined to the specific cores listed here, and those cores are all on the same socket. Hence, the procs are "bound to all available processors", which is what the full message says.

Are you sure you aren't getting what you want? Have you printed out the actual bitmask to see where the procs are? In reality, once you specified the cpu-list (and those cpus cover the first socket on each node), you don't gain anything by the --bind-to option.

I'm working to improve the binding message to make it clearer what has happened - perhaps that is the only true issue here.

cponder commented 4 years ago

Can you give me a more specific command I can try running?

rhc54 commented 4 years ago

Errr...well, why don't you use the above command and run a program like this:


#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    cpu_set_t mask;
    long nproc, i;

    /* Query the affinity mask the launcher applied to this process. */
    if (sched_getaffinity(0, sizeof(cpu_set_t), &mask) == -1) {
        perror("sched_getaffinity");
        assert(false);
    }
    /* Print a 0/1 flag per online processor: 1 means this proc may run there. */
    nproc = sysconf(_SC_NPROCESSORS_ONLN);
    printf("sched_getaffinity = ");
    for (i = 0; i < nproc; i++) {
        printf("%d ", CPU_ISSET(i, &mask));
    }
    printf("\n");
    return 0;
}
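
A minimal way to compile and launch it under the invocation from earlier in the thread would be something like this (the affinity.c / affinity names are just placeholders):

$ mpicc affinity.c -o affinity
$ mpirun -n $((NODES*4)) -bind-to socket --cpu-list 0,6,12,18 --report-bindings ./affinity
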
cponder commented 4 years ago

I get these messages

[prm-dgx-02:77987] MCW rank 0 is not bound (or bound to all available processors)
[prm-dgx-02:77987] MCW rank 1 is not bound (or bound to all available processors)
[prm-dgx-02:77987] MCW rank 2 is not bound (or bound to all available processors) 
[prm-dgx-02:77987] MCW rank 3 is not bound (or bound to all available processors)
[prm-dgx-31:58416] MCW rank 4 is not bound (or bound to all available processors)
[prm-dgx-31:58416] MCW rank 5 is not bound (or bound to all available processors)
[prm-dgx-31:58416] MCW rank 6 is not bound (or bound to all available processors)
[prm-dgx-31:58416] MCW rank 7 is not bound (or bound to all available processors)
[prm-dgx-32:10828] MCW rank 8 is not bound (or bound to all available processors)
[prm-dgx-32:10828] MCW rank 9 is not bound (or bound to all available processors)
[prm-dgx-32:10828] MCW rank 10 is not bound (or bound to all available processors)
[prm-dgx-32:10828] MCW rank 11 is not bound (or bound to all available processors)
[prm-dgx-36:70363] MCW rank 12 is not bound (or bound to all available processors)
[prm-dgx-36:70363] MCW rank 13 is not bound (or bound to all available processors)
[prm-dgx-36:70363] MCW rank 14 is not bound (or bound to all available processors)
[prm-dgx-36:70363] MCW rank 15 is not bound (or bound to all available processors)

and 16 copies of this line

sched_getaffinity = 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

It looks to me like it is showing cores 0,6,12,18 on both sockets. This is with OpenMPI 4.0.3.

lcebaman commented 3 years ago

As I understand this, @cponder is looking for a way of fully exploiting a half-populated node (i.e. a single socket out of 2 sockets) without using a hostfile. I have not figured out a way either. Say I have a node with 2 sockets, 24 cores each. If I want to run on a fully populated single socket:

mpirun -np 24 --map-by ppr:24:node:PE=2 --bind-to socket --report-bindings ...

But this does not work as I was expecting:

--------------------------------------------------------------------------
A request for multiple cpus-per-proc was given, but a conflicting binding
policy was specified:

  #cpus-per-proc:  2
  type of cpus:    cores as cpus
  binding policy given: SOCKET

The correct binding policy for the given type of cpu is:

  correct binding policy:  bind-to core

This is the binding policy we would apply by default for this
situation, so no binding need be specified. Please correct the
situation and try again.
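
Following what the error text itself suggests (drop the conflicting --bind-to socket and let the default bind-to core apply), the invocation would presumably become something like this, though whether it also confines the job to a single socket is a separate question:

mpirun -np 24 --map-by ppr:24:node:PE=2 --report-bindings ...
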
strocode commented 2 years ago

I have a similar problem - I'm trying to bind all processes to socket #1 but I can't work out how.

ivankochin commented 1 year ago

> We solved the problem from @alex--m.
>
> I confess I'm still having trouble really understanding the other problem here. IIUC, what @cponder wants is to have 4 procs running on socket0 of each node. Yet I am not gathering why the following reportedly doesn't work:
>
> $ mpirun -n $((NODES*4)) -bind-to socket --cpu-list 0,6,12,18 --report-bindings
>
> Yes, it will output that the procs "are not bound", but that is because the procs are being confined to the specific cores listed here, and those cores are all on the same socket. Hence, the procs are "bound to all available processors", which is what the full message says.
>
> Are you sure you aren't getting what you want? Have you printed out the actual bitmask to see where the procs are? In reality, once you specified the cpu-list (and those cpus cover the first socket on each node), you don't gain anything by the --bind-to option.
>
> I'm working to improve the binding message to make it clearer what has happened - perhaps that is the only true issue here.

Am I understanding correctly that --cpu-list 0,6,12,18 means the first four cores on the first socket?

If so, then printing which processors are available may be useful, because a message like "not bound (or bound to all available processors)" can be a little bit misleading. WDYT?

rhc54 commented 1 year ago

I doubt that anything will be done for prior releases, but I'll take a crack in OMPI v5 at providing (a) a clearer statement as to "not bound" vs. "bound to all available", and (b) a "show-cpus" option that will tell you which cpus are available on each socket.

rhc54 commented 1 year ago

I added this display output for you (see https://github.com/openpmix/prrte/pull/1634). It doesn't output much just yet, but I may have someone willing to extend/beautify the output in the near future:

$  prterun --prtemca hwloc_use_topo_file /Users/rhc/pmix/topologies/summit.h17n08.lstopo-2.2.0.xml --prtemca ras_simulator_num_nodes 3 --map-by package:pe=5:corecpu -n 2 --display cpus=nodeA1,map hostname

======================   AVAILABLE PROCESSORS [node: nodeA1]   ======================

PKG[0]: 0-20
PKG[1]: 21-41

======================================================================

========================   JOB MAP   ========================
Data for JOB prterun-Ralphs-iMac-2-61474@1 offset 0 Total slots allocated 126
    Mapping policy: BYPACKAGE:NOOVERSUBSCRIBE  Ranking policy: FILL Binding policy: CORE:IF-SUPPORTED
    Cpu set: N/A  PPR: N/A  Cpus-per-rank: 5  Cpu Type: CORE

Data for node: nodeA0   Num slots: 42   Max slots: 42   Num procs: 2
        Process jobid: prterun-Ralphs-iMac-2-61474@1 App: 0 Process rank: 0 Bound: package[0][core:0-4]
        Process jobid: prterun-Ralphs-iMac-2-61474@1 App: 0 Process rank: 1 Bound: package[1][core:21-25]
$