Thanks to @MichaelLaufer for bringing this to my attention.
Describe the bug
Errors during MPI_Init() and a hang in MPI_Finalize() when running a simple MPI app with the rc_x/dc_x transports:
[mlaufer@storm test_MPI]$ mpirun -np 2 --hostfile hostfile_09 -x UCX_TLS=self,sysv,rc_x -x UCX_LOG_LEVEL=trace -x UCX_IB_GID_INDEX=0 --mca coll ^ucx --mca fcoll ^vulcan ./pical
[1599217002.536877] [thunder09:322013:0] sock.c:114 UCX DIAG failed to read from /sys/class/net/ens3f0np0/bonding/ad_num_ports: No such file or directory, assuming 802.3ad bonding is disabled
[1599217002.536879] [thunder09:322014:0] sock.c:114 UCX DIAG failed to read from /sys/class/net/ens3f0np0/bonding/ad_num_ports: No such file or directory, assuming 802.3ad bonding is disabled
[1599217002.538940] [thunder09:322014:0] sock.c:114 UCX DIAG failed to read from /sys/class/net/ens3f0np0/bonding/ad_num_ports: No such file or directory, assuming 802.3ad bonding is disabled
[1599217002.539004] [thunder09:322013:0] sock.c:114 UCX DIAG failed to read from /sys/class/net/ens3f0np0/bonding/ad_num_ports: No such file or directory, assuming 802.3ad bonding is disabled
[1599217002.546716] [thunder09:322014:0] sock.c:114 UCX DIAG failed to read from /sys/class/net/ens2f0np0/bonding/ad_num_ports: No such file or directory, assuming 802.3ad bonding is disabled
[1599217002.546724] [thunder09:322013:0] sock.c:114 UCX DIAG failed to read from /sys/class/net/ens2f0np0/bonding/ad_num_ports: No such file or directory, assuming 802.3ad bonding is disabled
[1599217002.548892] [thunder09:322014:0] sock.c:114 UCX DIAG failed to read from /sys/class/net/ens2f0np0/bonding/ad_num_ports: No such file or directory, assuming 802.3ad bonding is disabled
[1599217002.548961] [thunder09:322013:0] sock.c:114 UCX DIAG failed to read from /sys/class/net/ens2f0np0/bonding/ad_num_ports: No such file or directory, assuming 802.3ad bonding is disabled
[1599217002.556681] [thunder09:322014:0] ucp_worker.c:1648 UCX INFO ep_cfg[0]: tag(self/memory rc_mlx5/mlx5_0:1 rc_mlx5/mlx5_2:1);
[1599217002.556684] [thunder09:322013:0] ucp_worker.c:1648 UCX INFO ep_cfg[0]: tag(self/memory rc_mlx5/mlx5_0:1 rc_mlx5/mlx5_2:1);
[1599217002.573537] [thunder09:322013:0] ucp_worker.c:1648 UCX INFO ep_cfg[1]: tag(sysv_/memory rc_mlx5/mlx5_0:1 rc_mlx5/mlx5_2:1);
[1599217002.576122] [thunder09:322013:0] ib_mlx5_dv.c:224 UCX ERROR mlx5dv_devx_obj_modify(503) failed, syndrome 0: Invalid argument
[1599217002.577668] [thunder09:322014:0] ucp_worker.c:1648 UCX INFO ep_cfg[1]: tag(sysv_/memory rc_mlx5/mlx5_0:1 rc_mlx5/mlx5_2:1);
Process 0 of 2 on thunder09
[1599217002.581119] [thunder09:322014:0] ib_mlx5_dv.c:224 UCX ERROR mlx5dv_devx_obj_modify(503) failed, syndrome 0: Invalid argument
[1599217002.581207] [thunder09:322014:0] ib_mlx5_dv.c:224 UCX ERROR mlx5dv_devx_obj_modify(503) failed, syndrome 0: Invalid argument
Process 1 of 2 on thunder09
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.002396
<HANG>
The command line works if "-x UCX_IB_GID_INDEX=0" is not specified.
The command line works with rc_v instead of rc_x (both working variants are spelled out below).
The command line works with an older UCX (before DEVX support...).
This happens even on a single host, where IB is not actually used.
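For reference, the two working variants mentioned above are the same invocation as the failing one, changed only as described (no other options were touched):

mpirun -np 2 --hostfile hostfile_09 -x UCX_TLS=self,sysv,rc_x -x UCX_LOG_LEVEL=trace --mca coll ^ucx --mca fcoll ^vulcan ./pical
mpirun -np 2 --hostfile hostfile_09 -x UCX_TLS=self,sysv,rc_v -x UCX_LOG_LEVEL=trace -x UCX_IB_GID_INDEX=0 --mca coll ^ucx --mca fcoll ^vulcan ./pical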
GDB backtrace from either of the hung processes:
(gdb) bt
#0 0x00001545187f81d0 in ucs_callbackq_dispatch (cbq=<optimized out>)
at /mnt/central/testing/src/ucx/src/ucs/datastruct/callbackq.h:210
#1 uct_worker_progress (worker=<optimized out>) at /mnt/central/testing/src/ucx/src/uct/api/uct.h:2414
#2 ucp_worker_progress (worker=worker@entry=0x20f6c50) at core/ucp_worker.c:2158
#3 0x0000154518c7dadb in opal_common_ucx_wait_request (type=OPAL_COMMON_UCX_REQUEST_TYPE_UCP,
msg=0x154518c80753 "ucp_disconnect_nb", worker=0x20f6c50, request=0x234c070) at common_ucx.h:216
#4 opal_common_ucx_wait_all_requests (reqs=reqs@entry=0x2274a70, count=<optimized out>,
worker=worker@entry=0x20f6c50, type=OPAL_COMMON_UCX_REQUEST_TYPE_UCP) at common_ucx.c:211
#5 0x0000154518c7e153 in opal_common_ucx_del_procs_nofence (procs=procs@entry=0x2274570, count=count@entry=2,
my_rank=<optimized out>, max_disconnect=1, worker=worker@entry=0x20f6c50) at common_ucx.c:255
#6 0x0000154518c7e199 in opal_common_ucx_del_procs (procs=procs@entry=0x2274570, count=count@entry=2,
my_rank=<optimized out>, max_disconnect=<optimized out>, worker=0x20f6c50) at common_ucx.c:275
#7 0x0000154518e867f9 in mca_pml_ucx_del_procs (procs=<optimized out>, nprocs=2) at pml_ucx.c:481
#8 0x000015451f7fd552 in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:326
#9 0x0000000000400e0e in main ()
(gdb)
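For context on where the hang sits: frames #0-#3 are the usual pattern of spinning on ucp_worker_progress() until the non-blocking request returned by the endpoint close (the "ucp_disconnect_nb" message in frame #3) completes, and that request apparently never completes. A minimal sketch of that pattern against the plain UCP API (not Open MPI's opal_common_ucx wrapper; the helper name is mine) looks like this:

#include <ucp/api/ucp.h>

/* Hypothetical helper (illustration only, not the Open MPI code):
 * progress the worker until a non-blocking request, e.g. one returned
 * by ucp_ep_close_nb(), completes. The backtrace above shows
 * MPI_Finalize() stuck in exactly this kind of loop because the close
 * request never reaches completion. */
static ucs_status_t wait_request(ucp_worker_h worker, void *request)
{
    ucs_status_t status;

    if (request == NULL) {
        return UCS_OK;                      /* completed immediately */
    }
    if (UCS_PTR_IS_ERR(request)) {
        return UCS_PTR_STATUS(request);     /* failed immediately */
    }
    do {
        ucp_worker_progress(worker);
        status = ucp_request_check_status(request);
    } while (status == UCS_INPROGRESS);
    ucp_request_free(request);
    return status;
}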
The app does nothing special:
/* example from MPICH */
#include "mpi.h"
#include <stdio.h>
#include <math.h>

double f(double);

double f(double a)
{
    return (4.0 / (1.0 + a*a));
}

int main(int argc, char *argv[])
{
    int done = 0, n, myid, numprocs, i;
    double PI25DT = 3.141592653589793238462643;
    double mypi, pi, h, sum, x;
    double startwtime = 0.0, endwtime;
    int namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Get_processor_name(processor_name, &namelen);
    fprintf(stdout, "Process %d of %d on %s\n",
            myid, numprocs, processor_name);

    n = 0;
    while (!done)
    {
        if (myid == 0)
        {
            /*
            printf("Enter the number of intervals: (0 quits) ");
            scanf("%d",&n);
            */
            if (n == 0) n = 10000; else n = 0;
            startwtime = MPI_Wtime();
        }
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
        if (n == 0)
            done = 1;
        else
        {
            h   = 1.0 / (double) n;
            sum = 0.0;
            /* A slightly better approach starts from large i and works back */
            for (i = myid + 1; i <= n; i += numprocs)
            {
                x = h * ((double)i - 0.5);
                sum += f(x);
            }
            mypi = h * sum;
            MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            if (myid == 0)
            {
                printf("pi is approximately %.16f, Error is %.16f\n",
                       pi, fabs(pi - PI25DT));
                endwtime = MPI_Wtime();
                printf("wall clock time = %f\n", endwtime - startwtime);
                fflush(stdout);
            }
        }
    }
    MPI_Finalize();
    return 0;
}
Setup and versions
CentOS Linux release 8.2.2004 (Core)
Linux kernel 5.8.2-1.el8.elrepo.x86_64
NIC used: Mellanox ConnectX-6 VPI SD (MCX654106A-ECAT), configured for RoCE