Adding the stack trace of the IMB-MPI1 PingPong hang for your reference.
(gdb) bt
#0 0x00007ff4cdec8a5d in recv () from /lib64/libpthread.so.0
#1 0x00007ff3f9fc8292 in tcpx_read_to_buffer ()
from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so
#2 0x00007ff3f9fc7592 in tcpx_ep_progress ()
from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so
#3 0x00007ff3f9fc7309 in tcpx_progress ()
from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so
#4 0x00007ff3f9fd989d in ofi_cq_progress ()
from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so
#5 0x00007ff3f9fda54b in ofi_cq_readfrom ()
from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so
#6 0x00007ff3f9b37f0e in rxm_ep_do_progress ()
from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so
#7 0x00007ff3f9b39e19 in rxm_ep_progress ()
from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so
#8 0x00007ff3f9b4fe6d in ofi_cq_progress ()
from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so
#9 0x00007ff3f9b50b1b in ofi_cq_readfrom ()
from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so
#10 0x00007ff4cea67c7d in MPIDI_NM_progress_impl (vci=<optimized out>, blocking=<optimized out>)
at ../../src/mpid/ch4/netmod/include/../ofi/ofi_progress.h:39
#11 MPIDI_NM_progress (vci=428, blocking=36615584) at ../../src/mpid/ch4/netmod/ofi/util.c:26
#12 0x00007ff4ce95db8e in PMPI_Recv (buf=0x1ac, count=36615584, datatype=512, source=-840136099, tag=0, comm=0,
status=0x7ffd10c2ed80) at ../../src/mpid/ch4/netmod/include/../ofi/intel/ofi_recv.h:266
#13 0x000000000045159a in IMB_init_communicator ()
#14 0x000000000042fca5 in OriginalBenchmark<BenchmarkSuite<(benchmark_suite_t)0>, &IMB_pingpong>::run(scope_item const&) ()
#15 0x0000000000405473 in main ()
Is this the only thread? The socket is set to non-blocking; are you sure it hangs there and not in a different thread?
Here is a more verbose stack trace of the IMB-MPI1 PingPong hang, produced with the Intel MPI debug build.
(gdb) bt
#0 0x00007fd80b22ea5d in recv () from /lib64/libpthread.so.0
#1 0x00007fd739f22292 in tcpx_read_to_buffer ()
from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so
#2 0x00007fd739f21592 in tcpx_ep_progress ()
from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so
#3 0x00007fd739f21309 in tcpx_progress ()
from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so
#4 0x00007fd739f3389d in ofi_cq_progress ()
from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so
#5 0x00007fd739f3454b in ofi_cq_readfrom ()
from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so
#6 0x00007fd73a164f0e in rxm_ep_do_progress ()
from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so
#7 0x00007fd73a166e19 in rxm_ep_progress ()
from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so
#8 0x00007fd73a17ce6d in ofi_cq_progress ()
from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so
#9 0x00007fd73a17db1b in ofi_cq_readfrom ()
from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so
#10 0x00007fd80c009b5b in fi_cq_read (cq=0xd48cb0, buf=0x7ffcdfa79cd8, count=1) at /usr/include/rdma/fi_eq.h:385
#11 0x00007fd80c015f85 in MPIDI_NM_progress_impl (vci=0, blocking=1)
at ../../src/mpid/ch4/netmod/include/../ofi/ofi_progress.h:39
#12 0x00007fd80c016334 in MPIDI_NM_progress (vci=0, blocking=1) at ../../src/mpid/ch4/netmod/ofi/util.c:26
#13 0x00007fd80bec11c3 in MPIDI_NM_mpi_recv (buf=0x7f20003a1308, count=286, datatype=1275069445, rank=7, tag=1000,
comm=0x7fd80d3938e0 <MPIR_Comm_builtin>, context_offset=0, addr=0x7f2000380278, status=0x7ffcdfa7a700,
request=0x7ffcdfa7a530) at ../../src/mpid/ch4/netmod/include/../ofi/intel/ofi_recv.h:266
#14 0x00007fd80bec36c8 in MPIDI_recv_unsafe (buf=0x7f20003a1308, count=286, datatype=1275069445, rank=7, tag=1000,
comm=0x7fd80d3938e0 <MPIR_Comm_builtin>, context_offset=0, av=0x7f2000380278, status=0x7ffcdfa7a700,
request=0x7ffcdfa7a530) at ../../src/mpid/ch4/src/ch4_recv.h:175
#15 0x00007fd80bec3ace in MPIDI_recv_safe (buf=0x7f20003a1308, count=286, datatype=1275069445, rank=7, tag=1000,
comm=0x7fd80d3938e0 <MPIR_Comm_builtin>, context_offset=0, av=0x7f2000380278, status=0x7ffcdfa7a700,
req=0x7ffcdfa7a530) at ../../src/mpid/ch4/src/ch4_recv.h:405
#16 0x00007fd80bec3d23 in MPID_Recv (buf=0x7f20003a1308, count=286, datatype=1275069445, rank=7, tag=1000,
comm=0x7fd80d3938e0 <MPIR_Comm_builtin>, context_offset=0, status=0x7ffcdfa7a700, request=0x7ffcdfa7a530)
at ../../src/mpid/ch4/src/ch4_recv.h:561
#17 0x00007fd80bec50ae in PMPI_Recv (buf=0x7f20003a1308, count=286, datatype=1275069445, source=7, tag=1000,
comm=1140850688, status=0x7ffcdfa7a700) at ../../src/mpi/pt2pt/recv.c:135
#18 0x000000000045159a in IMB_init_communicator ()
#19 0x000000000042fca5 in OriginalBenchmark<BenchmarkSuite<(benchmark_suite_t)0>, &IMB_pingpong>::run(scope_item const&) ()
#20 0x0000000000405473 in main ()
Yes, that is the only thread we saw. We also ran with another provider and did not see this issue, so we believe it is genuinely hanging.
Hello,
We attached gdb to the hanging process and ran thread apply all bt (commands shown below for reference). We still see only one thread.
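For reference, the session looked roughly like this (the PID is a placeholder for whichever rank appears hung):
gdb -p <pid>
(gdb) info threads
(gdb) thread apply all bt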
That is an Intel MPI specific build. Do you see the issue with the public version of libfabric?
I switched to Intel MPI U7 + IMB 2019 U6 and tried both the internal libfabric and the public libfabric v1.10.1. The PingPong test still hangs for me.
Internal libfabric 1.10.0a1-impi
[ec2-user@ip-172-31-4-63 fsx]$ mpirun -n 1152 -f /fsx/hosts -env I_MPI_DEBUG=1 -env I_MPI_OFI_LIBRARY_INTERNAL=1 -env I_MPI_OFI_PROVIDER=tcp /opt/intel/compilers_and_libraries/linux/mpi/intel64/bin/IMB-MPI1 PingPong
[0] MPI startup(): libfabric version: 1.10.0a1-impi
[0] MPI startup(): libfabric provider: tcp;ofi_rxm
#------------------------------------------------------------
# Intel(R) MPI Benchmarks 2019 Update 6, MPI-1 part
#------------------------------------------------------------
# Date : Thu Jun 18 14:47:51 2020
# Machine : x86_64
# System : Linux
# Release : 4.14.177-139.254.amzn2.x86_64
# Version : #1 SMP Thu May 7 18:48:23 UTC 2020
# MPI Version : 3.1
# MPI Thread Environment:
# Calling sequence was:
# /opt/intel/compilers_and_libraries/linux/mpi/intel64/bin/IMB-MPI1 PingPong
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# PingPong
<... hanging>
Public libfabric v1.10.1
[ec2-user@ip-172-31-4-63 fsx]$ mpirun -n 1152 -f /fsx/hosts -env I_MPI_DEBUG=1 -env I_MPI_OFI_LIBRARY_INTERNAL=0 -env I_MPI_OFI_PROVIDER=tcp /opt/intel/compilers_and_libraries/linux/mpi/intel64/bin/IMB-MPI1 PingPong
[0] MPI startup(): libfabric version: 1.10.1
[0] MPI startup(): libfabric provider: tcp;ofi_rxm
#------------------------------------------------------------
# Intel(R) MPI Benchmarks 2019 Update 6, MPI-1 part
#------------------------------------------------------------
# Date : Thu Jun 18 14:41:30 2020
# Machine : x86_64
# System : Linux
# Release : 4.14.177-139.254.amzn2.x86_64
# Version : #1 SMP Thu May 7 18:48:23 UTC 2020
# MPI Version : 3.1
# MPI Thread Environment:
# Calling sequence was:
# /opt/intel/compilers_and_libraries/linux/mpi/intel64/bin/IMB-MPI1 PingPong
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# PingPong
<... hanging>
Is there any fix for this?
In runs that reproduced this, the root cause turned out to be system configuration settings.
FI_PROVIDER=tcp mpirun -n 1792 -ppn 128 -f ~/hosts IMB-MPI1 pingpong
These are the system configuration values I needed to adjust (applied as shown below):
net.ipv4.tcp_syncookies = 0
net.core.somaxconn = 8192
net.core.netdev_max_backlog = 4096
net.ipv4.tcp_max_syn_backlog = 8192
I also had to adjust the limits in /etc/security/limits.conf.
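For reference, one way to apply the sysctl values above on the fly (a sketch; making them persistent would typically go through /etc/sysctl.conf or a file under /etc/sysctl.d):
sudo sysctl -w net.ipv4.tcp_syncookies=0
sudo sysctl -w net.core.somaxconn=8192
sudo sysctl -w net.core.netdev_max_backlog=4096
sudo sysctl -w net.ipv4.tcp_max_syn_backlog=8192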
To run at 2688 ranks (192 per node), I used a custom version of libfabric with a larger TCP listen backlog:
> diff --git a/prov/tcp/src/tcpx_ep.c b/prov/tcp/src/tcpx_ep.c
> index 4e8d8da..40ff0f9 100644
> --- a/prov/tcp/src/tcpx_ep.c
> +++ b/prov/tcp/src/tcpx_ep.c
> @@ -666,7 +666,7 @@ static int tcpx_pep_listen(struct fid_pep *pep)
>
> tcpx_pep = container_of(pep,struct tcpx_pep, util_pep.pep_fid);
>
> - if (listen(tcpx_pep->sock, SOMAXCONN)) {
> + if (listen(tcpx_pep->sock, 4096)) {
> FI_WARN(&tcpx_prov, FI_LOG_EP_CTRL,
> "socket listen failed\n");
> return -ofi_sockerr();
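For context (not from the original report): the backlog value passed to listen() is silently capped by the kernel to net.core.somaxconn, so the sysctl change mentioned earlier is needed for this larger backlog to take effect. A quick way to check the current cap:
sysctl net.core.somaxconn
cat /proc/sys/net/core/somaxconn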
The patch above has been merged into master.
Two different deadlock issues were found in the tcp provider; both are now fixed in upstream master. Without those fixes it is possible for tcp to hang, particularly when a connection is being shut down by a peer. For now, you would need a custom build of libfabric to see whether it addresses the issue here.
Although the deadlock is reproducible with other applications, I'm not aware of it occurring with MPI. It is possible, however.
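If it helps, a rough sketch of building upstream libfabric from master and running Intel MPI against it (install prefix and host file are placeholders; I_MPI_OFI_LIBRARY_INTERNAL=0 selects the external library, as in the runs above):
git clone https://github.com/ofiwg/libfabric.git
cd libfabric
./autogen.sh
./configure --prefix=$HOME/libfabric-master
make -j && make install
export LD_LIBRARY_PATH=$HOME/libfabric-master/lib:$LD_LIBRARY_PATH
mpirun -n 1152 -f /fsx/hosts -env I_MPI_DEBUG=1 -env I_MPI_OFI_LIBRARY_INTERNAL=0 -env I_MPI_OFI_PROVIDER=tcp IMB-MPI1 PingPong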
There has been no activity on this issue for more than 360 days. Marking it stale.
Hi,
We saw a few hangs when running IMB with Intel MPI U6 using its internal libfabric tcp provider (1152 processes = 32 nodes * 36 processes per node).