kevinsala opened this issue 1 month ago
@kevinsala can you try forcing synchronization between sender and receiver by using `MPI_Issend` (instead of `MPI_Isend`) for the last message in every window (`msg == NMessages-1`)?
Using synchronous sends for the last message in every iteration avoids the out-of-memory error. The memory usage stays around 180 MB per process.

Is there any throttling mechanism inside UCX to avoid this issue (without a workaround on the application side)?
We observe this problem in a task-based MPI+OpenMP application that uses `MPI_Testsome` rather than `MPI_Waitall`. Once a request completes, the next message from the next iteration (same `msg` tag slot) can be issued, allowing messages from different iterations (but with distinct tags) to be in flight simultaneously. As we do not use `MPI_Waitall` in each iteration, I believe that using `MPI_Issend` for a single message would not be sufficient.
Currently there is no throttling mechanism in UCX for unexpected tags, though it would be a good feature to add. Is it possible to add a blocking `MPI_Ssend` once in a while to create such synchronization?
The MPI program below gets an out-of-memory error because UCX tries to allocate too many descriptors from the `rc_recv_desc` memory pool. The program performs thousands of iterations, where each iteration passes data from the first process to the last one: process 0 sends to process 1, process 1 sends to process 2, and so on. The first process only sends data, the last one only receives, and the rest both send and receive. In each iteration, the data is exchanged in multiple messages (4096 messages of 4096 bytes each).

The out-of-memory error always occurs in process 1. It seems that process 0 executes iterations significantly ahead of the other ones (I guess because the `MPI_Waitall` in process 0 does not synchronize with the receives of process 1). For instance, when the application crashed, process 0 had just executed iteration 7152, while the rest had only processed iteration 3743.

I'm attaching a PDF with the heap profile of process 1 obtained with gperftools: memory.pdf. The profile shows the memory consumed in the last moments before the out-of-memory error (around 110 GB). Most of the memory is allocated by the `ucp_worker_progress` call inside `MPI_Waitall`.

At the last moments of the execution, the debug output of UCX (`UCX_LOG_LEVEL=debug`) printed by process 1 is the following:

Environment
The executions use MPICH 4.2.1 over UCX 1.16.0, but I've observed the same error with previous UCX releases, and also with OpenMPI 4.1.6 over UCX.
Configuration of UCX 1.16.0:
Configuration of MPICH 4.2.1:
Machine:
Commands to reproduce
I can reproduce this error running on four processes across four nodes: